Introduction to Microformats

I’m really into microformats these days. Microformats assign semantically meaningful labels to structured data embedded in web pages, which makes that data machine readable as well as human readable.

For a simple example, consider the top left corner of my web site. Seen in a browser it reads, “Voices from Catland is the personal site of Kenn Wilson, sometimes known as Kenn Christ, of San Francisco, CA.” Any human reader instantly knows who the site belongs to and (roughly) where to find me geographically. Written in plain vanilla HTML, though, human readers are the only people (so to speak) this would be good for. It would probably look something like this:

<p><a href="http://www.inmostlight.org/">Voices from Catland</a> is the personal site of Kenn Wilson, sometimes known as Kenn Christ, of San Francisco, CA.</p>

Simple and straightforward, but non-human readers are going to have no idea what my site is about or who it belongs to. Enter the hCard microformat. hCard is an HTML representation of vCard, the format that most e-mail and address book software uses to store and share contact information. This is what the above information looks like marked up with hCard:

<div id="hcard" class="vcard contact">
  <p><a href="http://www.inmostlight.org/" class="url">Voices from Catland</a> is the personal site of <span class="fn">Kenn Wilson</span>, sometimes known as <span class="nickname">Kenn Christ</span>, of <span class="adr"><span class="locality">San Francisco</span>, <span class="region">CA</span></span>.</p>
</div>

It looks a lot like the first example, only a little more complex. The important things to note here are the class names assigned to the various XHTML elements in this section. Notice how they describe the content they’re assigned to: url for my web site’s URL, fn and nickname for my full name and nickname respectively, and adr, locality, and region for where I live. With this simple markup, any software that supports hCard can parse my web site and extract meaningful information about it and me. Even better, this software can automatically convert the information into a format that can be imported directly into desktop address books.

For a more complex example, take a look at my single-page quasi-professional site and my resume. Both pages have embedded hCards, and the resume is marked up with another microformat, hResume. Run any of these URLs through the Technorati hCard to vCard converter to get a real-life example of how this can work.
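To make the parsing step concrete, here is a minimal sketch in Python of how software might pull hCard properties out of markup like the example above and write them out as vCard text. It is not how the Technorati converter or any real hCard parser works; the names HCardSketchParser, extract_hcard, and to_vcard are made up for illustration, the sketch handles only the handful of class names used in my example, and it assumes every element is explicitly closed.

```python
from html.parser import HTMLParser

# Class names taken from the hCard example; real parsers support many more.
HCARD_PROPERTIES = {"fn", "nickname", "url", "locality", "region"}

# Sample markup modeled on the hCard example in this post.
SAMPLE = """
<div id="hcard" class="vcard contact">
  <p><a href="http://www.inmostlight.org/" class="url">Voices from Catland</a>
  is the personal site of <span class="fn">Kenn Wilson</span>, sometimes known
  as <span class="nickname">Kenn Christ</span>, of <span class="adr"><span
  class="locality">San Francisco</span>, <span class="region">CA</span></span>.</p>
</div>
"""


class HCardSketchParser(HTMLParser):
    """Collects a few hCard properties found inside a class="vcard" element."""

    def __init__(self):
        super().__init__()
        self.card = {}    # property name -> extracted text or URL
        self._depth = 0   # nesting depth inside the vcard root (0 = outside)
        self._open = []   # (property name, depth) for elements still open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if self._depth == 0:
            if "vcard" in classes:
                self._depth = 1  # entered the hCard root element
            return
        self._depth += 1
        for name in classes & HCARD_PROPERTIES:
            if name == "url" and attrs.get("href"):
                # For a link, the value is the href rather than the link text.
                self.card["url"] = attrs["href"]
            else:
                self.card.setdefault(name, "")
                self._open.append((name, self._depth))

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        # Drop properties whose element is closing, then step out one level.
        self._open = [(n, d) for n, d in self._open if d < self._depth]
        self._depth -= 1

    def handle_data(self, data):
        for name, _ in self._open:
            self.card[name] += data


def extract_hcard(html):
    """Returns a dict of the hCard properties found in the given HTML."""
    parser = HCardSketchParser()
    parser.feed(html)
    return {name: value.strip() for name, value in parser.card.items()}


def to_vcard(card):
    """Formats extracted values as bare-bones vCard 3.0 text (no escaping)."""
    lines = ["BEGIN:VCARD", "VERSION:3.0", "FN:" + card.get("fn", "")]
    parts = card.get("fn", "").split()
    if len(parts) == 2:
        # Assumption: a two-word fn is read as given name then family name.
        lines.append("N:%s;%s;;;" % (parts[1], parts[0]))
    if "nickname" in card:
        lines.append("NICKNAME:" + card["nickname"])
    if "url" in card:
        lines.append("URL:" + card["url"])
    if "locality" in card or "region" in card:
        # ADR order: PO box; extended; street; locality; region; postcode; country
        lines.append("ADR:;;;%s;%s;;" % (card.get("locality", ""), card.get("region", "")))
    lines.append("END:VCARD")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(to_vcard(extract_hcard(SAMPLE)))
```

The point is that the class names do all the work: the code never has to guess which piece of text is a name or a city, it just looks for fn or locality. A real converter would also need to handle the full set of hCard properties, unclosed elements such as br, and vCard escaping rules, all of which this sketch skips.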

In addition to hCard, there are also microformats available for calendar data (hCalendar), resumes (hResume), reviews (hReview), and more. Browse the Microformats wiki for a list of current and proposed formats.

It’s hard to say exactly what I find so interesting about this idea. It could just be that this gets us closer to what I believe the web is all about. Of course web pages should be machine readable. Of course there should be ways to extract certain types of real information from web pages without the tedium of picking over them to find it. Of course we should be able to automatically announce the type of content we’re publishing.

My part in the microformats community is mainly as a web site creator: I use them on my own site and, where appropriate, on other sites I work on, both personal sites and client sites at work. I have also been working on setting up a public hAtom-to-Atom converter, and I’ll be adding support for at least hCard and hCalendar soon (more on this later). I’m also doing what I can to spread the word.

Update

My microformat converters are now up and running.