html5 microdata

Sometimes people can be so focused on theory that they completely miss the fact that real life has already caught up with theory. Last week I discovered that html5 microdata can finally be used in the wild, so I jumped on it for some cosy semantic experimentation. I voiced my concerns about html5 microdata in the past, but I wouldn't miss the chance to see the semantic web in action, especially not when Google is on board and ready for some semantic magic.

microdata reservations

The way I see it, there are three main problems with the current microdata spec. First of all, the syntax is just too verbose. You need at least three new attributes (itemscope, itemtype and itemprop) to get anything useful out of your marked-up elements, and two of them (itemtype and itemprop) require custom values. That's a lot of extra data for something microformats fixed with just a few (extra) classes.
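To make the difference concrete, here is a rough side-by-side sketch. The first snippet uses the hCard microformat (vcard and fn are class names from that microformat), the second marks up the same name with microdata and the schema.org Person type. The div wrappers and the placeholder name are only there for illustration.

```html
<!-- hCard microformat: the semantics ride along on class names -->
<div class="vcard">
  <h1 class="fn">(person name)</h1>
</div>

<!-- microdata: three extra attributes for the same information -->
<div itemscope="itemscope" itemtype="http://schema.org/Person">
  <h1 itemprop="name">(person name)</h1>
</div>
```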

The second thing that worries me is the obvious correlation between microdata and class names. I know they are two different things with different goals and purposes, but there's an undeniable link between the two. Different semantic elements require different styling (if not now, then maybe in the future, so if you're into future-proof coding this is a fact rather than a possibility). When using microdata you're pretty much doing the same work twice: naming elements with semantics in mind and naming elements with styling in mind. Not very efficient if you ask me.

And finally, getting your hard work implemented will prove to be an additional challenge. It's hard enough to get the CMS to spit out the classes and tags you requested. If you're going to bother back-end developers with a stack of semantic hocus pocus, things might end up a true battlefield. While this sounds like the least important issue we're dealing with (as it is not ideological in nature), it's actually the one that has the biggest impact on the success rate of the whole operation.

here and now

So why bother with microdata? Well, because Google bothers. If you implement a known microdata vocabulary in your site today, Google can and will pick it up (to test it, you can use Google's Rich Snippets Testing Tool). While this data is currently not used for page ranking purposes, it can be used to enhance your search result snippets (I think the most visible example today is when Google adds review ratings to the search results).
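As a rough idea of what such rating markup might look like, here is a sketch using the schema.org Product and AggregateRating types. The product, the rating and the review count are made up for illustration, and whether Google actually shows a rating in its snippet remains up to Google. The attributes themselves are explained in the basic usage section further down.

```html
<!-- Hypothetical product; the name and the numbers are made up -->
<div itemscope="itemscope" itemtype="http://schema.org/Product">
  <h2 itemprop="name">(product name)</h2>
  <!-- the rating is an item of its own, nested as a property of the product -->
  <p itemprop="aggregateRating" itemscope="itemscope" itemtype="http://schema.org/AggregateRating">
    Rated <span itemprop="ratingValue">4.5</span> out of 5,
    based on <span itemprop="reviewCount">12</span> reviews
  </p>
</div>
```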

If you want known vocabularies (which are way more interesting than inventing your own and ending up with microdata definitions no machine can read) you can check sites like schema.org, which give you a proper overview of the most common vocabularies out there. It takes some time to get used to the site and to find what you're looking for (some extra examples would've come in handy), but once you get a feel for it (and you see some results in the Google testing tool) I assure you things will go smoothly from there on.

basic usage

<article itemscope="itemscope" itemtype="http://schema.org/Person">
  <h1 itemprop="name">(person name)</h1>
</article>

The example above illustrates the most basic use of microdata. The root element of each object is marked with the itemscope attribute (I'm using the xml serialization syntax here, which is why the attribute has a value). The nature of the object is given through the itemtype attribute (which is a working url to the page containing the vocabulary syntax). The properties of the object are defined through the itemprop attribute, which can be set on all nested elements.
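A slightly fuller sketch of the same object, assuming the jobTitle and url properties of the schema.org Person type, could look like this. The values are placeholders and example.com stands in for a real address.

```html
<article itemscope="itemscope" itemtype="http://schema.org/Person">
  <h1 itemprop="name">(person name)</h1>
  <p itemprop="jobTitle">(job title)</p>
  <!-- for a link, the property value is the href, not the link text -->
  <a itemprop="url" href="http://example.com/">(personal site)</a>
</article>
```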

The entire microdata syntax is a bit more versatile, allowing you to add non-nested data to an object (itemref), uniquely identify objects (itemid) and define multiple properties for one single value (a space-separated list in itemprop), but I'll just direct you to the w3 microdata page as they explain it in much more detail than I ever could.
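Just to give a taste, here is a minimal sketch of two of those extras. The id person-job is a made-up name, and alternateName is a schema.org property picked purely to show the list syntax. I've left out itemid, since it only makes sense for vocabularies that define global identifiers.

```html
<!-- itemref: pull in a property that lives outside the item's own element -->
<article itemscope="itemscope" itemtype="http://schema.org/Person" itemref="person-job">
  <!-- one value, two properties: itemprop takes a space-separated list -->
  <h1 itemprop="name alternateName">(person name)</h1>
</article>

<!-- not nested in the article, but referenced through its id -->
<p id="person-job" itemprop="jobTitle">(job title)</p>
```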

conclusion

Save for its over-verbose syntax and repetitive nature, html5 microdata is cool because it works today. Add it to your pages and watch how Google picks it up, using it to enrich its search results. Hopefully it will at some point influence page ranking (as Google can now properly interpret your data), but I assume that for the time being they're not allowing it in an attempt to counter blackhat seo tactics (in other words, microdata abuse to increase page ranking).

So if you feel any affinity with semantics, now is the time to really get started. Check the microdata syntax, bookmark the vocabularies and you're good to go. Exciting times indeed!