Thomas Fuchs
Hi, I'm Thomas Fuchs. I'm the author of Zepto.js and script.aculo.us, and I'm a Ruby on Rails core alumnus. With Amy Hoy I'm building cheerful software, like Noko Time Tracking and Every Time Zone, and writing books like Retinafy.me.

Standards bloat and HTML5

October 19th, 2010

A long time ago, HTML started out as a small prototype language for hypertext. It was based on some previous efforts, and it seemed like a pragmatic approach to get a real-world hypertext system going as quickly as possible.

So where did things go from there?

This chart shows the number of words in each of HTML’s versions (the first overview version, “HTML Tags”, is not included; it had roughly 1,500 words).

So is it bloated?

By all means, yes.

  • HTML 1.0 — 9,967 words
  • HTML 2.0 — 18,880 words
  • HTML 3.2 — 15,570 words
  • HTML 4.01 — 104,567 words
  • HTML5 Draft — 324,969 words
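
To put those numbers in perspective, here is a rough sketch that computes how much each version grew relative to the one before it. It uses only the word counts listed above; the names `WORD_COUNTS` and `growth_factors` are made up for this illustration.

```python
# Word counts copied from the list above; keys are just the version labels.
WORD_COUNTS = {
    "HTML 1.0": 9_967,
    "HTML 2.0": 18_880,
    "HTML 3.2": 15_570,
    "HTML 4.01": 104_567,
    "HTML5 Draft": 324_969,
}


def growth_factors(counts):
    """Return (version, factor) pairs, each factor relative to the previous version."""
    versions = list(counts)  # dicts keep insertion order, so this is chronological
    return [
        (cur, counts[cur] / counts[prev])
        for prev, cur in zip(versions, versions[1:])
    ]


if __name__ == "__main__":
    for version, factor in growth_factors(WORD_COUNTS):
        print(f"{version}: {factor:.2f}x the previous version")

    # Overall growth from the first listed version to the HTML5 draft.
    overall = WORD_COUNTS["HTML5 Draft"] / WORD_COUNTS["HTML 1.0"]
    print(f"HTML5 Draft vs. HTML 1.0: {overall:.1f}x")
```

Going by these figures, HTML 3.2 was actually shorter than HTML 2.0, HTML 4.01 was the biggest single jump at roughly 6.7 times its predecessor, and the HTML5 draft is more than 30 times the size of HTML 1.0.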

However, it’s good that things are described in much more detail, which, in theory, makes it more likely that browsers will behave alike in the future: as long as a feature is implemented, it should work the way the spec says. These documents have shifted from being a resource for users of the language to a specification for implementors (browser vendors).

Given that HTML5 is rapidly approaching 400k words it might be a little bit too detailed (for comparison, that’s more than twice the number of words in the New Testament but not quite yet the 560k words of the Lord of the Rings trilogy).

The trouble is, developers and users of HTML can’t read all of this. This is comparable to the ECMAScript specifications, where the spec is unreadable for laymen, but most other resources are too simplistic, partially wrong and always out-of-date (we need a Promote JavaScript for HTML!). HTML has too many features, too many special cases, too many little quirks, too much of everything.

I wonder if HTML5 will be the web’s finest hour or the beginning of something new: for example, people shifting to CANVAS, WebGL or some other tech to render content, instead of the HTML tag jungle.

It’s here to stay

Whatever the future might hold, I think HTML is here to stay, at least for quite a while. Let’s face it: it’s very flexible and has kept up pretty well. We even survived the doomed XML detour of the early 2000s (lucky us!).

The HTML5 standard, despite all the bloat, is going in the right direction. For users, it will be more pragmatic, as it doesn’t enforce strict XML syntax, and it makes sure previously unclear concepts are now well-defined (if you want to know what exactly should happen, look into the spec; it’s actually well written).

But of course, the specification will always be late and not reflect what’s actually out there. Not that it matters that much, as we all know HTML5 will be finalized in 2022, three years after the events depicted in Blade Runner. D’oh!