Tuesday, January 17, 2012

The developer's guide to the HTML5 APIs

Whilst we see, read and hear a lot about the new semantic elements in HTML5, we arguably hear far less about the application programming interfaces (APIs) that make up a large part of the specification itself.
As I'm sure you're aware, there are two versions of the HTML5 specification: one published by the W3C and another by the WHATWG. The living HTML specification maintained by the WHATWG contains additional APIs to those in the W3C HTML5 spec (although generally they are also maintained by the W3C, but in separate specifications).
Alongside those in the specification are a number of related APIs that form part of the standards stack and are often grouped under the "HTML5" umbrella term. In some cases the APIs have been around and implemented for a while, but they've never been documented; something which HTML5 has set out to change.
In this article we'll focus on describing the APIs, their purpose and progress rather than walking through code in depth. We'll then point you in the right direction to find out more.

APIs in the HTML5 specification

We'll start by looking at the APIs in the W3C HTML5 spec.

Media API

The media API is part of the media elements, which include two of HTML5's poster children: the video and audio elements. The elements themselves are simple to implement, but what's less well known are the JavaScript methods available within the associated API. These include play() and pause() as well as load() and canPlayType(). Many of the methods are shared between both media types, with a subset of additional properties (e.g. poster) specifically related to the video element. Combined with additional events and attributes, the API allows us to, amongst other things, create custom controls.
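As a rough illustration of the custom-controls idea, here is a minimal TypeScript sketch. It is written against a small hand-rolled interface (MediaLike, a name invented for this example) rather than a real element, so the helper names togglePlayback and pickPlayableType are assumptions, not part of the API.

```typescript
// Minimal structural view of a media element. In a browser, video and
// audio elements expose these members; MediaLike itself is hypothetical.
interface MediaLike {
  paused: boolean;
  play(): unknown;
  pause(): void;
  canPlayType(type: string): string;
}

// What a custom play/pause button might do when clicked.
// Returns true if this call started playback, false if it paused it.
function togglePlayback(media: MediaLike): boolean {
  if (media.paused) {
    media.play();
    return true;
  }
  media.pause();
  return false;
}

// Return the first MIME type the element says it may be able to play.
// canPlayType() returns an empty string for unsupported types.
function pickPlayableType(media: MediaLike, types: string[]): string | null {
  for (const type of types) {
    if (media.canPlayType(type) !== "") {
      return type;
    }
  }
  return null;
}
```

In a page, you would pass in a real video or audio element, for example from a button's click handler.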
To find out more, take a look at the following articles.

Text Track API

The text track API leads on nicely from the media API. It is designed to allow us to interact with text tracks (subtitles or captions for example) for the audio and video elements.
You can retrieve the number of text tracks associated with a media element (the length of its track list) and, for each track, its kind (subtitles, captions, descriptions, chapters or metadata), language, readyState, mode and label.
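To make those properties a little more concrete, the following TypeScript sketch picks a subtitle or caption track by language. TrackLike and showTrackFor are names made up for this example, and it assumes mode takes the string values "disabled", "hidden" and "showing"; implementations of the time may have differed.

```typescript
// Hypothetical structural view of a text track.
interface TrackLike {
  kind: string;     // "subtitles", "captions", "descriptions", "chapters" or "metadata"
  language: string; // e.g. "en"
  label: string;
  mode: string;     // assumed: "disabled", "hidden" or "showing"
}

// Show the first subtitles/captions track in the requested language and
// disable the other subtitles/captions tracks. Other kinds are left alone.
function showTrackFor(tracks: ArrayLike<TrackLike>, language: string): TrackLike | null {
  let chosen: TrackLike | null = null;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    const isTimedText = track.kind === "subtitles" || track.kind === "captions";
    if (!isTimedText) {
      continue;
    }
    if (chosen === null && track.language === language) {
      track.mode = "showing";
      chosen = track;
    } else {
      track.mode = "disabled";
    }
  }
  return chosen;
}
```

In a browser the list would come from the media element's textTracks property.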
This API will have far more support when browsers begin to implement native subtitling, using WebVTT for example. In the meantime, to get up to speed, look at these resources:

Drag and Drop

The drag and drop API has been the topic of much debate. Originally created by Microsoft in version 5 of Internet Explorer, it is now supported by Firefox, Safari and Chrome. So what does it do?
Well, as the name suggests, it brings native drag and drop support to the browser. Adding a draggable attribute set to true to an element lets the user drag it. You then add some event handlers on a target drop zone to tell the browser where the element can be dropped.
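The handlers involved can be sketched roughly as follows in TypeScript. The interfaces are cut-down stand-ins for the real event objects, and the handler names and the "text/plain" payload are choices made for this example only.

```typescript
// Hypothetical, cut-down views of the drag event and its dataTransfer object.
interface DataTransferLike {
  setData(format: string, data: string): void;
  getData(format: string): string;
}
interface DragEventLike {
  dataTransfer: DataTransferLike | null;
  preventDefault(): void;
}

// dragstart on the draggable element: record which item is being dragged.
function handleDragStart(event: DragEventLike, itemId: string): void {
  if (event.dataTransfer) {
    event.dataTransfer.setData("text/plain", itemId);
  }
}

// dragover on the drop zone: cancelling the default action is what tells
// the browser that a drop is allowed here.
function handleDragOver(event: DragEventLike): void {
  event.preventDefault();
}

// drop on the drop zone: read back the id stored at dragstart.
function handleDrop(event: DragEventLike): string {
  event.preventDefault();
  return event.dataTransfer ? event.dataTransfer.getData("text/plain") : "";
}
```

In a page these would be attached with addEventListener for the dragstart, dragover and drop events.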
The API's real muscles are flexed when you start to think outside of the browser. Using drag and drop, a user could drag an image from the desktop into the browser or you could create an icon that gets loaded with content when dragged out of the browser by the user to a new application target.
Drag and Drop is covered in depth in the articles below.

Offline Web Applications/Application Cache

With the blurring of native apps (mobile and desktop) and web apps comes the inevitable task of wanting to take our applications offline. The Offline Web Applications specification details how to do just that using application caching.
Application caching is carried out by creating a simple manifest file which lists the files that are required for the application to work offline. Authors can then ensure their sites function offline. The manifest causes the user’s browser to keep a copy of the files for use offline later. When a user views the document/application without network access, the browser switches to use the local copies instead. So in theory, you should be able to finish writing that important email or playing the web version of Angry Birds while you're on the underground/subway.
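To give a feel for how small the manifest is, here is a TypeScript sketch that generates one. The function name buildCacheManifest, the version comment and the file names are invented for illustration; the resulting text would be served with the text/cache-manifest MIME type and referenced from the manifest attribute on the html element.

```typescript
// Build the text of a cache manifest listing the files needed offline.
function buildCacheManifest(version: string, cachedFiles: string[]): string {
  const lines = [
    "CACHE MANIFEST",
    // A comment line; changing it alters the manifest, which prompts
    // browsers to fetch the listed files again.
    "# version " + version,
    "",
    "CACHE:",
  ].concat(cachedFiles, [
    "",
    // Everything not listed above is fetched from the network as usual.
    "NETWORK:",
    "*",
  ]);
  return lines.join("\n") + "\n";
}
```

Calling buildCacheManifest("1", ["index.html", "app.js"]) yields a manifest with those two files under the CACHE: section.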
With relatively strong browser support, particularly in the mobile arena (Firefox, Safari, Chrome, Opera, iPhone, and Android), it's something you can start using right now. For further reading, I suggest:

User Interaction

Like offline, user interaction is part of the primary HTML5 specification. It's worth mentioning here because some of its features, such as the contenteditable attribute, are extremely useful when you're creating web applications. contenteditable has been around in Internet Explorer since version 5.5 and works in all five major browsers. Setting the attribute to true indicates that the element is editable. Authors could then, for example, combine this with local storage to track changes to documents.
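One way the contenteditable-plus-storage idea might look is sketched below in TypeScript. StorageLike and EditableLike are hypothetical cut-down interfaces, and saveDraft and restoreDraft are names chosen for this example.

```typescript
// Hypothetical minimal views of a storage area and an editable element.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}
interface EditableLike {
  innerHTML: string;
}

// Persist the current contents of the editable element under a key.
function saveDraft(store: StorageLike, key: string, editable: EditableLike): void {
  store.setItem(key, editable.innerHTML);
}

// Restore previously saved contents. Returns false if nothing was saved.
function restoreDraft(store: StorageLike, key: string, editable: EditableLike): boolean {
  const saved = store.getItem(key);
  if (saved === null) {
    return false;
  }
  editable.innerHTML = saved;
  return true;
}
```

In a page, the store would be localStorage and the editable would be the element carrying contenteditable="true", with saveDraft called whenever its content changes.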
For more information, take a look at the current spec but note that some sections have been moved to the HTML Editing APIs work in progress.


History API

A browser's back button is the most heavily used piece of its chrome. Ajax-y web applications break it at their peril. Using HTML5's History API, developers have a lot more control over the history state of a user's browser session.
The pre-HTML5 History API allowed us to send users forward or back, and check the length of the history. What HTML5 brings to the party are ways to add and replace entries in the user's history, hold data to restore a page state and update the URL without refreshing the page. The scripting is fairly straightforward and will help us build complex applications that don't refresh the page, yet still have URLs we can share as we've always done.
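A small TypeScript sketch of the idea follows. HistoryLike is a hypothetical cut-down view of the history object, and the ViewState shape, the /items/ URL scheme and the function names are assumptions made for this example.

```typescript
// Hypothetical minimal view of the history object.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string): void;
}

// The data we want back when the user returns to this entry.
interface ViewState {
  itemId: number;
}

// Add a history entry for an item and update the URL without reloading.
function showItem(hist: HistoryLike, itemId: number): ViewState {
  const state: ViewState = { itemId: itemId };
  hist.pushState(state, "", "/items/" + itemId);
  return state;
}

// popstate handler helper: recover the item id from the entry's state,
// or null if the entry has no state we recognise.
function itemFromPopState(event: { state: unknown }): number | null {
  const state = event.state as ViewState | null;
  return state && typeof state.itemId === "number" ? state.itemId : null;
}
```

In a page you would pass window.history to showItem and call itemFromPopState from a popstate listener to redraw the right view.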
For more detail on the History API:

MIME type and protocol handler registration

This API allows sites to register themselves as handlers for certain schemes using the registerProtocolHandler method. The spec gives an example use case:
an online telephone messaging service could register itself as a handler of the sms: scheme, so that if the user clicks on such a link, he is given the opportunity to use that Web site (W3C HTML Spec)
Certain schemes, such as sms, tel and irc, are whitelisted. In addition, there is a registerContentHandler method that allows sites to register as handlers for content with a certain MIME type.
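Registration itself is a single call, sketched here in TypeScript. NavigatorLike is a hypothetical stand-in for the navigator object, registerSmsHandler and the title string are invented for this example, and it assumes the three-argument form (scheme, URL template, title); browsers may treat the title argument differently.

```typescript
// Hypothetical minimal view of the navigator object.
interface NavigatorLike {
  registerProtocolHandler(scheme: string, url: string, title: string): void;
}

// Ask the browser to offer this site as a handler for sms: links.
// Returns false without registering if the URL template is unusable.
function registerSmsHandler(nav: NavigatorLike, handlerUrl: string): boolean {
  // The template must contain "%s", which the browser replaces with the
  // escaped URL of the link the user clicked.
  if (handlerUrl.indexOf("%s") === -1) {
    return false;
  }
  nav.registerProtocolHandler("sms", handlerUrl, "Example SMS service");
  return true;
}
```

In a page, nav would be window.navigator and the handler URL would point at a page on the registering site.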
The spec is the best place to get started when learning about MIME type and protocol handler registration.

APIs in the WHATWG specification

So far we've looked at specs that exist in both the W3C and WHATWG versions of HTML5. We'll now very briefly introduce a few more APIs that are documented within WHATWG's living standard HTML spec but have been broken out into smaller, more manageable specifications by the W3C. The purpose and the majority of the content are the same in both versions.
  • Canvas 2D Context — allows us to draw natively in the browser. Without the 2D Context API we wouldn't be able to draw on a canvas at all: it's our brushes, palette and paint all rolled into one. The API is extensive, and pretty much all canvas articles introduce some of the different methods and events, of which there are too many to detail here. WHATWG Canvas Element, 2D Context and W3C HTML Canvas 2D Context Spec
  • Cross document and channel messaging — cross document messaging defines a way for documents to communicate with one another regardless of their source domain without enabling cross-site attacks. In a similar vein, channel messaging allows independent pieces of code to communicate directly. WHATWG HTML, Cross document messaging, WHATWG HTML Cross channel messaging and W3C HTML5 Web Messaging spec
  • Microdata — adds an additional layer of semantics to your documents, which search engines, browsers and more can use to extract information and provide an enhanced browsing experience. WHATWG HTML, Microdata and W3C Microdata spec
  • Web Workers — an API for running JavaScript in the background independently of any user interface scripts. Allows long-running tasks to be completed without making the page unresponsive. WHATWG HTML, Web Workers and W3C Web Workers Spec
  • Web Storage — a spec for storing client side data (key value pairs) similar to cookies. WHATWG HTML, Web Storage and W3C Web Storage spec
  • Web Sockets — allows pages to use the WebSocket protocol to send two-way messages between a browser and server. WHATWG Web Sockets and W3C Web Socket API
  • Server sent events — allows for push notifications to be sent from a server to a browser in the form of DOM events. WHATWG HTML, Server-sent events and W3C Server-Sent Events
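
To pick just one of these, the "without enabling cross-site attacks" part of cross document messaging comes down to checking where a message came from before trusting it. A minimal TypeScript sketch, using a hypothetical MessageEventLike interface and an invented acceptMessage helper:

```typescript
// Hypothetical minimal view of a message event.
interface MessageEventLike {
  origin: string;
  data: unknown;
}

// Accept a message only if it comes from the origin we expect and carries
// a string payload. Anything else is ignored by returning null.
function acceptMessage(event: MessageEventLike, trustedOrigin: string): string | null {
  if (event.origin !== trustedOrigin) {
    return null;
  }
  return typeof event.data === "string" ? event.data : null;
}
```

In a page this would be called from a "message" event listener on the window, with the sending document using postMessage and naming the target origin.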

The "HTML5" buzzword APIs

If I were to list out all the other APIs that are closely related to HTML5, I'd be here for a while. Another time perhaps. A few of those often incorrectly described as HTML5 are Geolocation, Indexed DB, Selectors, and the Filesystem API.

Mike Smith from the W3C has compiled a comprehensive list of all aspects of the web platform and browser technologies which is well worth bookmarking.
If you find that something isn't yet supported by browsers, don't despair. It's likely that there's a polyfill to help you mimic the native behaviour.

We've merely scratched the surface of each of these detailed, useful, powerful APIs. In order to find out more and get under the skin of each, go and throw yourself knee deep in code. You'll be surprised at what you'll find while researching and experimenting. As for those APIs that aren't quite fully baked yet, hopefully the article has whetted your appetite for what will be coming to a browser near you soon.
