You are here: Home Dive Into HTML5
What is an offline web application? At first glance, it sounds like a contradiction in terms. Web pages are things you download and render. Downloading implies a network connection. How can you download when you’re offline? Of course, you can’t. But you can download when you’re online. And that’s how HTML5 offline applications work.
From there, most of the work is up to you, the web developer. There’s a flag in the DOM that will tell you whether you’re online or offline. There are events that fire when your offline status changes (one minute you’re offline and the next minute you’re online, or vice-versa). But that’s pretty much it. If your application creates data or saves state, it’s up to you to store that data locally while you’re offline and synchronize it with the remote server once you’re back online. In other words, HTML5 can take your web application offline. What you do once you’re there is up to you.
An offline web application revolves around a cache manifest file. What’s a manifest file? It’s a list of all of the resources that your web application might need to access while it’s disconnected from the network. In order to bootstrap the process of downloading and caching these resources, you need to point to the manifest file, using a
manifest attribute on your
Your cache manifest file can be located anywhere on your web server, but it must be served with the content type
text/cache-manifest. If you are running an Apache-based web server, you can probably just put an
AddType directive in the
.htaccess file at the root of your web directory:
AddType text/cache-manifest .manifest
Then make sure that the name of your cache manifest file ends with
.manifest. If you use a different web server or a different configuration of Apache, consult your server’s documentation on controlling the
Q: My web application spans more than one page. Do I need a
manifestattribute in each page, or can I just put it in the home page?
A: Every page of your web application needs a
manifestattribute that points to the cache manifest for the entire application.
OK, so every one of your HTML pages points to your cache manifest file, and your cache manifest file is being served with the proper
Content-Type header. But what goes in the manifest file? This is where things get interesting.
The first line of every cache manifest file is this:
After that, all manifest files are divided into three parts: the “explicit” section, the “fallback” section, and the “online whitelist” section. Each section has a header, on its own line. If the manifest file doesn’t have any section headers, all the listed resources are implicitly in the “explicit” section. Try not to dwell on the terminology, lest your head explode.
This cache manifest file has no section headers, so all the listed resources are in the “explicit” section by default. Resources in the “explicit” section will get downloaded and cached locally, and will be used in place of their online counterparts whenever you are disconnected from the network. Thus, upon loading this cache manifest file, your browser would download
clock-face.jpg from the root directory of your web server. Then you could unplug your network cable and refresh the page, and all of those resources would be available offline.
Q: Do I need to list my HTML pages in my cache manifest?
A: Yes and no. If your entire web application is contained in a single page, just make sure that page points to the cache manifest using the
manifestattribute. When you navigate to an HTML page with a
manifestattribute, the page itself is assumed to be part of the web application, so you don’t need to list it in the manifest file itself. However, if your web application spans multiple pages, you should list all of the HTML pages in the manifest file, otherwise the browser would not know that there are other HTML pages that need to be downloaded and cached.
Here is a slightly more complicated example. Suppose you want your clock application to track visitors, using a tracking.cgi script that is loaded dynamically from an
<img src> attribute. Caching this resource would defeat the purpose of tracking, so this resource should never be cached and never be available offline. Here is how you do that:
This cache manifest file includes section headers. The line marked
NETWORK: is the beginning of the “online whitelist” section. Resources in this section are never cached and are not available offline. (Attempting to load them while offline will result in an error.) The line marked
CACHE: is the beginning of the “explicit” section. The rest of the cache manifest file is the same as the previous example. Each of the three resources listed will be cached and available offline.
There is one more type of section in a cache manifest file: a fallback section. In a fallback section, you can define substitutions for online resources that, for whatever reason, can’t be cached or weren’t cached successfully. The HTML5 specification offers this clever example of using a fallback section:
What does this do? First, consider a site that contains millions of pages, like Wikipedia. You couldn’t possibly download the entire site, nor would you want to. But suppose you could make part of it available offline. But how would you decide which pages to cache? How about this: every page you ever look at on a hypothetical offline-enabled Wikipedia would be downloaded and cached. That would include every encyclopedia entry that you ever visited, every talk page (where you can have makeshift discussions about a particular encyclopedia entry), and every edit page (which you can actually make changes to the particular entry).
That’s what this cache manifest does. Suppose every HTML page (entry, talk page, edit page, history page) on Wikipedia pointed to this cache manifest file. When you visit any page that points to a cache manifest, your browser says “hey, this page is part of an offline web application, is it one I know about?” If your browser hasn’t ever downloaded this particular cache manifest file, it will set up a new offline “appcache” (short for “application cache”), download all the resources listed in the cache manifest, and then add the current page to the appcache. If your browser does know about this cache manifest, it will simply add the current page to the existing appcache. Either way, the page you just visited ends up in the appcache. This is important. It means that you can have an offline web application that “lazily” adds pages as you visit them. You don’t need to list every single one of your HTML pages in your cache manifest.
Now look at the fallback section. The fallback section in this cache manifest only has a single line. The first part of the line (before the space) is not a URL. It’s really a URL pattern. The single character (
/) will match any page on your site, not just the home page. When you try to visit a page while you’re offline, your browser will look for it in the appcache. If your browser finds the page in the appcache (because you visited it while online, and the page was implicitly added to the appcache at that time), then your browser will display the cached copy of the page. If your browser doesn’t find the page in the appcache, instead of displaying an error message, it will display the page
/offline.html, as specified in the second half of that line in the fallback section.
Finally, let’s examine the network section. The network section in this cache manifest also has just a single line, a line that contains just a single character (
*). This character has special meaning in a network section. It’s called the “online whitelist wildcard flag.” That’s a fancy way of saying that anything that isn’t in the appcache can still be downloaded from the original web address, as long as you have an internet connection. This is important for an “open-ended” offline web application. It means that, while you’re browsing this hypothetical offline-enabled Wikipedia online, your browser will fetch images and videos and other embedded resources normally, even if they are on a different domain. (This is common in large websites, even if they aren’t part of an offline web application. HTML pages are generated and served locally, while images and videos are served from a CDN on another domain.) Without this wildcard flag, our hypothetical offline-enabled Wikipedia would behave strangely when you were online — specifically, it wouldn’t load any externally-hosted images or videos!
CACHE: section of the manifest file, in order for pages to display and behave properly offline. But the point of the fallback section is that you can have an “open-ended” offline web application that extends beyond the resources you’ve listed explicitly in the manifest file.
So far, I’ve talked about offline web applications, the cache manifest, and the offline application cache (“appcache”) in vague, semi-magical terms. Things are downloaded, browsers make decisions, and everything Just Works. You know better than that, right? I mean, this is web development we’re talking about. Nothing ever Just Works.
First, let’s talk about the flow of events. Specifically, DOM events. When your browser visits a page that points to a cache manifest, it fires off a series of events on the
window.applicationCache object. I know this looks complicated, but trust me, this is the simplest version I could come up with that didn’t leave out important information.
manifest attribute on the
<html> element, your browser fires a
checking event. (All the events listed here are fired on the
window.applicationCache object.) The
checking event is always fired, regardless of whether you have previously visited this page or any other page that points to the same cache manifest.
downloading event, then start to download the resources listed in the cache manifest.
progress events, which contain information on how many files have been downloaded already and how many files are still queued to be downloaded.
cached. This is your signal that the offline web application is fully cached and ready to be used offline. That’s it; you’re done.
noupdate event. That’s it; you’re done.
downloading event and start re-downloading every single resource listed in the cache manifest.
progress events, which contain information on how many files have been downloaded already and how many files are still queued to be downloaded.
updateready. This is your signal that the new version of your offline web application is fully cached and ready to be used offline. The new version is not yet in use. To “hot-swap” to the new version without forcing the user to reload the page, you can manually call the
If, at any point in this process, something goes horribly wrong, your browser will fire an
error event and stop. Here is a hopelessly abbreviated list of things that could go wrong:
I want to call out two important points here. The first is something you just read, but I bet it didn’t really sink in, so here it is again: if even a single resource listed in your cache manifest file fails to download properly, the entire process of caching your offline web application will fail. Your browser will fire the
error event, but there is no indication of what the actual problem was. This can make debugging offline web applications even more frustrating than usual.
The second important point is something that is not, technically speaking, an error, but it will look like a serious browser bug until you realize what’s going on. It has to do with exactly how your browser checks whether a cache manifest file has changed. This is a three-phase process. This is boring but important, so pay attention.
Cache-Control) tell your browser how it is allowed to cache the file without ever asking the server whether it has changed. This kind of caching has nothing to do with offline web applications. It happens for pretty much every HTML page, stylesheet, script, image, or other resource on the web.
304 (Not Modified) status. Again, none of this is specific to offline web applications. This happens for essentially every kind of resource on the web.
200 (OK) status code, followed by the contents of the new file, along with new
Cache-Control headers and a new last-modified date, so that steps 1 and 2 will work properly the next time. (HTTP is cool; web servers are always planning for the future. If your web server absolutely must send you a file, it does everything it can to ensure that it doesn’t need to send it twice for no reason.) Once it’s downloaded the new cache manifest file, your browser will check the contents against the copy it downloaded last time. If the contents of the cache manifest file are the same as they were last time, your browser won’t re-download any of the resources listed in the manifest.
Any one of these steps can trip you up while you’re developing and testing your offline web application. For example, say you deploy one version of your cache manifest file, then 10 minutes later, you realize you need to add another resource to it. No problem, right? Just add another line and redeploy. Bzzt. Here’s what will happen: you reload the page, your browser notices the
manifest attribute, it fires the
checking event, and then... nothing. Your browser stubbornly insists that the cache manifest file has not changed. Why? Because, by default, your web server is probably configured to tell browsers to cache static files for a few hours (via HTTP semantics, using
Cache-Control headers). That means your browser will never get past step 1 of that three-phase process. Sure, the web server knows that the file has changed, but your browser never even gets around to asking the web server. Why? Because the last time your browser downloaded the cache manifest, the web server told it to cache the resource for a few hours (via HTTP semantics, using
Cache-Control headers). And now, 10 minutes later, that’s exactly what your browser is doing.
To be clear, this is not a bug, it’s a feature. Everything is working exactly the way it’s supposed to. If web servers didn’t have a way to tell browsers (and intermediate proxies) to cache things, the web would collapse overnight. But that’s no comfort to you after you spend a few hours trying to figure out why your browser won’t notice your updated cache manifest. (And even better, if you wait long enough, it will mysteriously starts working again! Because the HTTP cache expired! Just like it’s supposed to! Kill me! Kill me now!)
So here’s one thing you should absolutely do: reconfigure your web server so that your cache manifest file is not cacheable by HTTP semantics. If you’re running an Apache-based web server, these two lines in your
.htaccess file will do the trick:
That will actually disable caching for every file in that directory and all subdirectories. That’s probably not what you want in production, so you should either qualify this with a
<Files> directive so it only affects your cache manifest file, or create a subdirectory that contains nothing but this
.htaccess file and your cache manifest file. As usual, configuration details vary by web server, so consult your server’s documentation for how to control HTTP caching headers.
Once you’ve disabled HTTP caching on the cache manifest file itself, you’ll still have times where you’ve changed one of the resources in the appcache, but it’s still at the same URL on your web server. Here, step 2 of the three-phase process will screw you. If your cache manifest file hasn’t changed, the browser will never notice that one of the previously cached resources has changed. Consider the following example:
# rev 42
If you change
clock.css and redeploy it, you won’t see the changes, because the cache manifest file itself hasn’t changed. Every time you make a change to one of the resources in your offline web application, you’ll need to change the cache manifest file itself. This can be as simple as changing a single character. The easiest way I’ve found to accomplish this is to include a comment line with a revision number. Change the revision number in the comment, then the web server will return the newly changed cache manifest file, your browser will notice that the contents of the file have changed, and it will kick off the process to re-download all the resources listed in the manifest.
# rev 43
Remember the Halma game that we introduced in the canvas chapter and later improved by saving state with persistent local storage? Let’s take our Halma game offline.
<style> element at the top of the HTML page. So this is our cache manifest:
A word about paths. I’ve created an
offline/ subdirectory in the
examples/ directory, and this cache manifest file lives inside the subdirectory. Because the HTML page will need one minor addition to work offline (more on that in a minute), I’ve created a separate copy of the HTML file, which also lives in the
.js file, which lives in the parent directory (
examples/). Altogether, the files look like this:
/examples/localstorage-halma.html /examples/halma-localstorage.js /examples/offline/halma.manifest /examples/offline/halma.html
In the cache manifest file (
/examples/offline/halma.manifest), we want to reference two files. First, the offline version of the HTML file (
/examples/halma-localstorage.js). This is listed in the manifest file using relative URL notation:
../halma-localstorage.js. This is just like you might use a relative URL in an
<img src> attribute. As you’ll see in the next example, you can also use absolute paths (that start at the root of the current domain) or even absolute URLs (that point to resources on other domains).
Now, in the HTML file, we need to add the
manifest attribute that points to the cache manifest file.
<html lang="en" manifest="halma.manifest">
And that’s it! When an offline-capable browser first loads the offline-enabled HTML page, it will download the linked cache manifest file and start downloading all the referenced resources and storing them in the offline application cache. From then on, the offline application algorithm takes over whenever you revisit the page. You can play the game offline, and since it remembers its state locally, you can leave and come back as often as you like.
Browser vendor documentation:
Tutorials and demos:
HTML5 Offline Application Cache Tools:
This has been “Let’s Take This Offline.” The full table of contents has more if you’d like to keep reading.
In association with Google Press, O’Reilly is distributing this book in a variety of formats, including paper, ePub, Mobi, and DRM-free PDF. The paid edition is called “HTML5: Up & Running,” and it is available now. This chapter is included in the paid edition.
If you liked this chapter and want to show your appreciation, you can buy “HTML5: Up & Running” with this affiliate link or buy an electronic edition directly from O’Reilly. You’ll get a book, and I’ll get a buck. I do not currently accept direct donations.
Copyright MMIX–MMXI Mark Pilgrim