Friday, February 4, 2011

Help with aggressive JavaScript caching

I've run into a problem where I make changes to a few JavaScript files that are referenced in an HTML file, but the browser doesn't see the changes. It holds onto the copy cached in the browser, even though the web server has a newer version.

Not until I force the browser to clear the cache do I see the changes.

Is this a web-server configuration? Do I need to set my JavaScript files to never cache? I've seen some interesting techniques in the Google Web Toolkit where they actually create a new JavaScript file name any time an update is made. I believe this is to prevent proxies and browsers from keeping old versions of the JavaScript files with the same names.

Is there a list of best practices somewhere?

  • I am also a fan of just renaming things. It never fails, and is fairly easy to do.

    From Unkwntech
  • We append a product build number to the end of every JavaScript (and CSS, etc.) URL, like so:

    <script src="MyScript.js?4.0.8243"></script>
    

    The server ignores everything after the question mark when it serves a static file, but the browser treats the result as a different URL, so each upgrade forces a fresh download instead of a cache hit.

    This has the additional benefit that you can set HTTP headers that tell the browser to cache the file indefinitely and never re-check it, since any change ships under a new URL.

    ceejayoz : Yep, that's the best way of doing it in my book.
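
A minimal sketch of the build-number approach above, written in Python purely for illustration. The helper names `versioned_url` and `script_tag` are hypothetical, and the build number is assumed to be available as a string at page-render time.

```python
def versioned_url(url: str, build: str) -> str:
    """Return url with the build number as its query string."""
    # Drop any existing query string so repeated calls do not stack versions.
    base = url.split("?", 1)[0]
    return f"{base}?{build}"


def script_tag(url: str, build: str) -> str:
    """Render a script tag that points at the versioned URL."""
    return f'<script src="{versioned_url(url, build)}"></script>'


print(script_tag("MyScript.js", "4.0.8243"))
```

Generating the tag from one helper keeps the build number in a single place, so a release only has to change one value.
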
  • Is your web server sending the right headers to tell the browser it has a new version? I've also added the date to the query string before, e.g. myscripts.js?date=4/14/2008 12:45:03 (except that the date would be URL-encoded).

  • @Jason and Darren

    IE6 treats anything with a query string as uncacheable. You should find another way to get the version number into the URL, such as a fake directory:

    <script src="/js/version/MyScript.js"></script>
    

    and just remove that first directory level after js on the server side before fulfilling the request.

    EDIT: Sorry all; it is Squid, not IE6, that won't cache with a query string. More info here.
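
The server-side half of the fake-directory idea might look roughly like the sketch below, in Python for illustration only. It assumes the version segment is a run of digits and dots directly after /js/ (as with a build number such as 4.0.8243); the name `strip_version` and that URL layout are assumptions, not something the answer specifies.

```python
import re

# Assumed layout: /js/<version>/<rest of path>, where <version> is a single
# path segment that starts with a digit and contains only digits and dots.
_VERSIONED = re.compile(r"^/js/[0-9][0-9.]*/(?P<rest>.+)$")


def strip_version(path: str) -> str:
    """Map a versioned request path back to the real file path."""
    match = _VERSIONED.match(path)
    if match is None:
        # Not a versioned path: serve it unchanged.
        return path
    return "/js/" + match.group("rest")


print(strip_version("/js/4.0.8243/MyScript.js"))
```

In practice this mapping would usually live in a web-server rewrite rule rather than application code; the function only shows the transformation itself.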

  • @Darren The caching problem has occurred on both IIS 6 and Apache 2 out of the box. I'm not sure that modifying the HTTP response headers is the proper resolution; the renaming route described in a few of the responses here looks like the better fix.

    @Chris Good tip. I thought the query string approach was a good one, but it sounds like a unique file or directory name is necessary to cover all cases.

  • With every release, we simply prepend a monotonically increasing integer to the root path of all our static assets, which forces the client to reload (we've seen the query string method break in IE6 before). For example:

    • Release 1: http://www.foo.com/1/js/foo.js
    • Release 2: http://www.foo.com/2/js/foo.js

    It requires rejiggering links with each release, but we've built functionality into our deployment tools that changes the links automatically.

    Once you do this, you can use Expires/Cache-Control headers that let the client cache JS resources "forever", since the path changes with each release, which I think is what @JasonCohen was getting at.

    From argv0
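
As a rough sketch of the release-prefix scheme above, in Python for illustration: one helper builds the prefixed path and another produces far-future caching headers. The names `release_url` and `far_future_headers`, and the one-year lifetime standing in for "forever", are assumptions rather than anything the answer specifies.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed lifetime for "forever"; one year is a common choice.
ONE_YEAR = timedelta(days=365)


def release_url(release: int, asset_path: str) -> str:
    """Prefix a static asset path with the release number."""
    return f"/{release}/{asset_path.lstrip('/')}"


def far_future_headers(now: datetime) -> dict:
    """Build headers that let clients cache a response for ONE_YEAR.

    `now` must be a timezone-aware UTC datetime.
    """
    return {
        "Cache-Control": f"public, max-age={int(ONE_YEAR.total_seconds())}",
        "Expires": format_datetime(now + ONE_YEAR, usegmt=True),
    }


print(release_url(2, "js/foo.js"))
print(far_future_headers(datetime.now(timezone.utc)))
```

Because the path changes with every release, a stale cached copy under the old path is simply never requested again.
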
  • It holds onto the copy cached in the browser, even though the web server has a newer version.

    This is probably because the HTTP Expires / Cache-Control headers are set.

    http://developer.yahoo.com/performance/rules.html#expires

    I wrote about this here:

    http://www.codinghorror.com/blog/archives/000932.html

    This isn't bad advice, per se, but it can cause huge problems if you get it wrong. In Microsoft's IIS, for example, the Expires header is always turned off by default, probably for that very reason. By setting an Expires header on HTTP resources, you're telling the client to never check for new versions of that resource -- at least not until the expiration date on the Expires header. When I say never, I mean it -- the browser won't even ask for a new version; it'll just assume its cached version is good to go until the client clears the cache, or the cache reaches the expiration date. Yahoo notes that they change the filename of these resources when they need them refreshed.

    All you're really saving here is the cost of the client pinging the server for a new version and getting a 304 Not Modified response back in the common case that the resource hasn't changed. That's not much overhead, unless you're Yahoo. Sure, if you have a set of images or scripts that almost never change, definitely exploit client caching and turn on the Cache-Control header. Caching is critical to browser performance; every web developer should have a deep understanding of how HTTP caching works. But only use it in a surgical, limited way for those specific folders or files that can benefit. For anything else, the risk outweighs the benefit. It's certainly not something you want turned on as a blanket default for your entire website, unless you like changing filenames every time the content changes.
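
To make the "ping and get a 304" round trip concrete, here is a small sketch, in Python for illustration, of the decision a server makes when a client revalidates with an If-Modified-Since header. The function name `revalidate` is hypothetical, and real servers also consider ETag and other validators, which are left out here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def revalidate(if_modified_since, last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # No cached copy to validate: send the full file.
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # Unparseable header: fall back to a full response.
    if client_time.tzinfo is None:
        return 200  # No usable zone information: do not guess.
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


changed = datetime(2011, 2, 4, 12, 0, 0, tzinfo=timezone.utc)
print(revalidate("Fri, 04 Feb 2011 12:00:00 GMT", changed))
```

A 304 carries no body, which is why the revalidation cost is small; an Expires header removes even that request, at the price described above.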

  • Some very useful techniques in here even if you are not planning to use PowerShell to automate deployment.

    From Gulzar
  • For what it is worth, I saw that the deviantART site, quite a big one, used to serve its JS files under names like 54504.js. I just checked, and they now serve them as v6core.css?-5855446573, v6core_jc.js?4150339741, etc.

    If the problem of query string comes from the server, I suppose you can control that more or less.

    From PhiLho
  • I've written a blog post about how we overcame this problem here:

    Avoiding JavaScript and CSS Stylesheet Caching Problems in ASP.NET

    Basically, during development you can add a random number to a query string after the filename of your CSS file. When you do a release build, the code switches to using your assembly's revision number instead. This means that in your production environment, your clients can cache the stylesheet, but whenever you release a new version of the site they'll be forced to re-load the file.
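
The debug/release switch described above could be sketched as follows. This is Python for illustration only, not the ASP.NET code from the blog post; `cache_buster`, `stylesheet_href`, and the `debug` flag are hypothetical names.

```python
import random


def cache_buster(debug: bool, revision: int) -> str:
    """Pick the query-string token for a static asset URL."""
    if debug:
        # Development: a fresh random token on every render defeats caching.
        return str(random.randint(0, 999_999_999))
    # Release: a stable token that only changes when a new build ships.
    return str(revision)


def stylesheet_href(path: str, debug: bool, revision: int) -> str:
    """Return the stylesheet URL with the cache-busting token appended."""
    return f"{path}?{cache_buster(debug, revision)}"


print(stylesheet_href("site.css", debug=False, revision=8243))
```

In release mode every page render yields the same URL, so clients can cache the file until the revision number changes.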
