Cache correctly: stop invalidating

December 4, 2011

rails caching

There are only two hard problems in Computer Science: cache invalidation and naming things. - Phil Karlton

One of my favorite things about Rails is that when I read the docs, sometimes I find little gems that just jump out and make me cheer with happiness.

ActiveRecord's cache_key method is one of these things.

Let me explain

On my blog, I write the body in markdown. I have a method on the model (not a helper, like it probably should be) called body_html that takes the post's body and converts the markdown into HTML. In a given view, I call this a couple of times: once for the body, and again for the description meta tag.

It would be nice to only have to perform this computationally intensive operation once, then cache it for later use.

In Django, which is where I was before Rails, I've had to implement caching on some serious systems. Django has much less syntactic sugar around caching. If I had to implement this in Django, I would've done the following:

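from django.core.cache import cache
from markdown import markdown  # assuming the Python-Markdown package

def body_html(self):
    # The key depends only on the id, so it stays the same when the body changes.
    key = 'article:%s:body_html' % self.id
    body_html = cache.get(key)
    if body_html is None:
        # Cache miss: render the markdown and store the result.
        body_html = markdown(self.body)
        cache.set(key, body_html)
    return body_html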

This tries to read the cache entry for the key 'article:[id]:body_html'. If it can't find one, it converts the markdown body into HTML, then saves the result in memcache for later use.

This works great, until you change the text.

Now Django will keep pulling the stale HTML out of memcache instead of regenerating it.

So what do you do?

You could set an expiration time of, say, 10 minutes, but that's somewhat inefficient: the entry gets regenerated even when nothing has changed, and you still can't see your changes right away.

You could use a signal on the model to clear the cached HTML whenever the model is saved. (But as anyone who's used Django intensively can tell you, signals are flaky at best.) You'd also need to be super careful that you never bypass the signals, for example with a bulk queryset update().

Like the quote at the top says, cache invalidation is very hard. Not hard to do: you just call cache.delete. It's hard because it's not obvious WHERE to invalidate the cache.

So what genius does Rails provide here? The cache_key. Unlike the lame key I would've used with Django, 'article:[id]', the cache_key includes the model's last-updated timestamp, so it looks something like 'articles/5-20071224150000' instead.

This means that once the model is updated, the old cache key will never get hit again: every lookup from then on uses the new key!
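To make the key change concrete, here is a minimal plain-Ruby sketch. The Article struct is a hypothetical stand-in for a model, and its cache_key only imitates the shape of the Rails key; it is not Rails' actual implementation.

```ruby
# Hypothetical stand-in for a model; only imitates the shape of Rails' cache_key.
Article = Struct.new(:id, :body, :updated_at) do
  def cache_key
    "articles/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

article = Article.new(5, "# Hello", Time.utc(2007, 12, 24, 15, 0, 0))
old_key = article.cache_key

# Saving a real record would bump updated_at; here it is set by hand.
article.body = "# Hello again"
article.updated_at = Time.utc(2007, 12, 25, 9, 30, 0)
new_key = article.cache_key

puts old_key
puts new_key
```

The two keys differ only in the timestamp, which is enough to make the old entry unreachable.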

You would use it like so:

def body_html
  Rails.cache.fetch "#{cache_key}/body_html" do
    markdown(body)
  end
end

Note: you need an updated_at column on the db table for this to work, but the Rails generators add timestamps to new table migrations by default.
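Here is a rough sketch of the whole flow without Rails, to show when the block actually runs. TinyCache is a toy stand-in for Rails.cache that models only fetch, and Post, render, and the miss counter are names made up for this illustration.

```ruby
# Toy stand-in for Rails.cache: only fetch is modeled, backed by a Hash.
class TinyCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  # Return the stored value for key, or run the block, store its result, and return it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @misses += 1
    @store[key] = yield
  end
end

CACHE = TinyCache.new

# Hypothetical model; render stands in for the real markdown conversion.
Post = Struct.new(:id, :body, :updated_at) do
  def cache_key
    "posts/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end

  def body_html
    CACHE.fetch("#{cache_key}/body_html") { render(body) }
  end

  def render(text)
    "<p>#{text}</p>"
  end
end

post = Post.new(1, "first draft", Time.utc(2011, 12, 4, 10, 0, 0))
post.body_html # miss: renders and stores
post.body_html # hit: served from the cache

# Simulate a save: the body changes and updated_at moves forward.
post.body = "second draft"
post.updated_at = Time.utc(2011, 12, 4, 11, 0, 0)
html = post.body_html # new key, so the block runs again
```

Nothing is ever deleted here. The entry for the first draft is still in the store; it is just never asked for again.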

My first reaction was, 'well, how would I invalidate it if I can never reference it again?' but you don't need to.

Memcache is an LRU (least recently used) cache, and you probably don't want to use this pattern on a cache backend that isn't. So if the cache fills up, it'll just dump the things that haven't been accessed in a long time, which is exactly what the orphaned old keys are.
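As a toy model of why the orphaned entries take care of themselves, here is a small LRU cache in plain Ruby. It relies on Ruby's Hash keeping insertion order. LruCache and its capacity of two entries are assumptions for the illustration; real memcached eviction is more involved than this.

```ruby
# Toy LRU cache: the first key in the Hash is always the least recently used.
class LruCache
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  # Reading a key moves it to the "most recently used" end.
  def read(key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value
  end

  # Writing may push the cache over capacity, so evict from the old end.
  def write(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift while @store.size > @capacity
    value
  end

  def keys
    @store.keys
  end
end

cache = LruCache.new(2)
stale = "articles/5-20071224150000/body_html"
fresh = "articles/5-20071225093000/body_html"
other = "articles/6-20071225100000/body_html"

cache.write(stale, "<p>old</p>")
cache.write(fresh, "<p>new</p>") # the article was updated, so a new key is written
cache.read(fresh)                # only the fresh key is ever read again
cache.write(other, "<p>other</p>") # cache is full: the stale entry is evicted

puts cache.keys
```

The stale entry is the one that goes, because nothing has touched it since the update.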

Obviously this isn't limited to Rails, but I never thought to use the timestamp before. Maybe this is something everyone knows (it seems obvious). But I really wish I had learned this trick earlier.

I would argue this is the best form of cache invalidation: don't invalidate if you can help it, just alter the cache key.

Won't work for all cases, but it's a nice pattern for most.

- xxx

