Fun With memcache

Here at Logic Supply we recently started implementing a caching solution for the Web site. I’m relatively new to Web development in general and had no previous experience with memcache, but since it’s a pretty popular solution and sounded like a good fit for our needs, we figured we’d give it a shot. Read on to hear more about my adventures in memcache land…

Our Web site runs on PHP. PHP has a simple, easy-to-use memcache library, but you do need to re-compile PHP for it to be supported. Luckily this is rather simple: just use the ‘--enable-memcache[=DIR]’ flag when you re-compile and you’re all set. Obviously this requires the ability to re-compile your version of PHP, so if you’re using hosting from a company that doesn’t give you access, you’re probably out of luck (but you may not need it in this case anyway…). Also, keep in mind that we’re running PHP 4, so there may be things I’m not aware of in PHP 5 with relation to memcache.

For those of you not really familiar with memcache, here’s a quick overview. You can figure out memcache’s big selling point just from reading the name: memcache caches in memory, making it faster than other caching solutions (like caching to the file system). It also supports connections over TCP, so you don’t have to cache on the local machine; you could have a memcache server running on your network and have your webserver cache your content there. Running it locally on the webserver itself will perform better, since you’re not going over the network to store and retrieve information, but you lose the benefit of having a central cache. That should only matter if you have a distributed group of servers that need access to the same data, and in that case going over the network isn’t going to slow you down that much. For our uses, running memcache on the webserver was the way to go, but for larger companies with a more distributed system it would probably be better to use a dedicated machine.

Using the PHP library for memcache is really simple; adding data, deleting data, and retrieving data couldn’t be easier. Every piece of data that you cache is keyed so that you can easily retrieve data based on that key. If you need to delete data individually you can do so simply by passing the key in, or you can invalidate the entire cache all at once using the flush() command. This leads me to a couple of the issues I had with the basic library provided by PHP:

  1. No easy way to group/namespace cached data
  2. No way to “get all keys,” or to see what information is currently stored (besides implementing this programmatically yourself—more on this later)

These are certainly not deal-breakers, but they would make living with memcache and PHP a little bit easier. For example, while memcache does not support groups/namespaces by default, you can simulate them without too much hassle. I ended up writing a wrapper layer around the bare-bones PHP library to add this functionality. I also stumbled upon something called memcachefs (albeit well after I had written my own fs caching scheme for debugging…), which lessens the annoyance of the second problem by allowing you to mount your memcache data locally and view, edit, add, or delete data as if they were right there on the file system. Since there doesn’t seem to be an easy way of querying memcache to see what is currently cached (or at least I wasn’t able to find a way), it can be a little tricky to develop with.

In my wrapper layer I implemented a few methods that also allowed me to invalidate grouped/namespaced data all at once without affecting other caches. This basically just gives me a little more flexibility and granularity with managing the cached data. I went through a lot of trial and error and flushing the cache to make sure that things were implemented correctly, and even went as far as implementing my own file system caching to make sure that things were working the way I thought they were. If you’re using Perl you could check out Memcache-Managed, which could save you the trouble of having to implement some of the group/namespacing stuff I had to (but how much fun would that be?).
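To make the namespacing idea concrete, here is a minimal sketch of one common way to simulate groups on top of a plain key/value cache: every key carries a per-namespace version number, and invalidating a namespace just bumps that number. This is not the PHP wrapper described above, whose code isn't shown in this post. It is written in Python against an in-memory stand-in, and `InMemoryCache`, `NamespacedCache`, and the `nsver:` key prefix are all hypothetical names chosen for illustration.

```python
class InMemoryCache:
    """Stand-in for a memcache client (hypothetical, not a real library API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value

    def flush(self):
        self._data.clear()  # invalidates everything at once


class NamespacedCache:
    """Simulates namespaces by embedding a per-namespace version in each key."""

    def __init__(self, backend):
        self.backend = backend

    def _version(self, namespace):
        version_key = "nsver:" + namespace
        version = self.backend.get(version_key)
        if version is None:
            # First use of this namespace (assumption: start counting at 1).
            version = 1
            self.backend.set(version_key, version)
        return version

    def _full_key(self, namespace, key):
        return "{}:v{}:{}".format(namespace, self._version(namespace), key)

    def get(self, namespace, key):
        return self.backend.get(self._full_key(namespace, key))

    def set(self, namespace, key, value):
        self.backend.set(self._full_key(namespace, key), value)

    def invalidate(self, namespace):
        # Bumping the version makes every old key in this namespace
        # unreachable without touching any other namespace.
        self.backend.set("nsver:" + namespace, self._version(namespace) + 1)


cache = NamespacedCache(InMemoryCache())
cache.set("products", "sku-1", "Mini-ITX board")
cache.set("pages", "home", "<html>...</html>")
cache.invalidate("products")  # "pages" entries are left alone
```

The old entries are never deleted explicitly; against a real memcache server they would simply sit unused until evicted. One caveat of this approach is that if the version key itself is evicted, the counter restarts, so a real implementation would probably want a less predictable starting value.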

Why does all this matter? First off, many of the pages on our Web site (www.logicsupply.com, in case you were wondering…) are pretty static; that is, the content on the pages doesn’t change all that often. Some of the pages go days or even weeks without changing, so caching the data just makes sense. Why hit the database when we don’t have to? The added speed is also welcome, as some of the processing we do can take a relatively large amount of time (we’re only talking about seconds here, or even milliseconds, but it’s never fun to wait for a page to load). Reducing CPU load on the server is also never a bad idea. We could use the extra cycles to do good in the world, like run Folding@home or something.
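The "why hit the database when we don’t have to?" idea boils down to a check-the-cache-first pattern. The sketch below shows it in Python with an in-memory stand-in for the cache. `InMemoryCache`, `cached_page`, `build_from_database`, and the `page:` key prefix are hypothetical names for illustration, not code from our site.

```python
class InMemoryCache:
    """Stand-in for a memcache client (hypothetical, not a real library API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def cached_page(cache, page_id, build_page):
    """Return the cached page if present; otherwise build it and cache it."""
    key = "page:{}".format(page_id)
    html = cache.get(key)
    if html is None:
        html = build_page(page_id)  # the slow path: database and processing
        cache.set(key, html)
    return html


db_calls = []  # records every time the slow path actually runs


def build_from_database(page_id):
    db_calls.append(page_id)
    return "<h1>Product {}</h1>".format(page_id)


cache = InMemoryCache()
first = cached_page(cache, 42, build_from_database)   # miss: builds the page
second = cached_page(cache, 42, build_from_database)  # hit: served from cache
```

When a page does change, deleting its key (here `cache.delete("page:42")`) forces the next request to rebuild it.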

All in all, memcache is a very sweet solution for our needs, and the few minor annoyances with the PHP implementation were just that: minor. The flexibility of memcache coupled with the performance gain should far outweigh any minor inconveniences in implementing the solution.

If you have any specific questions or comments about memcache, please leave them in the comments below and I’ll do my best to share my experience.

Comments (2)

  1. November 5, 2007

    Hey! 🙂 I shop here occasionally, found the blog, and can’t help but comment on a few things.

    memcachefs) Be super careful about using this one, in fact I’d highly recommend against it. It’s a nice proof of concept but real world performance would be terrible.

    A few reasons after a cursory glance at it:
    1) It doesn’t appear to handle nonblock very well. Get two queries going at once and you might be screwed.
    2) It relies on libmemcache (not the recent libmemcached), which does totally awesome things such as call ‘exit()’ and ‘assert()’ during some error conditions.
    3) The “directory listing” it does uses an internal slab debugging command. It’s very slow and truncates the data it displays, so if you were to rely on it and your data became larger than a couple dozen variables, you’d be screwed.

    I do kind of wish there were more extended memcached libraries now 🙂 Namespacing, lists, troubleshooting, etc, are all common enough operations that many good examples should just already exist.

    Consider it a good thing we’ve resisted adding a keys dump for so long! 🙂 Like sometimes when learning a new language, you have to adjust your expectations a little. The common debugging idea might be to dump all keys and inspect what’s there, but it’s just as easy (albeit different) to add a debugging mode to your abstraction to count hits/misses. Much more flexible, and performant!

    Memcached is fast because it lacks many database concepts (such as multiversioning), so the idea of doing a “Table scan” on potentially millions of keys is something we would prefer folks not build into applications by accident. Imagine how grumpy people would be if they built their fancy new toy with that “super scalable” memcached thing and it _still_ broke after they got a half million keys in it 🙂 Tsk tsk!

    That being said, I believe ‘tag’ support is actually on the way this time, which should help with namespacing complexity.

  2. david
    November 5, 2007

    Hey Dormando,

    Thanks for mentioning the caveats with memcachefs. I’ve only used it a few times during development and it does seem very unreliable. When it works, it seems to work OK, but it often just refuses to mount altogether. Seems like maybe it could use a little more time in the oven…

    As for my complaints about memcache in general, you make some good points; here at Logic Supply we’ll end up having a relatively low number of cached items, so doing a “table scan” would probably not adversely affect our performance, but I can certainly understand how it would on sites with much larger caches. Like you mention, if necessary it’s not that difficult to add some sort of key tracking or reporting in a wrapper layer. I’m just lazy by nature, so in general I just want everything to work the way I expect it to 🙂
