This week, my work on our top secret web app has mainly revolved around caching. This is work I really enjoy. I can sense the user experience of future customers improving each time I create a useful cache of information that speeds up a page’s load time.
How to do it
Caching is easy. The way I do it may not even be the easiest, but here goes:
Step 1:
Create a readable, writeable, executable folder (folder permissions set to 777) – preferably called ‘cache’ or similar. You should be able to do this by right-clicking on the folder in your FTP client.
Step 2:
Put this function somewhere that will be called whenever you need to cache something:
/*
removeHours()
Return the given date with the given number of hours removed.
*/
function removeHours($date, $hours_removed){
    $total_seconds = 3600 * $hours_removed; // 3600 seconds in an hour
    $date = strtotime($date);
    $new_date = $date - $total_seconds;
    $new_date = date('Y-m-d H:i:s', $new_date);
    return $new_date;
}
This returns the time/date minus however many hours you choose.
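As a quick sanity check (my own example, not part of the original code), subtracting one hour from a fixed date looks like this:

```php
<?php
// removeHours() as defined above
function removeHours($date, $hours_removed){
    $total_seconds = 3600 * $hours_removed; // 3600 seconds in an hour
    $new_date = strtotime($date) - $total_seconds;
    return date('Y-m-d H:i:s', $new_date);
}

// Subtract one hour from a fixed date
echo removeHours('2009-06-10 12:00:00', 1); // 2009-06-10 11:00:00
```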
Step 3:
Create the actual cache file.
/*
cache example
*/
// cache filename variables
$cache_filename = 'object_data.inc';
$cachefile_full_filename = $_SERVER['DOCUMENT_ROOT'].'/cache/'.$cache_filename;
// check for the cache; if it exists and is less than 1 hour old, grab it
if(file_exists($cachefile_full_filename) && filemtime($cachefile_full_filename) > strtotime(removeHours(date('Y-m-d H:i:s'), 1))){
    $object_data = unserialize(file_get_contents($cachefile_full_filename));
} // end if
// cache is missing or too old
else{
    // Initialise the object
    $object_data = new Object($page_filter, '');
    // Create the cache for future use
    file_put_contents($cachefile_full_filename, serialize($object_data));
} // end else
// Use $object_data variables for whatever you want.
// Use $object_data variables for whatever you want.
This code checks whether a cache file exists and, if it does, whether its timestamp is less than an hour old. If so, the variable $object_data is populated with the cached object data.
If the cache file does not exist, or if it is older than an hour, the code initialises a new object and creates a cache file with that freshly grabbed data.
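The check-then-rebuild pattern above can be wrapped in a small reusable helper. This is just a sketch of the same idea – the cached() function and its callback parameter are my own additions, not part of the original example:

```php
<?php
/*
cached()
Return data from $filename if the cache file is newer than $max_age_seconds,
otherwise rebuild the data by calling $builder and write a fresh cache file.
*/
function cached($filename, $max_age_seconds, $builder){
    if(file_exists($filename) && filemtime($filename) > time() - $max_age_seconds){
        return unserialize(file_get_contents($filename));
    }
    $data = $builder(); // the expensive work happens only here
    file_put_contents($filename, serialize($data));
    return $data;
}

// Usage: rebuild at most once an hour
$object_data = cached(
    $_SERVER['DOCUMENT_ROOT'].'/cache/object_data.inc',
    3600,
    function() use ($page_filter){
        return new Object($page_filter, '');
    }
);
```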
Step 4:
To ensure Joe Nobhead can’t see your cached files, add a .htaccess file to your ‘cache’ folder (or whatever you called it) like this:
<FilesMatch "\.(inc)$">
Order Allow,Deny
Deny from all
</FilesMatch>
This stops anyone from directly viewing files in the cache folder that end in .inc – they get an HTTP error for their troubles if they try.
Why bother caching objects
Our top secret web app (if I keep referring to it as that, will people start to get interested in it, I wonder?) makes a lot of identical database calls on each page load. If the data doesn’t change that often, that’s a waste of everyone’s time: it slows down users, it puts pressure on the MySQL database, and it needs calming measures.
Implementing this caching technique has reduced page load times by well over half in some cases. The beauty of it is that if data within the database is updated, it is easy to delete the relevant file from the cache so nothing stays out of date – just call:
@unlink($_SERVER['DOCUMENT_ROOT'].'/cache/object_data.inc');
after updating your database.
Drawbacks of this technique
The way this works so far, one user will suffer the slow page and the multiple database calls while the cache is built (or rebuilt every hour) – which isn’t really fair – but every user after them, and even they themselves (if they refresh), will gain the benefit of a quicker site once the cache file has been built.
Disclaimer
Yes, I know you’re a much better coder than me and your caching technique is far superior to mine. Good, I’d like to hear about it.
If you think my implementation warrants improving from an efficiency or security point of view, I’d love to hear from you.
Thanks for the great tutorial, I am working on a new web app as well and as a designer have almost no idea what I am doing other than a basic knowledge of PHP. This tutorial was well written and easy to follow so thanks!
Umm, nice work, but what’s wrong with using Memcache?
http://www.php.net/manual/en/intro.memcache.php
Why the heck would you use a .inc extension?! That makes no sense whatsoever.
Also, your method won’t work if you run a site with logins. This caching method has been shown elsewhere, and those versions all work okay for sites without logins.
I would also suggest two things.
Instead of going through the hassle of the writable directory and the .htaccess file, just save it to the /tmp directory (I’m assuming you’re on a *nix server, since you use OS X. Might be a bad assumption). The /tmp directory should by default be writable and is in fact meant to hold temporary files. On top of that it is outside of your document root, so it is not accessible to the web directly.
One other thing. I’ve been burned by strange encoding issues with serializing, caching and deserializing large object structures in PHP. I would strongly suggest you also base64 encode the data before writing it, then decode it after reading it. This will ensure that it is kept in a basic ascii character set. The downfall is that the file will consume a little more file space (25%-30%), but the added data security is worth it, in my opinion.
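Drew’s suggestion slots straight into the earlier example – base64-encode after serializing on the way out, and decode before unserializing on the way back in. A minimal sketch:

```php
<?php
// Writing: serialize, then base64-encode so the file stays plain ASCII
file_put_contents($cachefile_full_filename, base64_encode(serialize($object_data)));

// Reading: base64-decode first, then unserialize
$object_data = unserialize(base64_decode(file_get_contents($cachefile_full_filename)));
```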
One last note about using the /tmp directory. Be sure to either use a directory in there or use a common file prefix, so that you can quickly identify cache files for the particular site. This is helpful in case you ever want to do a full cache purge for one part of the /tmp directory but not the rest.
I manage about 10 custom built sites for the company I work for and generally use cron jobs to launch scripts at certain times to do things like full cache flushes. Keeping the files namespaced helps me identify what to delete for one site, without affecting another.
Also, check out this Memcached (http://www.danga.com/memcached/). It was built by the livejournal folks to handle their caching needs. It is extremely fast a scalable and works great with PHP.
Good luck!
It’s better to keep the cache in a database table or in memcached. It’s faster, and it can be moved easily to an additional server when necessary.
Thanks for everyone’s comments and suggestions. Drew, that is exactly what I was after.
I’ll be sure to check out Memcached, as recommended, but I’m not sure it’ll be applicable to my particular situation.
@Brant: The example I give can quite easily be changed to not use a ‘.inc’ extension, you could use anything you want.
@Drew: I try to namespace my cache files for the exact reasons you mention. It’s easy to loop through a directory and delete all files with that namespace if required. Sadly, this web app won’t be able to make use of cron jobs (in most cases), but otherwise it’s a good approach and I’ll definitely look into using the /tmp directory as the repository.
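A namespaced purge like the one described above can be done with glob(). The ‘myapp_’ prefix here is only an illustrative namespace, not from the original post:

```php
<?php
// Delete every cache file belonging to one site/namespace
$cache_dir = $_SERVER['DOCUMENT_ROOT'].'/cache/';
foreach(glob($cache_dir.'myapp_*.inc') as $cache_file){
    @unlink($cache_file);
}
```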
Good article. I’d be very interested to see some benchmarks that quantify the benefits of this caching method; are these available?
Also, this is nitpicking somewhat, but the “removeHours” function could perhaps be considered redundant. The standard strtotime function will interpret relative time descriptions, so you could just write:
strtotime('-1 hour');
… instead of removeHours([date], 1). To specify the start date, you’d write strtotime('-1 hour', strtotime([date])) or similar.
See: http://uk.php.net/manual/en/function.strtotime.php
The yellow… it’s burning out my eyes!
@Daron, Drew and phil:
Actually Memcache is not so fast after all. I did some benchmarks of various “caching techniques” and was quite surprised that using the hard-disk for caching was considerably (about 30%) faster for reading and about 600 times faster for writing (in a local machine/one machine environment).
Either I did something terribly wrong with my benchmarks, or I’ll probably go with using a filesystem RAM disk for caching (I haven’t tried its performance yet).
@Chris Gibson: In terms of benchmarking all I’ve done is to use
microtime()
at the start of each page (on the website) and then, at the end of each page, convert that figure into the number of seconds it has taken to load the page. An average(ish) figure: for the site homepage, the load time was around 0.24 seconds, and with the caching in place it became around 0.06 seconds. It also reduced the number of MySQL queries from 77 to 7.
It’s worth pointing out that the reason for all these (77) queries was a number of recursive functions used to work out menus and to see if certain pages were children of other pages. I haven’t used the caching as a way to speed up objects but rather as a way to reduce the number of times each page calls the database. I’m not sure how effective my method would be as a way to simply cache an object that makes no use of a database.
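The microtime() timing described above looks roughly like this – a sketch of the idea, not Phil’s exact code:

```php
<?php
// At the very top of the page
$page_start = microtime(true);

// ... build and output the page ...

// At the very bottom of the page
$page_seconds = microtime(true) - $page_start;
echo '<!-- page generated in '.number_format($page_seconds, 4).' seconds -->';
```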
Thanks for saving the internet a few microseconds
I like the article… but the yellow… hurts after a while. But if you like it, well, go ahead.
fs cache sucks by default,
try memory storage
Nice article Phil. Joe Nobhead, haha!
I’m gonna hazard a guess that this is some sort of content management system?
@Jeff: I love your inquisitive mind and your educated guess is very warm but it isn’t quite correct. I’ll be posting more details about the top secret web app in the future.
I know I’m late to the party, but if your layout and parent–child links are what’s bottlenecking your database, you have a very different problem.
You’re using the wrong data structure and/or MySQL queries. Better organisation of the data in your tables, plus temporary tables and some other SQL magic, could give you the same effect without requiring you to maintain all-new code or take up significantly more space on your HDD with static content duplicating your SQL structure.
If you do it this way, the first user won’t get dinged with a long wait, your server should run cleaner, and you can still cache anything that is called so often it can be considered static, for the sake of the stability of the machine. Caching is great, but if you don’t fix this problem it will still come back to haunt you in the not-too-distant future, especially if your “top secret project” does well.
Now I’m not an SQL expert to define exactly how to optimally do this, and will never claim to be, but it is still my 2 cents and I figured it was worth saying.
On the flip side of the same coin, I appreciate the work and the simple manner in which you provide PHP with a direct disk caching method. I’m probably going to use it to cache objects, mostly because instantiating them also requires a very long XML pull from another site’s API for now (the average, I’m estimating, is on the order of 5 seconds or so). By keeping the objects “alive” I can also record their state, which makes it easier to write smarter code deciding when and how to pull new data from that API. For now the object class is needed (depending on the page) 1–30 times, and storing the state in MySQL just feels wrong since I don’t need to query the extra data about the objects’ state. 5 seconds for 1 pull is bad enough, but I really need a solution if this is going to happen 30 times a page load (over 2 minutes of wait time). The objects themselves contain little but their state; the rest is held in SQL tables for queries, and this separation makes it easier not to duplicate the majority of the data.
I’m looking at memcached as well as redis and other memory-based solutions for my data structures, but I really feel my memory might bottleneck eventually, so I’m considering a few options and trying to make it as easy as possible on myself to change direction if need be. So, the long way around: thank you for writing this. :)