"All programming is an exercise in caching." - Terje Mathisen
This document covers the details of web-application-level caching, that is, caching data within a mod_perl process or between mod_perl processes. It does not cover proxy/caching servers, browser caching, etc.
Just about everything to do with getting performance out of computers involves caching. Even below your application layer, we have the basic hardware layers of CPU registers, L1/L2 caches, main memory, and disk, each one larger but slower than the last.
When we consider the web, there are many, many more layers: browser caches, network proxy caches, and server-side caches of various kinds.
What we're dealing with here is data caching within a mod_perl process or between several mod_perl processes. Even then, we can consider several types of caching, which we'll work through below.
In each case, the general aim is to reduce the time required to perform an operation by using more memory or disk space to store a previously calculated result.
There are two main ways to develop a web server (though hybrids are possible): multi-process and multi-threaded.
In a multi-process system, a number of individual processes are forked, any one of which may handle a new incoming request. In a multi-threaded system, there is one process, but it contains a set of threads, each of which can handle a new incoming request. Basically, you can summarise the differences as follows:
Since each process is completely isolated, if any process crashes, no other processes are affected, and a new process can be spawned to take the place of the crashed one. Some people might regard this 'robustness' as illusory, because you're basically hiding programming errors and problems that should really be fixed. On the other hand, others might invoke the 90/10 rule: if a process crashes once a day, it may not be worth the possible thousands of hours of effort required to track down the one bug. Of course, all this also depends on the stability of the OS you use, but most server OSs are extremely stable these days.
One other advantage of multi-process is that each process can be limited. You can control system resources on a process-by-process basis: if one process ends up using too much memory, too much CPU, etc., it can be killed automatically, and again, no other processes are affected.
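In the mod_perl world, one example of this is the Apache::SizeLimit module, which kills off child processes that grow too large. A minimal sketch (the limit value here is illustrative only):

    # In startup.pl, or any module loaded at server startup
    use Apache::SizeLimit;

    # Kill a child off after its current request if its total
    # size exceeds ~12MB (the value is in KB)
    $Apache::SizeLimit::MAX_PROCESS_SIZE = 12000;

    # and in httpd.conf:
    #   PerlFixupHandler Apache::SizeLimit

Because only the offending child dies, and only between requests, the rest of the server carries on unaffected.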
Since multiple threads run within one process space, if one thread crashes or overwrites memory, the entire process and web server can crash. You also have to be careful about race conditions where threads write data over each other. Some of these issues are particularly hard in C/C++, but much less so in a language like Java, which has much better support for threads and synchronisation, and also prevents threads from 'crashing' outright (it throws catchable exceptions instead). Of course, this depends on the stability of your Java environment, something some people might question.
Generally it's harder, or impossible, to impose per-thread limits similar to those described above for processes.
Since each process is isolated, no data, open files, sockets, etc. are shared implicitly (except read-only pre-forked data; see the copy-on-write discussion below). To share any data, you have to explicitly code the sharing in some way. Possible solutions include: sockets, streams, files, shared memory, and memory-mapped files. In each case, you also have to provide some locking mechanism to ensure that two processes don't try to write to the same shared area at once, as sketched below.
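For example, a common inter-process locking mechanism on Unix is flock on an agreed-upon lock file (the file name here is illustrative):

    use Fcntl qw(:flock);

    # Serialise access to some shared resource across processes
    open(LOCK, '>', '/tmp/myapp.lock') or die "Can't open lock file: $!";
    flock(LOCK, LOCK_EX);    # blocks until no other process holds the lock

    # ... read or update the shared data here ...

    flock(LOCK, LOCK_UN);    # release (closing the handle also releases it)
    close(LOCK);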
Since each thread runs within the same process, all data, open files, sockets, etc. are shared implicitly. You do still have to provide some locking mechanism to ensure that two threads don't try to write to the same shared area/item at once. Generally, intra-process locking methods are faster than inter-process ones.
Depending on your OS, the difference between swapping process contexts and swapping thread contexts might be small, or it might be large. In general (though I don't have hard evidence for this), Windows is much quicker at thread swaps than process swaps, while on Linux threads and processes are basically the same thing.
However, even below this there is another issue. Basically, all processors have a TLB (translation lookaside buffer) which maps virtual process addresses to physical addresses. The virtual-to-physical mapping is different for each process, so when there is a process switch, the TLB must be flushed. For a multi-threaded program this is not required on a thread swap, because all threads share the same virtual-to-physical mapping. This can be an issue on a heavily loaded server which has to swap processes/threads a lot.
As indicated above, because of the difference in the amount of data shared, this can have a big performance implication for how your application is coded. Generally, if lots of sharing is required, multi-threading will be a considerable win over multi-process.
Given the above discussion, Apache 1.x uses a traditional multi-process model to handle web requests (except on Windows). Apache 2.0 will have a configurable hybrid model that allows multi-thread, multi-process, or some in-between combination.
For the moment, this discussion concerns multi-process caching.
As discussed above, caching involves saving the result of a complex calculation/slow query/etc based on the assumption that another web request will soon require the results of that query. Here's a really basic example of what we might do:
    use vars qw(%Cache);

    sub GetSlowQuery {
        return $Cache{SlowQuery} ||= $dbh->selectall_arrayref('select slow_query_view');
    }
This basically runs the query once and saves the result in the %Cache hash. If we call GetSlowQuery again, it will retrieve the result from the hash rather than running the query again. (Note that ||= only works as a cache here because selectall_arrayref returns a reference, which is always true; a cached value that could legitimately be false or undef would cause the query to be re-run every time.)
A couple of important points to note:

- The cache is private to each mod_perl process; every forked child ends up running the query once and keeping its own copy of the result.
- The cached value is never invalidated, so if the underlying data changes, the process keeps serving the stale result for its lifetime.
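If staleness matters, one common hedge is to store a timestamp alongside the value and re-run the query after some interval. A small sketch of that idea (the 60-second lifetime is arbitrary):

    use vars qw(%Cache);

    sub GetSlowQuery {
        my $entry = $Cache{SlowQuery};
        # Re-run the query if we have no cached entry, or it's too old
        if (!$entry || time() - $entry->{time} > 60) {
            $entry = $Cache{SlowQuery} = {
                time => time(),
                data => $dbh->selectall_arrayref('select slow_query_view'),
            };
        }
        return $entry->{data};
    }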
Let's say now that we have some module that allows you to store data in shared memory. We would rewrite the above code as:
    use Cache::SharedMemory;
    use vars qw(%Cache);

    tie %Cache, 'Cache::SharedMemory';

    sub GetSlowQuery {
        return $Cache{SlowQuery} ||= $dbh->selectall_arrayref('select slow_query_view');
    }
In this case, Cache::SharedMemory uses Perl's tie mechanism to make %Cache look like an ordinary hash, while internally it stores the values in shared memory (locked as appropriate).
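To make the magic less magical, here is a minimal sketch of what such a tied class might look like. Cache::SharedMemory above is hypothetical, and so is this Cache::FileShared class; rather than real shared memory it shares data through a Storable file with advisory locking, but the principle of intercepting hash accesses via tie is the same:

    package Cache::FileShared;

    use strict;
    use Storable qw(lock_store lock_retrieve);

    sub TIEHASH {
        my ($class, $file) = @_;
        lock_store({}, $file) unless -e $file;    # start with an empty cache
        return bless { file => $file }, $class;
    }

    sub FETCH {
        my ($self, $key) = @_;
        return lock_retrieve($self->{file})->{$key};
    }

    sub STORE {
        my ($self, $key, $value) = @_;
        # Note: a production version would hold one lock across this
        # read-modify-write; lock_retrieve/lock_store each lock separately
        my $data = lock_retrieve($self->{file});
        $data->{$key} = $value;
        lock_store($data, $self->{file});
        return $value;
    }

    sub EXISTS { exists lock_retrieve($_[0]->{file})->{$_[1]} }

    sub DELETE {
        my ($self, $key) = @_;
        my $data = lock_retrieve($self->{file});
        my $value = delete $data->{$key};
        lock_store($data, $self->{file});
        return $value;
    }

    1;

Every process that ties %Cache to the same file then sees the same data:

    tie %Cache, 'Cache::FileShared', '/tmp/query-cache.db';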
A couple of important points to note about this:

- The data is now stored once and shared by all Apache processes, so the query only needs to be run once for the whole server, and there's only one copy of the result in memory.
- Every access to %Cache now goes through the tie layer, so each read and write pays a serialisation and locking cost; for small, frequently read items this overhead can outweigh the saving.
- Only data that can be serialised can be stored this way; things like open sockets or database handles can't be shared.
Thus these are the two main ways you can cache data: locally within each process, or in a shared store visible to all processes.
Given that the multi-process model doesn't allow any implicit data sharing, why share any data at all? This is the equivalent of keeping some sort of global variable, where each process keeps its own cached copy of the data. It works reasonably well when the data is basically static, or not modified by other Apache processes. Note, though, that this means you end up with a copy of the same data within each process: if you have 50 forked Apache servers, and each cache ends up with 1M of data, you'll really be using 50M of memory.
There is a way around this problem, which exploits the fact that most modern operating systems use a 'copy-on-write' technique to fork processes. This means that if you load data into the Apache process at startup, before it starts forking children, and that data is only ever read and never written to, then the data will automatically be shared between processes. Remember, though, that for this to work the load has to happen before the child processes are forked, so you need to know in advance the common read-only data that you want to cache.
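In mod_perl terms, this means doing the loading in a startup script pulled in with PerlRequire (or in a module loaded from httpd.conf). A minimal sketch, where the file name, package name, query, and connection details are all illustrative:

    # startup.pl -- pulled in with "PerlRequire /path/to/startup.pl"
    # in httpd.conf. This code runs in the parent Apache process, so
    # anything loaded here is shared copy-on-write by all forked
    # children, as long as they only ever read it.
    package My::StaticData;

    use strict;
    use DBI;
    use vars qw($Countries);

    my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'password');
    $Countries = $dbh->selectall_arrayref('select code, name from country');
    $dbh->disconnect;    # don't carry an open handle across the fork

    1;

Each child can then read $My::StaticData::Countries without ever running the query or duplicating the data, since the pages holding it are never written to after the fork.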