This week at TechEd, Microsoft announced the Velocity project, a distributed in-memory object caching system, which got folks like Dare and ScottW talking about using a distributed caching solution to boost the performance of web sites. That got me thinking more about the differences between Cache and Session State. Although they seem to be the same, and caching solutions are often used for storing session data, I'm not a big fan of putting session in a cache solution (and I really hate putting session in a relational database, since there is nothing relational about the data). But before I describe my preferred solution, let's define the terms:
Cache (via Wikipedia) - a cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache. In other words, a cache is a temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, future use can be made by accessing the cached copy rather than re-fetching or recomputing the original data, so that the average access time is shorter. Cache, therefore, helps expedite data access that the CPU would otherwise need to fetch from main memory.
Session (via Wikipedia) - a session is a semi-permanent interactive information exchange, also known as a dialogue, a conversation or a meeting, between two or more communicating devices, or between a computer and user (see Login session). A session is set up or established at a certain point in time, and torn down at a later point in time. An established communication session may involve more than one message in each direction. A session is typically, but not always, stateful, meaning that at least one of the communicating parts need to save information about the session history in order to be able to communicate, as opposed to stateless communication, where the communication consists of independent requests with responses.
HTTP session token (via Wikipedia) - A session token is a unique identifier (usually in the form of a hash generated by a hash function) that is generated and sent from a server to a client to identify the current interaction session. The client usually stores and sends the token as an HTTP cookie and/or sends it as a parameter in GET or POST queries. The reason to use session tokens is that the client only has to handle the identifier (a small piece of data which is otherwise meaningless and thus presents no security risk) - all session data is stored on the server (usually in a database, to which the client does not have direct access) linked to that identifier. Examples of the names that some programming languages use when naming their cookie include JSESSIONID (JSP), PHPSESSID (PHP), and ASPSESSIONID (Microsoft ASP).
As the Wikipedia article mentioned, session data is usually stored in a database, which IMHO is the wrong thing to do. So, you may think that I'd prefer to use a Distributed Cache, and Velocity does just that and lists it as one of its key features:
Provides tight integration with ASP.NET to be able to cache ASP.NET session data in the cache without having to write it to source databases. It can also be used as a cache for application data to be able to cache application data across the entire Web farm.
But, IMHO, using a caching engine for session, although better than a database, is still the wrong implementation for the problem. I've mentioned before (but never in my blog) that a messaging solution seems like a much better implementation for session data. You see, what you are really doing when you write some data out to session in a stateless system is sending a message to a future version of yourself. Images of the Star Trek: The Next Generation episode "Cause and Effect" come to mind. In that episode, the Enterprise is stuck in a time loop, where it keeps getting destroyed, until Data sends a message to a future version of himself and breaks the loop. I learned the trick of using Message Queues for Session Data back in my mainframe days, and I've found that if something scaled for the mainframe, using the same techniques on other platforms is usually the best way. Back on the mainframe, CICS is the transaction service used in online systems, and it works in a stateless manner, very similar to the web. To send data between each instance of a screen, one of the primary techniques is to use a Temp Storage Queue, with a queue created for each session, based on the session id.
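To make the "message to a future version of yourself" idea concrete, here is a minimal sketch in Python. It is only an in-memory stand-in: it is not CICS, MSMQ, or ASP.NET code, and the names SessionQueues and handle_request are hypothetical ones I made up for illustration. Each request picks up the message left by the previous request for the same session id, changes the state, and leaves a new message behind for the next one.

```python
from collections import defaultdict, deque


class SessionQueues:
    """In-memory stand-in for one message queue per session id (hypothetical)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, session_id, state):
        # Leave a message for whichever request handles this session next.
        self._queues[session_id].append(state)

    def receive(self, session_id):
        # Take the oldest pending message, or None if the session is new.
        queue = self._queues[session_id]
        return queue.popleft() if queue else None


def handle_request(queues, session_id, item):
    """Stateless handler: all state arrives and leaves as a message."""
    state = queues.receive(session_id) or {"cart": []}
    state["cart"].append(item)
    queues.send(session_id, state)
    return state["cart"]


if __name__ == "__main__":
    queues = SessionQueues()
    handle_request(queues, "abc", "book")
    print(handle_request(queues, "abc", "pen"))
```

The handler itself holds nothing between calls. Everything it knows about the session comes out of the queue and goes back into it, which is the property that makes the approach fit a stateless web tier.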
I've always wanted to try the same thing with ASP.Net, using MSMQ as the Message Queue, but until MSMQ 4.0 (released with Vista and Win2k8 Server), it really wasn't feasible. Creating a new queue for each ASP.Net session wasn't simple or efficient, so I never tried it. MSMQ 4.0 adds subqueues, which are implicitly created local queues that are logical partitions of a physical queue. This way, I can create one or more message queues for an ASP.Net application, and easily have them "indexed" by a sessionid. The downside of using MSMQ is that very few companies have a network admin staff that knows how to support it.
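Here is a rough Python model of what "one physical queue, partitioned by sessionid" could look like. This is a toy, not MSMQ: the PartitionedQueue class, the machine name webserver01, and the queue name aspnet_session are all hypothetical. The naming scheme (physical queue name, a semicolon, then the subqueue name) is my understanding of how MSMQ 4.0 addresses subqueues, so treat it as an assumption to check against the MSMQ documentation.

```python
class PartitionedQueue:
    """Toy model of one physical queue with implicitly created subqueues."""

    def __init__(self, format_name):
        self.format_name = format_name
        self._partitions = {}  # session id -> list of pending messages

    def subqueue_format_name(self, session_id):
        # Assumed convention: physical queue name, a semicolon, subqueue name.
        return f"{self.format_name};{session_id}"

    def put(self, session_id, message):
        # The partition comes into existence on first use; nothing is pre-created.
        self._partitions.setdefault(session_id, []).append(message)

    def get(self, session_id):
        # Read the oldest message for this session only; other sessions are untouched.
        messages = self._partitions.get(session_id)
        if not messages:
            return None
        message = messages.pop(0)
        if not messages:
            del self._partitions[session_id]  # drop the now-empty partition
        return message

    def subqueue_count(self):
        return len(self._partitions)


if __name__ == "__main__":
    queue = PartitionedQueue(r"DIRECT=OS:webserver01\private$\aspnet_session")
    queue.put("abc123", "cart=book")
    print(queue.subqueue_format_name("abc123"))
    print(queue.get("abc123"))
```

The point of the model is that only one queue has to be created and administered per application, while each session still gets its own isolated stream of messages.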
I always wondered why the ASP.Net team never released an MSMQ session provider, so I'm going to have a go at it and see what sort of perf gains I can get over using SQL Server Mode, or maybe even Out-of-process Mode.
The first issue I've run across is that System.Messaging wasn't updated in .Net 3.5 to take advantage of MSMQ 4.0. Reading from a subqueue is the same as reading from a regular queue, but you can't write to a subqueue using the System.Messaging namespace. So, I'll have to implement that myself, and I'll publish the code.