Introducing the cookie

The Stateless Web



For all its marvels, the world wide web is not as "smart" as we may presume it to be. Every interaction you have with a website, from telling a movie site what zip code you are in to placing a book in a shopping basket on an e-commerce site, is forgotten by the site the moment you navigate to the next page. Technical folks refer to this condition as being stateless. Statelessness simply means that as you ask for subsequent pages on a website, the site has no intrinsic way of knowing the path you took to get to the newly requested page or how you interacted with the previous pages. As you may imagine, this could create some pretty severe limitations, limitations that the cookie was created to correct!

Lou Montulli, The Father Of The HTTP Cookie



In 1994 Lou Montulli, an engineer working for Netscape, noticed what an irritant this lack of state was becoming for online shopping applications. How could one be expected to shop online if everything had to happen on the same page? If only there were some form of reminder that would allow the checkout page to know that on some previous item description page you had placed a pair of swim shorts in your shopping cart.

The Cookie Is Just A Reminder



The problem of statelessness was answered by the introduction of the cookie. The HTTP cookie was introduced into the Netscape browser in 1994. At its core a cookie is simply a reminder - sort of a post-it note for websites. It allows a site, as part of its response to a browser, to ask that browser to write down small bits of information in the form of key/value pairs, e.g. zip=90210 or favoriteteam=ManU, and to remind the site of them the next time the browser visits. In practice such a reminder comes in the form of a piece of text sent by the browser as part of each request it makes to the website. Technically this piece of text is called an HTTP header, and it is not dissimilar to the many other headers that tell the site things about your browser, such as what your default language is, what type of browser you run, and what type of content you can display. To clear up the mystery, this reminder header, the Cookie header, which is sent by a browser to a server and which is at the heart of the whole privacy debate, looks just like this:

Cookie: zip=90210

That is it; this is really all there is to a cookie. As you may have guessed, this example may be the very reminder a movie site would ask a browser to set and return after a user has told the site that they live in the 90210 zip code. The reminder of course doesn't have to be zip=90210. It can be any key/value pair of the site's choosing. Some illustrative examples may be: age=17, sex=male, zip=90210, emaddress=johnsmith@gmail.com, favcolor=blue, id=abc123. The possibilities are limited only by what the site knows to ask (limited primarily by what you told it or what it might derive about you from the other headers or the content you requested) and the rules the browser has in place. Browser rules with respect to cookies are typically that a single site may set no more than 50 individual cookies (name/value pairs); once that limit is reached, browsers generally discard the oldest cookies to make room for new ones. Further, the overall size of each cookie will not exceed 4 KB (about 4 thousand characters), but as a practical matter sites tend to issue few (under 10) relatively small (tens of bytes) cookies. Sites tend not to issue large cookies because it is often easier to use the cookie to reference data than to store it directly. We explore this concept in Storing data with a cookie.
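To make the exchange concrete, here is a minimal sketch using Python's standard http.cookies module. It assumes the movie-site scenario above: the site asks the browser to remember zip=90210 (in practice this request travels in a response header named Set-Cookie), and the browser later sends the reminder back in a Cookie header. The variable names are illustrative only, and real servers and browsers do considerably more than this.

```python
from http.cookies import SimpleCookie

# The site's side: build the header that asks the browser to remember a zip code.
outgoing = SimpleCookie()
outgoing["zip"] = "90210"
set_cookie_header = outgoing.output()  # a single "Set-Cookie: ..." line

# The browser's side, on a later request: the reminder comes back as the
# value of a Cookie header. Several key/value pairs are separated by "; ".
cookie_header_value = "zip=90210; favcolor=blue"

# The site's side again: parse the reminder back into key/value pairs.
incoming = SimpleCookie()
incoming.load(cookie_header_value)
pairs = {name: morsel.value for name, morsel in incoming.items()}

print(set_cookie_header)
print(pairs)
```

The site never sees anything beyond these pairs; whatever meaning zip or favcolor carries is entirely up to the site that set them.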

Rules Cookies Follow

So cookies are just reminders, but in the highly organized world of the web even reminders must follow rules. We have discussed some of the rules for cookies - that each site can only have 50 at a time and that each cookie must be under 4 KB in size - but there are a few more important safety precautions web browsers have put in place. The other important considerations for cookies are scoping and expiry.

Scoping

When a website asks for a cookie to be stored and later returned to that site, it assumes (correctly) that the reminder is a secret kept between the browser and the site. This secret would be compromised if the browser sent the reminder with every request to every site it saw! This is, therefore, not how cookies work. Cookies are scoped so that they can only return to the same domain that sent them and typically to the same site (for the purpose of clarification, www.example.com is a site and *.example.com is a domain). What this means is that a cookie set by www.webpublisher.com is usually returned only to the site www.webpublisher.com. Technically it could be set to return to sports.webpublisher.com or any site of the form [anything].webpublisher.com, but never to www.sports-webpublisher.com. Browsers are very, very strict about this and don't make what we might consider obvious connections: a cookie for www.webpublisher.com will not be replayed to www.webpublisher.net. This correctly prevents data stored in a cookie for, say, your bank from being replayed to your newspaper (and vice versa).
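The scoping rule described above can be sketched as a simple host comparison. This is a deliberately simplified illustration, not how any particular browser implements it: the function name cookie_may_return_to is made up for this example, and real browsers apply further checks (path, secure connections, and a list of shared suffixes such as .com) that are left out here.

```python
def cookie_may_return_to(cookie_domain: str, request_host: str) -> bool:
    """Return True if a cookie scoped to cookie_domain may be sent to request_host.

    Simplified rule: the host must equal the cookie's domain, or end with
    "." followed by that domain. A leading dot on the domain is ignored.
    """
    domain = cookie_domain.lower().lstrip(".")
    host = request_host.lower()
    return host == domain or host.endswith("." + domain)


# Scoped to a single site: only that exact site gets the cookie back.
print(cookie_may_return_to("www.webpublisher.com", "www.webpublisher.com"))
print(cookie_may_return_to("www.webpublisher.com", "sports.webpublisher.com"))

# Scoped to the whole domain: any [anything].webpublisher.com qualifies,
# but look-alike names and other top-level domains do not.
print(cookie_may_return_to("webpublisher.com", "sports.webpublisher.com"))
print(cookie_may_return_to("webpublisher.com", "www.sports-webpublisher.com"))
print(cookie_may_return_to("webpublisher.com", "www.webpublisher.net"))
```

Requiring the "." before the domain is what keeps www.sports-webpublisher.com from matching webpublisher.com, even though one name ends with the other.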

Expiry

Additionally, reminders need to be valid for only so long. While our example user may live in 90210 today, they may not do so forever, meaning that the site may want to have the reminder come back for only so long. This is handled by a special cookie attribute called expiry (written as Expires in the header), which is defined when the cookie is set. By setting an expiry a site may say, "keep resending me this cookie until [specific date in the future]". The specific date is typically bound by a 30 year limit (which seems a bit absurd given the average lifespan of today's computers, let alone cookie deletion). If any expiry is set, the cookie becomes known as a persistent cookie (even if the expiry is set 10 minutes in the future). If no expiry is set, the browser will consider this to be a session cookie and replay the cookie only until the browser is shut down.
As a practical matter far too much is read into the expiry of a cookie. If a site sees 99% of its cookies again within a matter of days and resets them each time it sees them, then it may set an expiry of only a week or a month but still have what amounts to a cookie that never expires! Additionally a "session" cookie may in practice live longer than a "persistent" cookie with a 1 day expiry if you tend to leave your browser open for longer than a day!
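The distinction between session and persistent cookies can be sketched in a few lines. This is an illustration under stated assumptions rather than a browser's actual logic: the function names cookie_kind and is_sent, the browser_still_open flag, and the dates are all made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cookie_kind(expires: Optional[datetime]) -> str:
    """Any expiry at all makes a cookie persistent; no expiry makes it a session cookie."""
    return "session" if expires is None else "persistent"


def is_sent(expires: Optional[datetime], now: datetime, browser_still_open: bool = True) -> bool:
    """Decide whether the browser would still replay the cookie at time `now`."""
    if expires is None:
        # Session cookie: lives exactly as long as the browser stays open.
        return browser_still_open
    # Persistent cookie: lives until its expiry, open browser or not.
    return now < expires


# Arbitrary example date for when the cookie was set.
set_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
one_day_expiry = set_at + timedelta(days=1)
two_days_later = set_at + timedelta(days=2)

print(cookie_kind(one_day_expiry), is_sent(one_day_expiry, two_days_later))
print(cookie_kind(None), is_sent(None, two_days_later, browser_still_open=True))
```

The last two lines reproduce the point made above: two days on, the "persistent" cookie with a 1 day expiry is gone, while the "session" cookie is still being sent because the browser was never closed.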

For The Technically Inclined

More information about cookies can be found in both RFC 2109 and RFC 2965. (Both are scintillating reading and should be considered indispensable for anyone with insomnia.) What is important is that the cookie never actually became part of an official HTTP specification recognized by a standards body like the W3C or IETF, which has left browser manufacturers particular leeway in how they treat them. This website therefore speaks to how cookies have been implemented in popular browsers more than to the specifics set forth in the RFCs.