Web Privacy Information

Compiled by Doug Reese, Lehigh University
Bethlehem, PA (US).

Background

On Monday, June 9th, 1997, the Lehigh Valley newspaper The Morning Call ran a front-page article entitled, "Web Users Seen Vulnerable To Spying," which was based on an Associated Press story by David Kalish. The story described a press release from the Electronic Privacy Information Center (EPIC) prior to hearings being held by the U.S. Federal Trade Commission (FTC) as part of its Public Workshop on Consumer Information Privacy, held June 10th to 13th in Washington, DC. WAEB-AM News Radio decided to do a follow-up investigation on this story, and reporter Lisa Alexander asked to interview someone from Lehigh University who was familiar with some of the technical issues involved in this debate. I got nominated (and/or volunteered), and I put this page together as a convenient place to bring together various notes and sources.

Disclaimer
I have tried to make sure my facts are accurate and complete. However, this page contains a substantial amount of opinion as well; all opinions are my own and do not necessarily reflect or agree with those of my employer (or anyone else, for that matter). Any mention of a trade name or product should not be construed either as a challenge to the rights of the trademark's owner or as an endorsement of the product or service.

General Questions

The article raised a number of issues. People who spoke to me about the article raised several more, which they saw as related. Many of the questions made it plain that some of the things discussed in the article are widely misunderstood.

Q:

Should People Really Worry About This?

A:

Depends on exactly how you ask that question. Is there anything to be concerned about? Should people care about this issue? Yes. There is the potential for information to be gathered and used in ways people might not be happy about. There are also things that people can do to protect themselves from this possibility, and they ought to know about those things. Moreover, they should think about the issue, and decide how much information gathering they feel is reasonable, how such information might safely be used, and what standards and restrictions ought to be imposed and by whom. Then, if they feel the issue is important enough, they should express their opinions to the people who are trying to determine what should be done.

Should people be alarmed? Should they avoid using the web until these issues are resolved? Should they lobby to have some of the dangerous technologies made illegal? No. These technologies exist for a reason, and have as much potential (if not more) for benefit as for harm. The question is not whether they should be used, but how.

The issue of privacy on the web is just one facet of the much larger issue of privacy in general. The problem is not technological, but social, and a purely technological fix simply won't work. It is important not to overreact. There are many reasons for gathering information that are perfectly reasonable and legitimate, and ways of using that information that are very worthwhile. To ignore this is to "throw the baby out with the bathwater," as my father is fond of saying.

Besides, it may be that part of the problem lies in faulty expectations on the part of people who do not understand what the web is all about. The great promise of the web is "worldwide information and services at your fingertips"; does that sound private to you? Sounds to me more like window shopping on a busy city street, which is not a situation I associate with privacy. Maybe the fact that you can do it from your living room in your pajamas is deceiving people.


Cookies

One of the primary causes for concern seems to be the mechanism referred to as a "cookie." Some of the fears being expressed about cookies are based on confusion about just what cookies are, or about what they really can do versus what they clearly can't, and involve a certain amount of exaggeration or oversimplification.


Q:

What Is A Cookie?

A:

A cookie is a small piece of information that a server can ask a client (your web browser) to store for it. This information is in the form of a simple variable; in other words, an object with a name and a value. Both the name and the value are just text strings, like "SESSION-ID" or "HQVX-5282#AA324". Notice that there is no requirement that either you or your browser will have any idea what this variable represents, or what its value means. As long as it means something to the server the next time it sees it (when the browser returns the cookie to the server), the cookie has done its job. Of course, there's nothing that says the cookie has to be mysterious, either. It could, for example, be "USER-NAME=John-Doe", which is reasonably self-evident.
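To make the name-and-value idea concrete, here is a minimal sketch in Python using the standard library's http.cookies module. It is only an illustration of the round trip described above, not how any particular server does it; the variable names are my own, and the cookie name and value are the sample strings from the paragraph above.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to store a variable named SESSION-ID.
outgoing = SimpleCookie()
outgoing["SESSION-ID"] = "HQVX-5282#AA324"
set_cookie_line = outgoing.output()  # the header line the server would send

# Browser side, later: the same name and value come back in a Cookie header.
# (Hard-coded here; a real browser would build this line itself.)
cookie_line_from_browser = "SESSION-ID=HQVX-5282#AA324"

# Server side again: read the value back out.
incoming = SimpleCookie()
incoming.load(cookie_line_from_browser)
remembered = incoming["SESSION-ID"].value
```

Nothing in this exchange requires the browser to understand the value; it simply stores the text and hands it back.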


Q:

What Does A Cookie Do?

A:

The specifications for cookies describe them as a "state management mechanism". What this means is that, basically, a cookie is just a memory aid for a web server. Like many other technological developments, cookies are a tool that is intended to provide the solution to a problem. The problem, in this case, is that the protocol used for web transactions (HTTP) is "stateless". Because of this, even though your travels through a particular web site may seem to you to blend smoothly into one continuous series of interactions, the web server you are visiting doesn't see you that way at all. Each page you view produces a separate request to the server that is unrelated to any other page you may have seen. (In fact, every image on the page also produces a separate request that is not related to the other images on the page or even to the page that it is on!) And your requests are mixed in along with everyone else's.

This means that the server doesn't intrinsically know what other pages you may have already seen (although it could try to create a database to keep track), or whether you will ever make any additional requests (since you didn't "log on" to that server, you don't have to log off, either, and so the server never has any way of being sure you are finished with it). In computer terms, you don't have a "session". What cookies are intended to do, basically, is to help the server create the illusion of continuity, so that your interaction with a web site behaves more the way you intuitively expect it to.
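The difference a cookie makes can be sketched with a toy server in Python. Everything here is hypothetical (the function handle_request, the "visitor-N" labels, the page names); the only point is that the server's sense of continuity depends entirely on the browser handing the cookie back.

```python
import itertools

_next_number = itertools.count(1)
sessions = {}  # server-side memory: cookie value -> pages requested so far


def handle_request(path, cookie=None):
    """Toy request handler. Returns (response text, cookie to store)."""
    if cookie is None or cookie not in sessions:
        # No cookie came back, so this request looks like a brand-new visitor.
        cookie = f"visitor-{next(_next_number)}"
        sessions[cookie] = []
    sessions[cookie].append(path)
    return f"{path}: page {len(sessions[cookie])} of this visit", cookie


# A browser that returns the cookie: the server sees one continuous visit.
_, cookie = handle_request("/index.html")
with_cookie, cookie = handle_request("/catalog.html", cookie)

# A browser that never returns it: every request starts from scratch.
handle_request("/index.html")
without_cookie, _ = handle_request("/catalog.html")
```

With the cookie, the second request is recognized as page 2 of the same visit; without it, the same request is counted as page 1 of a new one.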


Q:

So What Are Cookies Used For?

A:

Cookies can be used in a lot of different ways. Because they are relatively new, there are probably uses that no one has even thought of yet. Some examples of typical uses include:

  • Website Personalization
  • Online Ordering
  • Targeted Marketing
  • Site Tracking

Website Personalization: Many websites strive to provide useful information to their visitors, but anticipating their audience's needs is sometimes difficult. By allowing users to identify themselves and indicate their preferences for what type of information is most interesting or important to them, sites such as My Yahoo! and GIST can customize their presentation of information in a way that maximizes its usefulness to each individual visitor. Cookies can be used to store the user's identity (username) or even the preferences themselves.

Online Ordering: Gathering the information you need in order to make a decision to buy something often happens in stages. You may browse a product catalog, adding things to a virtual "shopping cart," which you may eventually decide to purchase (at which point you will need to enter billing and/or shipping information). Cookies are often used as a way of creating such "virtual shopping carts."
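One simple way a cart could live in a cookie is to keep the list of item codes in the cookie's value and have the server rewrite it each time you add something. The sketch below is an assumption about how such a scheme might look, not a description of any real store; the function name, the item codes, and the "|" separator are all made up for illustration. (A separator other than a comma or semicolon is used because those characters are not allowed in cookie values.)

```python
def add_to_cart(cart_cookie, item_code):
    """Return the new cookie value after adding one item to the cart.

    cart_cookie is the value the browser sent back ("" if there is no cart yet).
    """
    items = cart_cookie.split("|") if cart_cookie else []
    items.append(item_code)
    return "|".join(items)


cart = ""                               # first visit: no cookie yet
cart = add_to_cart(cart, "BOOK-101")    # server asks browser to store this
cart = add_to_cart(cart, "MAP-7")       # browser returned it; server extends it
items_at_checkout = cart.split("|")
```

Many sites would more likely store only an ID in the cookie and keep the cart itself on the server, but either way the cookie is what ties your separate page requests to one cart.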

Targeted Marketing: This is probably the area that causes the greatest concern. It includes a number of relatively innocuous types of activity, ranging from making sure that you do not see the same ads over and over again, to helping advertisers gather statistics on how effective their advertising campaigns are. It potentially includes (under certain specific conditions, which require your assistance) unsolicited mailings of advertising material direct to individual users. The key to keeping this in perspective is to remember that the server can only ask the browser to help it remember something it already knows, and that the browser is not required to comply. (More about this below.)
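The statement that the browser is not required to comply can be illustrated with a toy model of a browser's cookie store. The class ToyBrowser, its method names, and the AD-SEEN cookie are hypothetical; real browsers offer this choice through their preference settings rather than through code like this.

```python
class ToyBrowser:
    """A hypothetical, much-simplified browser cookie store."""

    def __init__(self, accept_cookies):
        self.accept_cookies = accept_cookies
        self.stored = {}

    def receive_set_cookie(self, name, value):
        # The server can only ask; the browser decides whether to remember.
        if self.accept_cookies:
            self.stored[name] = value

    def cookie_header(self):
        # What the browser would send back with its next request.
        return "; ".join(f"{name}={value}" for name, value in self.stored.items())


willing = ToyBrowser(accept_cookies=True)
wary = ToyBrowser(accept_cookies=False)
for browser in (willing, wary):
    browser.receive_set_cookie("AD-SEEN", "banner-17")

sent_by_willing = willing.cookie_header()
sent_by_wary = wary.cookie_header()
```

The wary browser sends nothing back, so the server has no cookie with which to recognize it next time.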

Site Tracking: Even if they aren't actually trying to sell something, web site designers still want to know whether or not they have been successful. How popular is the site? Do visitors stop at the front page, and then leave the site altogether, or do they stay and browse through it? Are they bookmarking pages? Are some pages less useful than others, indicating that they may need to be redesigned? Because visitors don't have sessions, keeping track of how people actually use your site can be very difficult. Cookies can help.


Q:

Why Would Anyone Want To Be Kept Track Of?

A:

Suppose you are a person who loves to read, and you happen to have a neighborhood bookstore nearby that you visit frequently. As the sales clerk gets to know you, he can do small things, like greeting you by name, that make you feel welcome, and make your visits just a bit more pleasant. If you buy a particular out-of-town newspaper every Friday, he can anticipate this, and have it ready (or even set one aside, so they don't run out). If he knows you have been reading several books in a series, he can let you know when the next sequel is due to be published. Based on the books you've bought, and on what you've commented about, he may even be able to suggest a book you might like. Basically, you get better service, because he knows you. Even if you happen to be the sort of person who prefers to browse without being bothered by sales clerks, and who would consider a clerk's presumption in making reading suggestions to be a rude invasion of privacy, the clerk has to know you in order to know that. After all, not everybody feels that way. If the clerk knows you don't like to be bothered, and he still does, then, of course, that is rude.

The point here is that it isn't the information itself, it's what gets done with it that matters. Web sites are not people; they are computer programs. Most people aren't comfortable dealing with something that's completely cold and impersonal, and because computers don't have the sort of everyday things we take for granted in people--like common sense or intuition--it's hard for them to be anything else. Web sites are potentially worse than most computer interactions, because they are, in effect, constantly forgetting who you are from minute to minute. It's as if you were dealing with a clerk who had an advanced case of Alzheimer's. By keeping track of you, a well-designed web site can do better than this, and cookies are one tool that can be used to help the computer keep track.


Q:

But Aren't Cookies A Big Risk?

A:

Is there a risk that cookies will be used to keep track of things you'd prefer weren't tracked? Yes, there is. Wouldn't getting rid of cookies altogether prevent this? No, because cookies are just one tool that can be used to make certain kinds of tracking easier to do; they aren't the only one, and cookies alone aren't what makes tracking possible. As I mentioned earlier, cookies are a solution to a problem; if you get rid of cookies, the problem still remains to be solved, and some tool will be used to solve it. If not cookies, then something else.

Besides, keeping track of you, per se, is not the issue. Despite how many of us may feel at times, advertising and marketing aren't inherently evil: after all, there are times when you have something you want to buy, and if you don't know about it, you can't buy it. The problem is the constant buzz of commercials that are irrelevant to our needs, the flood of useless information in the form of junk mail, the intrusive nature of direct telephone marketing. Think about it: if the people who were trying to sell something actually knew exactly what people wanted in a product, knew exactly who wanted it, knew when they were ready to buy it, and used that information responsibly, things could be nearly ideal for both buyer and seller. You could have nearly perfect service and incredibly efficient commerce.

The rub, as they say, is that part about using the information responsibly. Marketing can be seen as more than just identifying the people who are ready to buy what you have to sell: it can also involve convincing the uncertain, or changing the minds of those who aren't interested. And that's where many people start to feel manipulated and intruded upon.


Q:

What Information Can Be Tracked?

A:

Keep in mind that cookies are a tool for helping the web server remember things. Obviously, the server can only try to remember something it already knows. Notice that cookies have nothing to do with what the server knows, but only with keeping track of it.

There are a few things the server has to know in order to communicate with your browser, such as the network address (called an IP address, or IP number) of the computer it should send pages to when the browser requests them. (This may be a gateway, a proxy server, the machine you are connected to at your Internet Service Provider, or your own PC, if that is on a network.) Without this, you would get nothing.

There are other things it is useful, but not absolutely necessary, for the server to know. Your browser may tell the server some of these, without you realizing it. For example, most browsers inform the server which browser, and what version number of that browser, you are using, along with what type of computer you are running the browser on. The server could (but often doesn't) use this information to make sure you see a version of the page that uses features that are compatible with your setup. If you are following a link (as opposed to typing in a URL directly, or using a bookmark), many browsers will tell the server which page the link was on, so that it will know where you came from.
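In HTTP, the browser identification travels in a request header named User-Agent, and the page you came from travels in one named Referer. The following Python sketch pulls those two headers out of a raw request. The request text is a made-up sample (the host www.example.com and the browser string are illustrative only), and parse_headers is my own simplified helper, not part of any server software.

```python
# A sample request as a browser might send it. The values are illustrative.
RAW_REQUEST = (
    "GET /catalog.html HTTP/1.0\r\n"
    "User-Agent: Mozilla/3.01 (Win95; I)\r\n"
    "Referer: http://www.example.com/index.html\r\n"
    "\r\n"
)


def parse_headers(raw):
    """Return the request headers as a dict with lower-cased names."""
    headers = {}
    for line in raw.split("\r\n")[1:]:   # skip the request line itself
        if not line:                     # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(RAW_REQUEST)
browser_and_platform = headers.get("user-agent")
came_from = headers.get("referer")
```

Note that neither header says who you are; they describe your software and the link you followed.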

Then there is information that the server assigns to you. This is information created and used by the server for its own purposes; it isn't really personal information. There are lots of analogues to this in the real world: if you go bowling, and you check out a pair of shoes, there may be a number stenciled on them--what does this number mean? It identifies which pair of shoes you are using. The web server can assign numbers like that, too.

Most other types of information are only available to the server if you explicitly provide them. This includes things like your real name, your social security number, your postal address, your telephone number, and so on. The browser doesn't know any of these and so can't tell the server without you knowing about it. So the only way the server can connect your activities on the web with any of these things is if you tell the server who you are (usually by filling out a form). Once the server knows who you are, it can keep track of you as long as you remain at that site (or, using cookies, if you return to that site before the cookie expires).

Finally, the server can collect separate pieces of information and examine them to look for patterns, in effect creating new information. If these patterns contain, or can be linked to, personal information, then the potential for abuse does indeed exist. Once again, I remind you that it isn't cookies that make it possible to do this; much of the information is also automatically collected in server log files, and the information from the forms you fill out could easily be collected in a database on the server. The problem of abuse isn't a technological problem; it's a social one. Getting rid of the technology isn't the solution; setting controls on how it may be used is.
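As a rough illustration of how patterns can be drawn from ordinary log files with no cookies involved, the Python sketch below groups log entries by network address to reconstruct the path each address took through a site. The log entries, addresses, and page names are invented for the example, and real log files contain more fields than this.

```python
from collections import defaultdict

# Invented sample entries: (network address, page requested), in time order.
LOG = [
    ("192.0.2.10", "/index.html"),
    ("192.0.2.44", "/index.html"),
    ("192.0.2.10", "/books/mystery.html"),
    ("192.0.2.10", "/order.html"),
]


def pages_by_address(log):
    """Group requested pages by the address that asked for them."""
    trails = defaultdict(list)
    for address, page in log:
        trails[address].append(page)
    return dict(trails)


trails = pages_by_address(LOG)
```

The result is only as personal as the address is: behind a gateway or proxy server, many different people can appear under the same address.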



Recent Events

Information About Cookies

Internal Variables Displayed

Privacy Organizations and Initiatives

Information From Users of Cookies

Tools