For the last 20 years, cookies have been used on almost every website to identify returning users, offer better content or simply monitor user activity online. However, in recent years, users have begun to rebel against what they see as an invasion of their privacy. Some countries are even imposing serious restrictions on the use of browser cookies, and browsers are offering more and more options for users to block and delete this information.
Should you leave monitoring the activity of your users to large companies? Not at all!
There are more sophisticated techniques to identify users who browse your website even when they block cookies, but first, we must understand how cookies are used and how they work within web requests.
When you load a webpage (landing on the home page or browsing around the different sections of a website), the browser is actually sending requests to the server asking for the files it needs. These requests are independent of each other, and the server has no way of knowing whether they are being made by the same user or by different users. Cookies were created to track individual users. A cookie is a small piece of text that the server asks the browser to remember. Any time the browser requests a file, it also sends back all of the cookies that the server has asked it to store. This makes it easy to keep track of a user: if we put a unique user identifier in a cookie, then all requests from that user will carry the same identifier.
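The cookie round trip described above can be sketched in a few lines of Python. This is only an illustration using the standard library's `http.cookies` module; the cookie name `uid` and the function `identify` are assumptions made up for this example, not part of any particular framework.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "uid"  # hypothetical cookie name chosen for this sketch


def identify(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (user_id, set_cookie_value) for one incoming request.

    cookie_header is the raw Cookie header sent by the browser, if any.
    set_cookie_value is None when the browser already sent our identifier.
    """
    cookies = SimpleCookie(cookie_header or "")
    if COOKIE_NAME in cookies:
        # Returning user: the browser sent back the identifier we issued.
        return cookies[COOKIE_NAME].value, None
    # First visit: create an identifier and ask the browser to remember it.
    user_id = uuid.uuid4().hex
    response_cookie = SimpleCookie()
    response_cookie[COOKIE_NAME] = user_id
    return user_id, response_cookie[COOKIE_NAME].OutputString()
```

On the first call the function returns a value for the `Set-Cookie` response header; feeding that same value back in as the `Cookie` header yields the same identifier.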
So if cookies are now persona non grata, how can we achieve the same results, but without using cookies? Here are 5 ways to identify your users:
1. Using the user’s IP
Using an IP address is the most obvious solution of all. It is simple and fast; however, it is also the least effective. The main problem with using the IP is that the vast majority of users have a dynamic IP, meaning a user’s IP address today may not be the same tomorrow. Also, if multiple users connect through the same network, they will all have the same IP and will be indistinguishable to the server.
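A minimal sketch makes both weaknesses visible. The function name `ip_identifier` is hypothetical, and the addresses come from a range reserved for documentation; hashing the address is just one possible way to turn it into an opaque identifier.

```python
import hashlib


def ip_identifier(remote_addr: str) -> str:
    """Derive an opaque identifier from the client's IP address."""
    return hashlib.sha256(remote_addr.encode("utf-8")).hexdigest()[:16]


# Two different people behind the same router share one public IP...
alice = ip_identifier("203.0.113.7")
bob = ip_identifier("203.0.113.7")

# ...and the same person may get a different IP tomorrow.
alice_tomorrow = ip_identifier("203.0.113.42")
```

Here `alice` and `bob` end up with the same identifier, while `alice_tomorrow` no longer matches `alice`.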
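2. Using LocalStorage
Another option is to keep the user identifier in the browser's LocalStorage instead of in a cookie. Unfortunately, clearing LocalStorage is also relatively easy for any user concerned about their privacy, and incognito mode still makes the user undetectable.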
3. Canvas Fingerprinting
The idea is to draw a particular pattern of shapes and text on a hidden canvas in each page. Due to the particularities of browsers, operating systems and, above all, users' graphics cards, the resulting image can differ between two computers (even though the instructions for generating it are the same).
This image acts as a fingerprint that distinguishes one computer from most others. If we send a hash of this image with every request, we can identify requests made from the same computer without storing anything on the user's machine.
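The drawing itself happens in the browser, so the sketch below only illustrates the hashing step. It assumes the rendered pixels have already been exported as raw bytes; the function name `canvas_fingerprint` and the sample byte strings are invented for the example.

```python
import hashlib


def canvas_fingerprint(image_bytes: bytes) -> str:
    """Hash the rendered canvas image into a short, comparable fingerprint."""
    return hashlib.sha256(image_bytes).hexdigest()


# Made-up pixel data: the same drawing instructions rendered on two machines
# whose graphics stacks differ by a single byte somewhere in the image.
machine_a = bytes([10, 20, 30, 40] * 64)
machine_b = bytes([10, 20, 30, 40] * 63 + [10, 20, 31, 40])

fingerprint_a = canvas_fingerprint(machine_a)
fingerprint_b = canvas_fingerprint(machine_b)
```

Even a one-byte rendering difference produces a completely different hash, while the same machine reproduces the same hash on every visit.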
This method still has a problem: it can be detected. Advanced privacy tools (like the Tor Browser) are able to detect when a page is painting to a “suspect” canvas and will notify the user.
4. User Behavior
The following method is probably the most complex of all and is only practical for technology giants like Google that have the resources to pull it off. However, if implemented correctly, it can recognize a user even when they access your site from another computer!
The technique is to record the behavior of the user (mouse movements, acceleration, use of the scroll wheel, etc.) and transmit that information to the server. User behavior is very personal, and individual users can be recognized by applying machine learning clustering algorithms to it.
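To give a rough feel for the idea, here is a toy sketch. It is not a real clustering pipeline: it reduces a recorded mouse trace to two numbers and matches them against the nearest stored profile, which is a much simpler stand-in for the machine learning described above. The feature choice, the function names and the sample profiles are all assumptions made for illustration.

```python
from math import dist, hypot


def features(samples):
    """Summarise a mouse trace as (mean speed, mean change in speed).

    samples is a list of (time_seconds, x, y) pointer positions and
    needs at least three entries.
    """
    speeds = [
        hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:])
    ]
    changes = [abs(b - a) for a, b in zip(speeds, speeds[1:])]
    return (sum(speeds) / len(speeds), sum(changes) / len(changes))


def closest_profile(vector, profiles):
    """Return the name of the stored profile nearest to the feature vector."""
    return min(profiles, key=lambda name: dist(vector, profiles[name]))


# Hypothetical stored profiles: (mean speed, mean change in speed).
known_profiles = {"alice": (12.0, 1.0), "bob": (140.0, 90.0)}

slow_trace = [(0, 0, 0), (1, 10, 0), (2, 20, 0), (3, 30, 0)]
guess = closest_profile(features(slow_trace), known_profiles)
```

A production system would use far richer features and real training data; the point is only that behavior can be turned into numbers and compared.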
5. Using the ETag
Finally, a balance between simplicity and efficiency – this is my favorite method. First of all, let’s explain how the cache works on modern servers.
When a user requests file A for the first time, the server serves the file along with an ETag, a code that is a signature of the file contents. When the user requests the file a second time, the browser (which has both the file and the ETag cached) sends the ETag along with the request. If the file has not changed, the server sees that its ETag is the same as the one sent by the client and, instead of sending the file a second time, tells the browser to use the cached version. If the file has changed, the server sends the client the new version with a new ETag. Simple, right?
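The normal caching exchange can be sketched as a small function. In HTTP the browser sends the cached ETag back in the `If-None-Match` request header, and the "use your cached copy" answer is a `304 Not Modified` response. The function names `etag_for` and `respond`, and the choice of a truncated SHA-256 as the signature, are assumptions for this example.

```python
import hashlib
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    """Build an ETag (a quoted string) from a signature of the file contents."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request for this file.

    if_none_match is the ETag the browser has cached, or None on a first visit.
    """
    etag = etag_for(body)
    if if_none_match == etag:
        # Contents unchanged: tell the browser to reuse its cached copy.
        return 304, {"ETag": etag}, b""
    # First request, or the file changed: send the full file and its ETag.
    return 200, {"ETag": etag}, body
```

A first request gets status 200 plus the ETag; repeating the request with that ETag gets a 304 with an empty body, and a changed file gets a fresh 200.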
So, how can we use this system for our purposes?
We cannot change the behavior of the browser, but the server is ours, so we do not have to follow the standard behavior (while assuming the browser will). We place an image (1×1, so that it is not visible) on our home page. When that image is requested, the server alters its behavior as follows:
If the request does not have an ETag (because it is the first time the user visits), the server generates a unique user ID and sends it to the browser as if it were the image's ETag.
If the request has an ETag, we know that this ETag is really our user identifier, so all requests from that IP during the current session belong to that user. So that the browser does not lose the identifier, we tell it that its cached version of the image is still correct.
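The two cases above can be sketched as a single request handler. This is a minimal illustration, not a complete server: `handle_pixel_request` is a hypothetical name, `PIXEL` holds placeholder bytes rather than a real image, and the `If-None-Match` request header is where the browser returns the cached ETag.

```python
import uuid
from typing import Dict, Optional, Tuple

PIXEL = b"\x00"  # placeholder bytes standing in for a real 1x1 image


def handle_pixel_request(
    if_none_match: Optional[str],
) -> Tuple[str, int, Dict[str, str], bytes]:
    """Return (user_id, status, headers, payload) for a request for the image."""
    if if_none_match is None:
        # First visit: mint an identifier and disguise it as the image's ETag.
        user_id = uuid.uuid4().hex
        return user_id, 200, {"ETag": '"' + user_id + '"'}, PIXEL
    # Returning visit: the "ETag" the browser sends back is our identifier.
    user_id = if_none_match.strip('"')
    # Answer 304 so the browser keeps its cached copy, and with it the ETag.
    return user_id, 304, {"ETag": if_none_match}, b""
```

Calling the handler once with no ETag and then again with the ETag it returned yields the same user ID both times.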
The main disadvantage of the ETag is that it is deleted if the user clears the browser cache. However, given the simplicity of implementation, it is a great way to get started, and it can be complemented with other methods later.
These 5 methods are only a fraction of those used to identify each user session. For better or for worse, these techniques have become a very active area of research for companies that rely on user data analysis.
Do you use any of these techniques? A better question is, do you know if any of your competitors use them?