Dipack's Website

Tracking you, privately

tracking, cookie, serverless, privacy · 4 min read

A while ago, we had a member of the Wikimedia Foundation, Nuria Ruiz, come in and talk about how they track unique visitors, and other related metrics, all without actually storing any PII on their systems. Around the same time, I'd just started working on this website, and was looking for ways to do exactly what she spoke about, so this was a fortuitous coincidence!

The gist of the talk centered on a cookie called Last-Access, which is basically a timestamp of the last time a particular device accessed their site. Pretty obvious from the name, really. Reading the Wikimedia Foundation's Cookie statement reveals that they do collect more cookies, and more information, from a device's activity, but they don't actually track users cross-device, which is awesome! And yes, there's a cookie called WMF-Last-Access which functions exactly how I remember Nuria explaining it, so that's always good to confirm.

With this information in hand, I decided to try and implement cookie-based tracking following the Wikimedia Foundation's example.

First, considering this is a Gatsby site, which is based on React, I had to figure out how to query and manipulate cookies using React, which is something that I'd never done before -- admittedly, I'm not a front-end expert, as I hate CSS. As with anything in Node.js land, I found a widely used library, universal-cookie, which allowed me to manipulate cookies in an object-oriented fashion, without ever having to touch the DOM directly.

Using this cookie library, I was able to quickly create a React component that I embedded in the footer of my website, which is rendered on each page. This component, on load, would query for the existence of my tracking cookie, and if it couldn't find one (in case of first load), it would create one in the vein of WMF-Last-Access, containing the current timestamp rounded to the nearest hour, for a little more anonymisation. If a device had visited my site before, then on the next visit we would simply update the timestamp in the cookie with a more current one, counting this as a completely new view, but only if the last visit had been more than an hour ago!
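A minimal sketch of the rounding and the "more than an hour ago" check could look like the two helpers below. The names roundTimeForCookie and getNewLastAccessTime are the ones used in the component later in this post, but their bodies are not shown there, so everything here is an assumption: the one-hour constant, rounding on the raw millisecond value, and returning the old timestamp unchanged for a repeat visit.

```javascript
// Assumed bucket size: one hour, in milliseconds.
const ONE_HOUR_MS = 60 * 60 * 1000;

// Round a Date to the nearest full hour (assumed behaviour of the helper).
function roundTimeForCookie(date) {
  const rounded = Math.round(date.getTime() / ONE_HOUR_MS) * ONE_HOUR_MS;
  return new Date(rounded);
}

// Decide which timestamp to store in the cookie.
// Returns `now` when the previous visit was more than an hour ago (a new view),
// otherwise returns the previous timestamp unchanged.
function getNewLastAccessTime(previous, now) {
  return now - previous > ONE_HOUR_MS ? now : previous;
}
```

Because both values are already rounded to the hour, the strict `>` means two visits in adjacent hours are treated as the same view; using `>=` instead would count them separately.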

Next, now that I could store the last time a particular device accessed my website, I had to find a way to actually track visitor metrics based on website access. I opted to use a simple Node.js function that would be invoked each time a tracking cookie was updated or created. This function would be given the timestamp stored in the cookie, and would update a simple MongoDB collection hosted by MongoDB Atlas. I decided to go with MongoDB, as I had:

  1. No need for a relational store,
  2. Experience with MongoDB, and
  3. A free tier on Atlas with more than enough storage!
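As a rough sketch of what "updating a simple MongoDB collection" could mean, the helper below builds the arguments for the Node.js driver's `collection.updateOne(filter, update, options)` call, keeping one document per hour and incrementing a counter. The schema (`hour`, `views`) and the name buildVisitUpdate are hypothetical; this post does not describe the real collection layout.

```javascript
// Hypothetical schema: one document per hour bucket, e.g. { hour: Date, views: 3 }.
function buildVisitUpdate(lastAccessTime) {
  return {
    // Match the bucket for the (already rounded) timestamp from the cookie.
    filter: { hour: new Date(lastAccessTime) },
    // Count one more view for that bucket.
    update: { $inc: { views: 1 } },
    // Create the bucket document if this is its first view.
    options: { upsert: true },
  };
}

// Usage, given a connected `collection` from the `mongodb` driver:
//   const { filter, update, options } = buildVisitUpdate(lastAccessTime);
//   await collection.updateOne(filter, update, options);
```

Storing only an hour bucket and a count keeps the collection free of anything that identifies a device.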

The aforementioned Node.js function would run as a Vercel Serverless function, hosted in the San Francisco region, and would be invoked by the tracking script on my website. I decided to go the Serverless route for two reasons:

  1. To avoid exposing my metrics collector implementation details, like the MongoDB connection string, and
  2. To be able to use CORS as a way to limit abuse of the tracking script, so that pages on other sites could not invoke my function from a visitor's browser and poison the metrics collected.

With that said, I'm certainly not an expert at CORS, and this was my first foray into configuring CORS for a "server" (geddit? Because I'm actually running a Serverless function!), so I'm just going to assume the Vercel docs are up-to-date and my function is protected adequately.
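For illustration, a CORS-aware handler in the style of a Vercel Node.js function might look like the sketch below. The allowed origin, the POST-only rule and the response bodies are assumptions, not this site's real configuration; on Vercel the function would be the default export of a file under `api/`.

```javascript
// Assumed: the only origin whose pages may call the function.
const ALLOWED_ORIGIN = 'https://dipack.dev';

function handler(req, res) {
  // Tell browsers which origin, methods and headers are acceptable.
  res.setHeader('Access-Control-Allow-Origin', ALLOWED_ORIGIN);
  res.setHeader('Access-Control-Allow-Methods', 'POST, OPTIONS');
  res.setHeader('Access-Control-Allow-Headers', 'Content-Type');

  if (req.method === 'OPTIONS') {
    // Preflight request: reply with the headers only.
    return res.status(204).end();
  }
  if (req.method !== 'POST' || req.headers.origin !== ALLOWED_ORIGIN) {
    // Reject anything that is not a POST from the expected origin.
    return res.status(403).json({ error: 'forbidden' });
  }
  // ... record the visit here (database update omitted) ...
  return res.status(200).json({ ok: true });
}
```

CORS is enforced by browsers, so this keeps scripts on other sites out; a non-browser client can still send whatever Origin header it likes.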

Here's a pseudo-code-ish version of my new tracking component:

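    import React, { useEffect } from 'react';
    import Cookies from 'universal-cookie';

    // roundTimeForCookie, getNewLastAccessTime and syncLastAccessTime are
    // helper functions defined elsewhere (omitted from this pseudo-code).

    export default function Tracking() {
      useEffect(() => {
        const cookieCtx = new Cookies();
        const cookieDomain = 'dipack.dev';
        const cookieName = 'Last-Access';
        const cookieProps = {
          path: '/',
          domain: cookieDomain,
          sameSite: 'strict',
        };
        // Current time, rounded before it is stored or sent anywhere.
        const now = roundTimeForCookie(new Date()).getTime();
        const existingCookie = cookieCtx.get(cookieName);
        if (!existingCookie) {
          // First visit from this device: create the cookie.
          cookieCtx.set(cookieName, { lastAccessTime: now }, cookieProps);
          // Sync using the Serverless function.
          syncLastAccessTime(now);
        } else {
          const { lastAccessTime: lastAccessTimeExisting } = existingCookie;
          // Check to see if we should count this as a new visit.
          const newLastAccessTime = getNewLastAccessTime(lastAccessTimeExisting, now);
          cookieCtx.set(cookieName, { lastAccessTime: newLastAccessTime }, cookieProps);
          // Only sync when the timestamp moved forward, i.e. this is a new visit
          // (assumes getNewLastAccessTime returns the old value otherwise).
          if (newLastAccessTime !== lastAccessTimeExisting) {
            syncLastAccessTime(newLastAccessTime);
          }
        }
      }, []);
      // Renders nothing; the component exists only for its side effect.
      return <React.Fragment />;
    }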
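The syncLastAccessTime helper is not shown above. A minimal sketch, assuming the Serverless function accepts a JSON POST, could be the following. The endpoint URL is a placeholder, the payload shape is a guess, and the optional fetchImpl parameter is added here only to make the helper easy to test.

```javascript
// Placeholder URL: the real endpoint is not given in this post.
const TRACKING_ENDPOINT = 'https://example.com/api/track';

// Send the rounded timestamp to the Serverless function.
// Resolves to true on a 2xx response and false on any failure.
async function syncLastAccessTime(lastAccessTime, fetchImpl = globalThis.fetch) {
  try {
    const response = await fetchImpl(TRACKING_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ lastAccessTime }),
    });
    return response.ok;
  } catch (err) {
    // Tracking is best-effort: a failed sync should never break the page.
    return false;
  }
}
```

Only the timestamp is sent, so the request body carries nothing that identifies the device beyond what any HTTP request already reveals.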

After all this finagling, I finally had a privacy-respecting tracking method that lets me measure the (un)popularity of my site, and that gave me an excuse to dabble in the world of cookies and CORS. I'm proud to say that it works!

© 2024 by Dipack's Website. All rights reserved.