Understanding browser fingerprinting

Posted on Feb 07 2020, in asylon, Capital Surveillance, Marketing, technology

  ClientVariations proto;
  for (const VariationIDEntry& entry : all_variation_ids_set) {
    switch (entry.second) {
      case GOOGLE_WEB_PROPERTIES_SIGNED_IN:
        if (is_signed_in)
          proto.add_variation_id(entry.first);
        break;
      case GOOGLE_WEB_PROPERTIES:
        proto.add_variation_id(entry.first);
        break;
      case GOOGLE_WEB_PROPERTIES_TRIGGER:
        proto.add_trigger_variation_id(entry.first);
        break;
      case ID_COLLECTION_COUNT:
        // This case included to get full enum coverage for switch, so that
        // new enums introduce compiler warnings. Nothing to do for this.
        break;
    }
  }

 

The code above has been in the Google code base since 2014 and is deployed across their Chrome browsers [1]. In total it’s just a few hundred lines of code (including comments) in a project that contains nearly 35,000,000 lines of code.
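To read the loop in isolation, here is a minimal self-contained sketch. The enum values and method names come from the excerpt; the `ClientVariations` struct, the `VariationIDEntry` alias, the `CollectVariations` wrapper and the numeric IDs are stand-ins assumed for illustration, not Chrome's real definitions.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Enum values as they appear in the excerpt.
enum IDCollectionKey {
  GOOGLE_WEB_PROPERTIES,
  GOOGLE_WEB_PROPERTIES_SIGNED_IN,
  GOOGLE_WEB_PROPERTIES_TRIGGER,
  ID_COLLECTION_COUNT,
};

// Assumption: an entry pairs a variation ID with the collection it belongs to.
using VariationIDEntry = std::pair<int, IDCollectionKey>;

// Assumption: a plain struct standing in for the generated protobuf class.
struct ClientVariations {
  std::vector<int> variation_ids;
  std::vector<int> trigger_variation_ids;
  void add_variation_id(int id) { variation_ids.push_back(id); }
  void add_trigger_variation_id(int id) { trigger_variation_ids.push_back(id); }
};

// Hypothetical wrapper around the loop from the excerpt.
ClientVariations CollectVariations(
    const std::set<VariationIDEntry>& all_variation_ids_set,
    bool is_signed_in) {
  ClientVariations proto;
  for (const VariationIDEntry& entry : all_variation_ids_set) {
    switch (entry.second) {
      case GOOGLE_WEB_PROPERTIES_SIGNED_IN:
        // Only reported when the user is signed in.
        if (is_signed_in)
          proto.add_variation_id(entry.first);
        break;
      case GOOGLE_WEB_PROPERTIES:
        proto.add_variation_id(entry.first);
        break;
      case GOOGLE_WEB_PROPERTIES_TRIGGER:
        proto.add_trigger_variation_id(entry.first);
        break;
      case ID_COLLECTION_COUNT:
        // Sentinel value; nothing to collect.
        break;
    }
  }
  return proto;
}

// Self-check with made-up IDs: signing in adds the signed-in-only ID.
void CheckSignedInGate() {
  const std::set<VariationIDEntry> ids = {
      {101, GOOGLE_WEB_PROPERTIES},
      {202, GOOGLE_WEB_PROPERTIES_SIGNED_IN},
      {303, GOOGLE_WEB_PROPERTIES_TRIGGER},
  };
  assert(CollectVariations(ids, false).variation_ids.size() == 1);
  assert(CollectVariations(ids, true).variation_ids.size() == 2);
}
```

The point to notice is that the set of reported IDs depends on both the installation's enrolled variations and the sign-in state.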

So what makes this special?

This code helps Google track you no matter where you go on the internet. That bit of code, which represents less than one percent of the entire Chrome project, may well be a key factor in Google’s billion dollar advertising empire. This was recently highlighted in a GitHub conversation where an Estonian developer questioned the validity of using this code:

[Image: kiwibrowser-6127227]

This caused a stir in the tech community, but made little news elsewhere. This is not something new, but something most people find too arcane to comprehend, and political leaders place in their ‘too hard basket’ to do anything about. This is easily understood.

This code is part of a user tracking process known as digital fingerprinting. Just like your own fingerprint, your digital fingerprint can uniquely identify you. This information is sought after by advertisers, hackers, and even governments. This type of fingerprinting draws on a multitude of sources such as your IP address and browser.

Most people mistakenly assume that this ‘fingerprint’ is just an IP address, and indeed the VPN business has flourished over the past few years. However, an IP address is just one part of what can be tracked.

Besides the misplaced trust in VPNs, there is also some degree of misplaced trust in blocking JavaScript on a site, including analytics code and other trackers. These types of scripts are already blocked by default by certain browsers such as Brave.

As of today, perhaps the most vulnerable component of your digital profile is your browser.

An overview of browser fingerprinting

Here is a limited list of identifiers that your browser signals to the websites you visit.

  • User Agent – The user agent is a text record identifying the browser and operating system to the web server. These can vary quite a bit and can help your system be identified. Read more about it here: https://www.howtogeek.com/114937/htg-explains-whats-a-browser-user-agent/
  • HTTP_ACCEPT – Returns a list of the types of content that your browser accepts, such as text or HTML. Read more about this here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
  • Browser Plugins – If a plugin exposes its name, it will show here. Not all plugins do.
  • Screen Size and Color Depth – This does what it says on the tin, and is very useful for identifying mobile devices.
  • System Fonts – The fonts on your system can also clearly mark who you are. Graphic designers, be aware!
  • Are Cookies Enabled? – Have you enabled your cookies? Typically a yes/no answer.
  • Supercookie testing – A supercookie is a tracking cookie that telecoms can use to track your online behaviour. They can only be thwarted by using HTTPS and/or a VPN. Read more about it here: https://searchsecurity.techtarget.com/definition/supercookie
  • Canvas fingerprint – Canvas is a modern HTML element used for drawing graphics in your browser. It can be used to identify your computer’s graphics card. Read more about it here: https://en.wikipedia.org/wiki/Canvas_fingerprinting
  • WebGL fingerprint – WebGL is a JavaScript API for rendering 3D graphics in your browser, and it draws through the Canvas element mentioned above. Similarly, it can reveal the graphics capabilities of your device.
  • DNT Header – DNT stands for Do Not Track. This is a browser header that is widely ignored. Unfortunately, if you have it switched on or off, it also makes you more easily identifiable on the web. By default it’s not activated (set to null). Read more here: https://en.wikipedia.org/wiki/Do_Not_Track
  • Language – This is the language of your browser. If you are multilingual or have different versions of a language (e.g. en-US and en-GB), this will make you more unique and trackable.
  • Platform – This is the operating system you are on (Linux, Windows, iOS, Android, BSD).
  • Touch Support – Identifies whether you have a touch screen and how it’s implemented.
  • Date and Time – This seems innocuous, but it can detect if you are using a VPN. For instance, if your time zone is Auckland/Wellington and your IP is in Germany, it’s likely that your time zone gives you away.
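To make concrete how separate signals like these add up to one identifier, here is a minimal sketch that joins a few of them and hashes the result. The `BrowserSignals` struct, its field names and the choice of the FNV-1a hash are assumptions made for illustration. Real fingerprinting scripts gather these values inside the browser and may combine them quite differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record holding a handful of the signals listed above.
struct BrowserSignals {
  std::string user_agent;
  std::string http_accept;
  std::string language;
  std::string platform;
  int screen_width = 0;
  int screen_height = 0;
  int color_depth = 0;
  int utc_offset_minutes = 0;
  bool cookies_enabled = false;
};

// 64-bit FNV-1a. Picked only because it is short; this is not a claim
// about what any particular tracker uses.
std::uint64_t Fnv1a64(const std::string& data) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char byte : data) {
    hash ^= byte;
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// Joins every field with a separator and hashes the result, so a change
// in any single signal yields a different fingerprint.
std::uint64_t Fingerprint(const BrowserSignals& s) {
  const char kSep = '\x1f';  // unit separator keeps fields from merging
  const std::string fields[] = {
      s.user_agent,
      s.http_accept,
      s.language,
      s.platform,
      std::to_string(s.screen_width),
      std::to_string(s.screen_height),
      std::to_string(s.color_depth),
      std::to_string(s.utc_offset_minutes),
      s.cookies_enabled ? "1" : "0",
  };
  std::string joined;
  for (const std::string& field : fields) {
    joined += field;
    joined += kSep;
  }
  return Fnv1a64(joined);
}

// Self-check with made-up values: switching en-US to en-GB is enough
// to produce a different fingerprint.
void CheckLanguageChangesFingerprint() {
  BrowserSignals a;
  a.user_agent = "ExampleBrowser/1.0";
  a.language = "en-US";
  BrowserSignals b = a;
  b.language = "en-GB";
  assert(Fingerprint(a) == Fingerprint(a));
  assert(Fingerprint(a) != Fingerprint(b));
}
```

No cookie or IP address is involved here: the identifier is derived entirely from what the browser reports about itself.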

There are many ways to check what information your browser sends out. Some excellent online tools include browserspy.dk, browserleaks.com, deviceinfo.me, and Panopticlick by the EFF.

For the sake of ease, I have tested four different browsers on my laptop using the Panopticlick tester: the ubiquitous Chrome; Brave, a Chromium alternative; Firefox; and the Tor privacy browser. All of these are fresh installs with no plugins installed.

There are two metrics. The first is “bits of identifying information”, which breaks browser elements down into components that can identify you. The fewer bits, the better. The second is “one in x browsers have this value”. Here it’s also better to have a lower number, because that means many other browsers match your profile. It’s best not to be ‘one in a million’, as that will make you stand out.
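The two metrics are two views of the same quantity: the bits figure is the base-2 logarithm of the “one in x” figure. A minimal sketch of the conversion follows; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Bits of identifying information for a value shared by one in
// `one_in_x` browsers: bits = log2(x).
double BitsFromOneInX(double one_in_x) {
  assert(one_in_x >= 1.0);  // fewer than "one in one" is not meaningful
  return std::log2(one_in_x);
}

// Inverse conversion: x = 2^bits.
double OneInXFromBits(double bits) {
  assert(bits >= 0.0);
  return std::exp2(bits);
}
```

For example, Chrome’s User Agent in the results below is shared by one in 9450.43 browsers, and the base-2 logarithm of 9450.43 is about 13.21, which is the bits figure reported for the same row.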

Finally – these may be very specific to your operating system or device. I’ve tested these on a freshly installed Ubuntu laptop, which is not going to be as common as a MacBook or Windows laptop. Please test on your own computer for best results.

Panopticlick “Bits of Identifying Information:”

Browser Characteristic        Chrome    Brave     Firefox   Tor
User Agent                    13.21     7.89      6         2.62
HTTP_ACCEPT Headers           3.08      3.09      1.75      1.75
Browser Plugin Details        3.06      4.99      7.75      0.87
Time Zone                     4.26      4.26      4.26      2.28
Screen Size and Color Depth   2.64      2.65      2.64      6.12
System Fonts                  7.27      7.28      6.64      3.27
Are Cookies Enabled?          0.21      0.21      0.21      0.21
Limited supercookie test      0.33      0.32      0.33      0.33
Hash of canvas fingerprint    9.38      9.39      7.23      2.83
Hash of WebGL fingerprint     14.27     8.88      6.92      6.57
DNT Header Enabled?           1.1       1.1       1.1       1.1
Language                      0.93      0.93      0.93      0.93
Platform                      3         3         3         3
Touch Support                 0.75      0.75      0.75      0.75
Totals                        63.49     54.74     49.51     32.63

Here the Tor browser is clearly the winner, with about half of the identifying bits that Chrome shows. The only place where Tor loses out is on screen size and color depth. There is not too much distance between Chrome and Brave, and Firefox sits in the middle of the pack. The User Agent of Google Chrome carries almost double the information of the next nearest browser.

Panopticlick “One in X Browsers Have This Value:”

Browser Characteristic        Chrome    Brave     Firefox   Tor
User Agent                    9450.43   237.53    64.14     6.15
HTTP_ACCEPT Headers           8.46      8.49      3.37      3.37
Browser Plugin Details        8.34      31.84     214.78    1.83
Time Zone                     19.21     19.22     19.21     4.84
Screen Size and Color Depth   6.25      6.26      6.25      69.5
System Fonts                  154.05    154.95    99.7      9.65
Are Cookies Enabled?          1.16      1.16      1.16      1.16
Limited supercookie test      1.25      1.25      1.25      1.25
Hash of canvas fingerprint    666.75    671.31    150.52    7.11
Hash of WebGL fingerprint     19760     471.5     121.49    95.25
DNT Header Enabled?           2.14      2.14      2.14      2.14
Language                      1.9       1.91      1.9       1.9
Platform                      8.02      8.03      8.02      8.02
Touch Support                 1.68      1.68      1.68      1.68
Totals                        30089.64  1617.27   695.61    213.85

Here is where we see the nearly shocking amount of uniqueness that Chrome exposes, in particular through the User Agent, fonts, and WebGL information. Brave, despite its inbuilt blocking of JavaScript trackers, also exposes more than double the information that Firefox does and almost 8 times more than Tor. Firefox does well here; its only weakness is in the plugins. Tor is again well out in front of all tested browsers, with screen size and color depth being its only weak point.
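When reading the totals, it helps to know how the two metrics combine. Bits add up across characteristics, while “one in x” values multiply, and both only hold if the characteristics are statistically independent. The sketch below makes that assumption explicit; the function names are hypothetical. Because real characteristics are correlated (platform and user agent, for instance), the combined figure is better read as an upper bound than as a measurement.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sum of per-characteristic bits, as in the "Totals" row of the bits table.
double TotalBits(const std::vector<double>& bits) {
  return std::accumulate(bits.begin(), bits.end(), 0.0);
}

// Combined "one in x" figure, ASSUMING the characteristics are independent:
// 2^(sum of bits), which equals the product of the individual "one in x" values.
double CombinedOneInX(const std::vector<double>& bits) {
  for (double b : bits) {
    assert(b >= 0.0);  // a characteristic cannot carry negative information
  }
  return std::exp2(TotalBits(bits));
}
```

Fed the Tor column from the bits table, `TotalBits` reproduces the 32.63 total. Under the independence assumption, `CombinedOneInX` then puts that at one in several billion browsers.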

Getting back to the controversy

Let’s get back to the controversy. Google announced that they were going to simplify the user agent to make it less identifiable. This is great news, right? Well, the problems that users pointed out were the following:

  1. There are still many other factors in the Chrome browser that identify you – if you reduced the unique browser factors in the test above to Tor’s values, you would still get extremely strong identifying factors from other browser features.
  2. Google’s very own X-Client-Data header is not revealed directly to anyone but its own internal network. This layer of opacity has disturbed many in the open source software community as well as privacy advocates.

Google’s response to this controversy came from an unnamed spokesperson:

“The X-Client-Data header is used to help Chrome test new features before rolling them out to all users”

“The information included in this header reflects the variations, or new feature trials, in which an installation of Chrome is currently enrolled. This information helps us measure server-side metrics for large groups of installations; it is not used to identify or track individual users.”

There is much more to be said here. If you are not afraid of learning more, Stack Exchange is a good resource, and if you are more technically inclined you can browse the source. Here is a summary, taken from the Chrome source code, of all the different Google properties that X-Client-Data is passed to:

  • android.com
  • doubleclick.com
  • doubleclick.net
  • googleadservices.com
  • googleapis.com
  • googlesyndication.com
  • googleusercontent.com
  • googlevideo.com
  • gstatic.com – This is how Google serves its static files such as JavaScript.
  • ggpht.com – This appears to be part of Google’s image serving network for YouTube and Blogger.
  • litepages.googlezip.net – This appears to be something to do with lite pages served by Google Chrome.
  • ytimg.com – This stands for YouTube Image.
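A list like this is typically applied as a suffix match on the host name of each outgoing request. The sketch below shows one way that check could look, using only the domains listed above. The function names are hypothetical, hosts are assumed to be lowercase already, and the actual Chromium check is more involved than this.

```cpp
#include <array>
#include <cassert>
#include <string>

// Domains copied from the list above (not the complete set Chrome uses).
const std::array<const char*, 12> kListedDomains = {
    "android.com",           "doubleclick.com",
    "doubleclick.net",       "googleadservices.com",
    "googleapis.com",        "googlesyndication.com",
    "googleusercontent.com", "googlevideo.com",
    "gstatic.com",           "ggpht.com",
    "litepages.googlezip.net", "ytimg.com",
};

// True if `host` is `domain` itself or a subdomain of it.
// The dot check stops "notgstatic.com" from matching "gstatic.com".
bool HostMatchesDomain(const std::string& host, const std::string& domain) {
  if (host == domain) return true;
  if (host.size() <= domain.size()) return false;
  const std::size_t offset = host.size() - domain.size();
  return host[offset - 1] == '.' &&
         host.compare(offset, domain.size(), domain) == 0;
}

// Hypothetical helper: would a request to `host` fall under the list above?
bool IsListedGoogleProperty(const std::string& host) {
  assert(!host.empty());
  for (const char* domain : kListedDomains) {
    if (HostMatchesDomain(host, domain)) return true;
  }
  return false;
}
```

With this sketch, `fonts.gstatic.com` matches while `example.com` and `notgstatic.com` do not.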

Putting it all together

The question may be asked: ‘so what?’ At the beginning of the movie “The Great Hack” a young woman poses the question:

Maybe it’s because I grew up with the Internet as a reality. The ads don’t bother me all that much. When does it turn sour?

From Cambridge Analytica to the Google+ data leaks, we’ve already seen numerous instances of big data scandals that have turned the world upside down. There isn’t any reason to deny that privacy and, more importantly, decision rights over our data should be taken back by consumers.

Ultimately, browser fingerprinting is very difficult to overcome at this stage, but there are small things you can do that help.

Things you can do now:

Change your browser – do not use Google Chrome. Here is a bit more about the options I tested:

  • Firefox – Out of the box, it is more private than Chrome, and additional plugins such as uBlock Origin can help against further JavaScript tracking.
  • Brave – This is a Chromium-based browser which by default blocks JavaScript trackers. It is more anonymous out of the box than Chrome.
  • Tor – Tor is a Firefox-based browser built with anonymity in mind. It’s not built for speed, but it is built for privacy. Tor recommends that you do not use any plugins and also that you do not use a VPN.

I have not yet tested other popular browsers such as Safari, Kiwi Browser, Microsoft Edge, Waterfox, or Vivaldi.

Change your user agent: You can change the user agent and other settings for your browsers with the guides listed below. These require some tinkering and may void your ‘warranty’. While this may not make you any less unique, it might make you more anonymous if you change it to a common browser’s agent string.

Develop a personal browsing strategy. This requires some discipline for surfing the net. For example:

  • What browser do I use for work?
  • What browser would I use for everyday use?
  • When should I use the Tor browser?
  • What browser should I use for my Google (Facebook, LinkedIn, etc.) accounts?

Ask the questions – if you are unsure whether your software or hardware is doing something that it shouldn’t, ask online. You can always ask anonymously on places like Reddit. There is no harm in this, and you might be surprised to find you are not the only person with your question.

Contact the privacy commissioner: Enquire about what data is being collected out of country and what policies are in place to protect local businesses as well as individuals.

Further reading:

Footnotes


[1] The original lines of the Google code were pointed out by user Ted Mielczarek on StackExchange: https://stackoverflow.com/questions/12183575/what-is-following-header-for-x-chrome-variations/13239916#13239916 . They are as follows:

[2] Personally Identifiable Information – this is a key part of Europe’s General Data Protection Regulation (GDPR).