Introduction to Device Fingerprinting
April 2nd, 2019 | By Camilo Reyes | 4 min read
Curious about device fingerprinting?
Customers today use all kinds of devices. Screens are everywhere, and the average user owns more than one. This makes it hard for a secure system to know whether a real user is behind a given device. With more devices in play, an app has a harder time telling a trusted device from one that belongs to an attacker.
From an attacker’s perspective, more devices mean more attack vectors for mimicking real users.
An attacker is happy with more options, but is there a way to mitigate this? If it is possible to identify both the user and the device, how hard can this be to put in place?
What Is Fingerprinting?
A fingerprint can identify a trusted user's device on the internet. This fingerprint can depend on both the browser and the device. Multi-factor authentication can verify the trusted device and tie it to a real user. An attacker will then need more than user credentials to do any real damage to a secure system.
The application has the option of managing a small set of trusted devices that belong to a user. This reduces attack vectors and increases security.
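The idea of a small set of trusted devices per user can be sketched as below. This is a minimal in-memory illustration, not a production design: the names `createRegistry`, `isTrusted`, `trustDevice`, and the cap `MAX_TRUSTED_DEVICES` are hypothetical, and the fingerprint is assumed to be a numeric hash of the kind discussed later in this piece.

```javascript
// Hypothetical in-memory registry: user id -> Set of trusted fingerprint hashes.
const MAX_TRUSTED_DEVICES = 3; // assumed cap, tune per application

function createRegistry() {
  return new Map();
}

// True only when this exact fingerprint was trusted before for this user.
function isTrusted(registry, userId, fingerprint) {
  const devices = registry.get(userId);
  return devices !== undefined && devices.has(fingerprint);
}

// Intended to be called only after the user passes multi-factor authentication.
function trustDevice(registry, userId, fingerprint) {
  const devices = registry.get(userId) || new Set();
  if (!devices.has(fingerprint) && devices.size >= MAX_TRUSTED_DEVICES) {
    return false; // cap reached; the app decides which device to retire
  }
  devices.add(fingerprint);
  registry.set(userId, devices);
  return true;
}
```

Keeping the set small is a design choice: every extra trusted device is one more fingerprint an attacker could try to imitate.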
For fingerprints, we’ll pick ClientJS as the open-source library to gather data points in the browser. An open-source library is a good option because anyone can inspect the code for security flaws. To start using this library, do the following:
var client = new ClientJS();
Keep this client JavaScript object in mind. We will come back to it throughout this piece. The idea is to focus on the question: How can I identify an actual user behind a screen?
Device vs. Browser Fingerprints
There are two kinds of fingerprints: those that identify a browser and those that identify a device. Knowing which of the two a data point identifies helps in increasing entropy.
A good level of entropy is necessary because it adds uniqueness to the fingerprint. The entropy level sets a differentiator so the app can tell devices apart. With low entropy, many devices end up sharing the same fingerprint.
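One common way to put a number on entropy is Shannon entropy, measured in bits over the values observed for a data point. The sketch below assumes you already have a sample of values collected from many devices. The function name `shannonEntropy` is hypothetical and is not part of ClientJS.

```javascript
// Shannon entropy (in bits) of the values observed for one data point.
// 0 bits means every device reported the same value.
function shannonEntropy(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  let bits = 0;
  for (const count of counts.values()) {
    const p = count / values.length; // share of devices with this value
    bits -= p * Math.log2(p);
  }
  return bits;
}
```

A data point where all sampled devices report the same value scores 0 bits and cannot tell devices apart, while one that splits the sample evenly into four groups scores 2 bits.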
For factors that target the device, these are some options:
var os = client.getOS();
var version = client.getOSVersion();
var language = client.getLanguage();
var timezone = client.getTimeZone();
var resolution = client.getAvailableResolution();
Factors such as the OS and version tend to remain static per device. The language and timezone vary depending on the physical location of the device. Customers who travel often may fail to match on these location-dependent data points because the values change so often.
The available resolution has screen data which changes per physical configuration. If the user is behind a docked laptop, for example, then the screen resolution changes.
For factors that target both the browser and the device:
var canvas = client.getCanvasPrint();
var fonts = client.getFonts();
var plugins = client.getPlugins();
These are hybrid factors because they vary per browser and device. The Canvas API in the browser taps into both hardware and browser capabilities.
The fonts data point lists the system fonts installed on the device that the browser can access. ClientJS checks a long list of fonts, which might affect performance. If the library takes too long, be sure to reduce this list. Plugins are pieces of software installed on the device that the browser can detect.
For factors that target the browser, try:
var userAgent = client.getUserAgent();
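The user agent identifies the browser and its version. One caveat is that it changes with every new release of the browser, so it is not stable over time. It also adds little entropy on its own, because every device running the same browser build on the same OS reports the same value. From the server side, devices with identical user agents will send matching headers. One differentiator is the IP address, which reveals the approximate location unless the user is behind a VPN.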
It is important to know customer behavior around devices when coming up with a list of data points. Knowing device types, travel, and upgrade patterns helps in getting a good list.
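One way to keep the chosen list explicit is to gather only the data points you name. The sketch below assumes `client` is any object exposing the ClientJS getters used earlier in this piece. The names `DATA_POINT_GETTERS` and `collectDataPoints` are hypothetical helpers, not part of the library.

```javascript
// Our own labels mapped to the ClientJS getter names used in this article.
const DATA_POINT_GETTERS = {
  os: 'getOS',
  osVersion: 'getOSVersion',
  language: 'getLanguage',
  timezone: 'getTimeZone',
  resolution: 'getAvailableResolution',
  canvas: 'getCanvasPrint',
  fonts: 'getFonts',
  plugins: 'getPlugins',
  userAgent: 'getUserAgent'
};

// Calls only the getters for the requested labels and returns raw strings.
function collectDataPoints(client, labels) {
  const result = {};
  for (const label of labels) {
    const getter = DATA_POINT_GETTERS[label];
    if (getter === undefined) {
      throw new Error('Unknown data point: ' + label);
    }
    result[label] = String(client[getter]());
  }
  return result;
}
```

In the browser this could be called as `collectDataPoints(new ClientJS(), ['os', 'timezone', 'canvas'])`. Leaving a label such as `fonts` out of the list means its getter is never called.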
Every data point comes back as a raw string. Strings are easy for humans to read but cumbersome for a computer to process. So, is there a simple way to tell data points apart without matching on raw strings?
To Hash or Not to Hash
Hashing raw fingerprint data allows quick analysis. A computer can store and match a number much quicker than raw string data. This reduces storage and retrieval times from a database and it’s efficient to put in place.
For fingerprints, we only care about whether a value matches previously seen data, so a numeric hash is enough. The matching algorithm can then compare many data points and handle fingerprints that match only in part. For example, say the resolution matches but the canvas and font data points do not.
With one hash per data point, it’s efficient to count how many data points match and compare that count against a threshold. If the share of matches falls below the threshold, the app treats it as a new device.
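The partial matching described above can be sketched as follows. This assumes each device is stored as an object of per-data-point hashes. The names `matchScore` and `isKnownDevice`, and the threshold value in the usage note, are hypothetical choices for illustration.

```javascript
// Fraction of stored data points whose hash equals the current one (0 to 1).
function matchScore(stored, current) {
  const keys = Object.keys(stored);
  if (keys.length === 0) {
    return 0;
  }
  const matches = keys.filter(function (key) {
    return stored[key] === current[key];
  }).length;
  return matches / keys.length;
}

// Below the threshold, the caller treats the fingerprint as a new device.
function isKnownDevice(stored, current, threshold) {
  return matchScore(stored, current) >= threshold;
}
```

With stored hashes for `resolution`, `canvas`, and `fonts`, a visit where only `resolution` matches scores one third, so a threshold of `0.5` would flag it as a new device.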
ClientJS uses the MurmurHash3 algorithm to hash fingerprint data. This algorithm returns an unsigned 32-bit integer stored in a JavaScript number type. On the server, an unsigned integer type may not be available if you’re not using JavaScript. So be sure to pick a type on the server that can hold the full range of this hash value. To learn more about hashing, check out this article on hashing algorithms.
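If the server-side store only offers a signed 32-bit integer, which is an assumption about your stack, one option is to reinterpret the same 32 bits as a signed value before sending and convert back when reading. The helper names below are hypothetical. Another option is simply to use a 64-bit integer type on the server.

```javascript
// Reinterpret an unsigned 32-bit hash as a signed 32-bit integer.
function toSignedInt32(unsignedHash) {
  return unsignedHash | 0;
}

// Reverse the conversion to get the original unsigned value back.
function toUnsignedInt32(signedHash) {
  return signedHash >>> 0;
}
```

Both directions keep the same bit pattern, so equality checks between stored and incoming hashes still work as long as both sides use the same representation.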
To hash a data point in ClientJS, do:
var canvasFp = client.getCustomFingerprint(canvas);
You have the option to hash data points in both the client and the server in JavaScript. This allows you to have both the hash and the raw string data, if necessary. One caveat is to make sure raw data is secure, so it doesn’t leak any customer PII.
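For the server side, a standalone MurmurHash3 (x86, 32-bit) routine might look like the sketch below. Treat it as an illustration under assumptions: it hashes the UTF-8 bytes of the string and takes the seed as a parameter, and it is not guaranteed to reproduce the exact values ClientJS computes in the browser, since the library's seed and string handling may differ. The function name `murmurhash3_32` is chosen here for illustration.

```javascript
// MurmurHash3 x86 32-bit over the UTF-8 bytes of a string.
// Returns an unsigned 32-bit integer as a JavaScript number.
function murmurhash3_32(str, seed) {
  const data = new TextEncoder().encode(str); // global in modern Node and browsers
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  let h1 = seed >>> 0;
  const nblocks = data.length >> 2;

  // Body: mix four bytes at a time, read as little-endian.
  for (let i = 0; i < nblocks; i++) {
    const o = i * 4;
    let k1 = data[o] | (data[o + 1] << 8) | (data[o + 2] << 16) | (data[o + 3] << 24);
    k1 = Math.imul(k1, c1);
    k1 = (k1 << 15) | (k1 >>> 17);
    k1 = Math.imul(k1, c2);
    h1 ^= k1;
    h1 = (h1 << 13) | (h1 >>> 19);
    h1 = (Math.imul(h1, 5) + 0xe6546b64) | 0;
  }

  // Tail: mix the remaining one to three bytes, if any.
  const tail = nblocks * 4;
  const rest = data.length & 3;
  let k1 = 0;
  if (rest === 3) k1 ^= data[tail + 2] << 16;
  if (rest >= 2) k1 ^= data[tail + 1] << 8;
  if (rest >= 1) {
    k1 ^= data[tail];
    k1 = Math.imul(k1, c1);
    k1 = (k1 << 15) | (k1 >>> 17);
    k1 = Math.imul(k1, c2);
    h1 ^= k1;
  }

  // Finalization: force the bits to avalanche.
  h1 ^= data.length;
  h1 ^= h1 >>> 16;
  h1 = Math.imul(h1, 0x85ebca6b);
  h1 ^= h1 >>> 13;
  h1 = Math.imul(h1, 0xc2b2ae35);
  h1 ^= h1 >>> 16;
  return h1 >>> 0;
}
```

If the client and the server both need to produce the same hash for the same raw string, run the same routine with the same seed on both sides and compare against values from your own deployment before relying on it.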
Conclusion
Fingerprints help mitigate attack vectors with the many devices in use today.
Fingerprint data points can uniquely identify the device, the browser, or both. Hashing data points enables fast retrieval and analysis. Raw fingerprints are personally identifiable information or PII, so keep this data secure.