The privacy nightmare of browser fingerprinting

225 comments

·November 22, 2025

aragonite

Some time ago I noticed that in Chrome, every time you click "Never translate $language", $language quietly gets added to the Accept-Language header that Chrome sends to every website!

My header ended up looking like a permuted version of this:

  en-US,en;q=0.9,zh-CN;q=0.8,de;q=0.7,ja;q=0.6

I never manually configured any of those extra languages in the browser settings. All I had done was tell Chrome not to translate a few pages on some foreign news sites. Chrome then turned those one-off choices into persistent signals attached to every request.

I'd be surprised if anyone in my vicinity shares my exact combination of languages in that exact order, so this seems like a pretty strong fingerprinting vector.
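
To make the concern concrete, here is a minimal Python sketch of how a server could turn the ordered language list into a stable identifier. The function names and the choice of a truncated SHA-256 are illustrative assumptions, not something any particular tracker is known to do, and the parser only handles the simple `tag;q=value` form shown above.

```python
import hashlib


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # q defaults to 1 when the weight is omitted
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # assumption: treat a malformed weight as lowest priority
        entries.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order
    return sorted(entries, key=lambda e: -e[1])


def language_signature(header: str) -> str:
    """Hash the ordered tag list into a short identifier (illustrative only)."""
    ordered = [tag for tag, _ in parse_accept_language(header)]
    return hashlib.sha256(",".join(ordered).encode()).hexdigest()[:12]


default = "en-US,en;q=0.9"
extended = "en-US,en;q=0.9,zh-CN;q=0.8,de;q=0.7,ja;q=0.6"
permuted = "en-US,en;q=0.9,de;q=0.8,ja;q=0.7,zh-CN;q=0.6"

print(parse_accept_language(extended))
for header in (default, extended, permuted):
    print(language_signature(header), header)
```

The three headers produce three different signatures, so both the set of extra languages and their order contribute to distinguishing one browser from another.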

There was even a proposal to reduce this surface area, but it wasn't adopted:

https://github.com/explainers-by-googlers/reduce-accept-lang...

hoofedear

Is Chrome assuming that, since you don’t want it to translate those pages/languages, you can read them and want them in your header? Interesting

scrollop

PSA Don't use chrome.

SV_BubbleTime

Definitely a good STEP1, but it’s not like Firefox and Safari are fingerprinting-proof.

capitainenemo

Firefox does pretty damn well though, especially with privacy.resistFingerprinting set to true

Alive-in-2025

What about DuckDuckGo? We need a simple chart: 1. Which browsers are good at resisting fingerprinting 2. For each browser, does it work on Android, iOS, macOS, Windows and Linux 3. What settings are needed to achieve this

For bonus points, is there no way to strip all headers on Chrome or control it better?

fsflover

Tor Browser (based on Firefox) is.

datavirtue

I only use it when I want to be tracked.

thaumasiotes

> There was even a proposal to reduce this surface area, but it wasn't adopted:

>> Instead of sending a full list of the users' preferred languages from browsers and letting sites figure out which language to use, we propose a language negotiation process in the browser, which means in addition to the Content-Language header, the site also needs to respond with a header indicating all languages it supports

Who thought that made sense? Show me the website that (1) is available in multiple languages, and also (2) can't display a list of languages to the user for manual selection.

fsflover

Using Chrome and caring about privacy? I thought, after Google killed uBlock Origin, it had become beyond clear these two things were incompatible, https://news.ycombinator.com/item?id=41905368

esseph

uBlock origin just got replaced with uBlock lite for most people

anthk

There's a way to force-load uBO in Chromium, but you need to download the extension by hand (git clone it from GitHub) and load it in "developer mode" in the extension settings. Also, you need to enable some legacy options related to extensions in about:flags.

fsflover

Which, by design, doesn't protect you from actual spying, https://github.com/uBlockOrigin/uBOL-home/wiki/Frequently-as...

datavirtue

Hmmm...YouTube has been getting confused about the language and displaying random languages for the closed captions on videos. This was happening to me across smart TVs but I access YouTube randomly from various devices and browsers...but mostly Chrome when using a browser.

drnick1

Firefox w/ the Arkenfox user.js is probably as good as it gets in terms of privacy. By default, this config burns cookies on exit, standardizes the time zone to UTC, spoofs the canvas fingerprint, and does other helpful things. Basically, it makes Firefox expose the same information as the Tor browser.

In addition, I block most known advertising/tracking domains at the DNS level (I run my own server, and use Hagezi's blacklists).

Finally, another suggestion would be to block all third party content by default using uBlock Origin and/or uMatrix. This will break a lot of websites, but automatically rules out most forms of tracking through things such as fonts hosted by Google, Adobe and others. I manually whitelist required third party domains (CDNs) for websites I frequently visit.

samtheprogram

There's no point unless a critical mass of people use these tools. You will be the only one on your IP address using this configuration of masked fingerprinting, which is itself a fingerprint.

That's also why it's indeed useful when using Tor, because you're not identified by your base IP.

Unless we make this part of the culture, you have basically 0 recourse to browser fingerprinting except using Tor. Which can itself still be a useful fingerprint depending on the context.

codedokode

Does it hide the GPU name that is exposed via WebGL/WebGPU? Does it hide the internal IP address, available via WebRTC?

> block all third party content

It's not going to work, because the fingerprinting script can be (and often is) served from a first-party domain.

Also imagine if the browser didn't provide a drawing API for canvas (so you would have to ship your own wasm rendering library). Canvas would become useless for fingerprinting and its usage would drop manyfold. And the browser would have less code and a smaller attack surface.

drnick1

> Does it hide GPU name that is exposed via WebGL/WebGPU? Does it hide internal IP address, available via WebRTC?

My GPU is reported as simply "Mozilla" by https://abrahamjuliot.github.io/creepjs/.

The number of cores is also set to 4 for everyone using this config and/or Tor.

> It's not going to work, because the fingerprinting script can be (and is often served) from first-party domain.

This may be true, but allowed third-party content makes it trivially easy for Google and others to follow people around the Internet through font delivery systems, among others.

tempest_

I had forgotten I was running uBlock Origin / Privacy Badger / Ghostery, so I was a bit confused by the results from that site.

I think it is Ghostery that is faking the responses, but I still have a pretty unique fingerprint according to https://coveryourtracks.eff.org/kcarter?aat=1

dminuoso

If I infiltrate someone else’s computer and secretly run code in order to exfiltrate data, I risk prison time because objectively it seems to satisfy the criminal laws where I live.

How do prosecutors in any modern country/state not charge this behavior when done by a website owner?

gruez

The difference is that there's implied consent to run arbitrary (albeit sandboxed) code when you visit a website. Moreover it's not the website causing the code to be executed, it's your browser. Otherwise, if the bar is "code is being run but the user doesn't know about it", it would lead to either any type of web page with javascript being illegal (or maybe even without javascript, given that CSS is Turing complete), or a cookie banner type situation where the site asks for consent and everyone just blindly accepts.

alcide

Orion Browser (Kagi Product) prevents fingerprinters from running by default.

https://help.kagi.com/orion/privacy-and-security/preventing-...

ashman5

Orion browser is also capable of running uBlock Origin (not Lite) on iOS.

codedokode

How do they reliably detect fingerprinting? Did they solve the Halting Problem? Sounds fishy.

gruez

>The only efficient protection against fingerprinting is what Orion is doing — preventing any fingerprinter from running in the first place. Orion is the only browser on the market that comes with full first-party and third-party ad and tracking script blocking, built-in by default, making sure invasive fingerprinters never run on the page.

sounds like they block "known" fingerprinting scripts and call it a day.
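
If that reading is right, the mechanism is essentially host-based list matching. The Python sketch below illustrates that idea only; it is not Orion's implementation, and the hostnames in `BLOCKED_HOSTS` are made-up placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries. Real lists are far longer, and these
# names are not taken from Orion or any other browser.
BLOCKED_HOSTS = {"fingerprint.example", "tracker.example"}


def is_blocked(script_url: str, blocked_hosts: set[str] = BLOCKED_HOSTS) -> bool:
    """Return True if the script's host, or any parent domain, is on the list."""
    host = (urlsplit(script_url).hostname or "").lower()
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c"
    return any(".".join(labels[i:]) in blocked_hosts for i in range(len(labels)))


print(is_blocked("https://cdn.fingerprint.example/fp.js"))  # listed host
print(is_blocked("https://news.example/fingerprint.js"))    # same script, first-party host
```

A list like this only catches scripts served from hosts someone has already catalogued. The second call shows the gap raised earlier in the thread: the same script served from a first-party domain passes straight through.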

capitainenemo

Unfamiliar with the Arkenfox user.js, but are any of these things beyond what Firefox enables out of the box if you turn on privacy.resistFingerprinting? Because what you describe seems to be all stuff it does just by flipping that flag.

kachapopopow

All javascript based anti-fingerprinting is detectable and is also a major source of uniqueness!

vorticalbox

Sure, but if you are always unique for every website then you can’t be tracked over time.

HumanOstrich

They meant a signal of uniqueness for your setup that could still assist with tracking, not being unique for every site.

hilbert42

"This will break a lot of websites, but automatically rules out most forms of tracking…"

Whether one breaks a lot of websites or not depends on the type of user one is. People who regularly use the Google ecosystem, Amazon, social media etc. cannot afford to break sites, for obvious reasons. They are also the users that websites are most interested in tracking and fingerprinting.

Those who use the web in the way advertisers and Big Tech intend users to use it are the most vulnerable, they're the ones who most need protection.

I break websites regularly but it doesn't worry me. I browse on the premise that there are more websites on the internet than I'll ever be able to visit, and if I break sites or am blocked by paywalls then there are usually alternatives and workarounds.

But then I'm not a typical user, I block ads, I usually browse with JS off, kill cookies, use block lists, use multiple browsers (there are six on this deGoogled, rooted phone), browse from multiple machines—Windows, Linux and use multiple ISPs. Also, I've no Social media or Google accounts and rarely ever purchase stuff online. Internet access is via dynamic IP addresses and routers are rebooted often. There's more but you get the picture.

I assume browsing sans JS makes me a first-class target for fingerprinting and that websites know about me but it doesn't matter. Whatever I'm doing seems to work, over the years I've had very little trouble doing everything on the web that I want to do. Clearly I'm of little interest to advertisers and I never see ads let alone targeted ones. I used to use uBlock Origin but I don't bother now as browsing sans JS is just so effective at blocking ads.

I'm lucky in the fact that I use no service that would benefit from fingerprinting me. Whilst my web browsing is atypical of most users I reckon many could benefit by being more proactive—using multiple machines, browsers, ISPs etc.—to disrupt the outflow of personal data. For example, this is being written on a rooted Android using Privacy Browser from F-Droid sans JS and with block lists. If I really need to go to a site where JS is required, I can simply hit a toggle and turn on JS or alternatively use another browser.

DeathArrow

There is also server-side fingerprinting like JA4+ and others. Also, if you somehow evade fingerprinting, you have to prepare yourself to solve some very slow Google and Cloudflare captchas.

0xy

As someone who utilizes these tools for anti-fraud purposes, Firefox is just as trackable if not more trackable than Chrome (especially because you stand out by using a niche browser in the first place).

Firefox exposes a massive amount of identifiable information via canvas, audio device and feature detection methods. There are also active methods to detect private windows, use of the developer console and more.

vpShane

Of course. There's data where there isn't data.

-make client load something

-client doesn't load it

-add.fingerprint.point(client,'doesntloadthings',1)

-detect if client does something only a certain browser does

-client does it

-add.fingerprint.point(client,'doesthisbrowserthing',1)

-window was resized/moved, send a websocket snitch to the backend

- keep a consistent web socket open, or fetch a backend-api call for updates on X events - more calls are made, means user is probably scrolling, inject more things/different things.
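
The first few steps above can be sketched in Python as a small signal accumulator. The class, function, and signal names are hypothetical and chosen only to mirror the pseudocode. The point is that a missing request counts as a signal just as much as an observed behaviour does.

```python
from collections import Counter


class FingerprintScore:
    """Accumulates named signals per client, mirroring the pseudocode above."""

    def __init__(self) -> None:
        self.points: dict[str, Counter] = {}

    def add_point(self, client_id: str, signal: str, weight: int = 1) -> None:
        self.points.setdefault(client_id, Counter())[signal] += weight

    def profile(self, client_id: str) -> dict[str, int]:
        return dict(self.points.get(client_id, {}))


def observe(scores: FingerprintScore, client_id: str, requested_paths: set[str],
            bait_path: str, browser_quirk_seen: bool) -> None:
    # The absence of an expected request is itself a signal (e.g. a blocker is active)
    if bait_path not in requested_paths:
        scores.add_point(client_id, "does_not_load_things")
    # Behaviour that only one browser family exhibits narrows the candidates further
    if browser_quirk_seen:
        scores.add_point(client_id, "does_browser_specific_thing")


scores = FingerprintScore()
observe(scores, "client-1", {"/index.html"}, "/pixel.gif", browser_quirk_seen=True)
observe(scores, "client-2", {"/index.html", "/pixel.gif"}, "/pixel.gif", browser_quirk_seen=False)
print(scores.profile("client-1"))
print(scores.profile("client-2"))
```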

I see some js obfuscators out there where I look at the js file and it's all mumbo jumbo.

It is indeed a privacy nightmare, where whatever we do feeds the algorithms to aid in making other people do things.

But it's also used in network security, organizations etc. Staff/employees will use the system a certain way, if something enters it without the behaviors, it's detectable. I assume that's what you mean in anti-fraud.

Sad part is we don't know what the data is ever used for, and it's often bought and sold and the cycle repeats.

skaul

Self-plug but if anyone is interested in learning more about how browser fingerprinting works and the different protections browser makers deploy against it, I wrote a longer post about this a few months ago: https://pitg.network/news/techdive/2025/08/15/browser-finger...

NoahZuniga

It's consistent cross-site, so you get all the same privacy problems as with third-party cookies.

bolangi

The article is missing a link to one of the first fingerprint diagnostic tools, https://coveryourtracks.eff.org/ , formerly called Panopticlick.

TechDebtDevin

or don't trust the EFF!

doug_durham

I agree with the points in the article. Fingerprinting of any kind is a major risk for personal freedom. At the same time I want to make sure that content creators are compensated for their work. Ad firms that employ fingerprinting stand between me and the content creator. That said, I'm not going to pay $5/month for every blog that I occasionally read. The ad based model provides a more streamlined approach to compensation, but at the unacceptable price of privacy. I'm not quite sure what the answer is.

jwr

> content creators are compensated for their work

I have a gut feeling that we've been tricked (by ad companies) into thinking that this is somehow realistic and that casual "content creators" can get meaningful money from us reading their articles.

Realistically, while professional content creators can make a living, writing a blog post every once in a while will not provide meaningful income. Instead of trying to "monetize" everything, we would be better off with free content like on the internet of old. There are other means of making money.

It seems that the current situation means that the "content creators" earn insignificant money, while ad companies earn huge money because of scale, and we all somehow keep believing that this is necessary for content to appear.

Buttons840

You mean I shouldn't make a comfortable living off my valuable HN comments? I was about to consider this comment a good day's work. Maybe if I put this comment on my own webpage it would be more valuable?

FireBeyond

> writing a blog post every once in a while will not provide meaningful income

Nor, generally, should it. Sitting down one or two Saturday afternoons a month to write a blog post shouldn't be generating the income of a FTE.

chiefalchemist

Allow me a second to play Devil’s Advocate.

What if it could? Or should (be able to produce FTE or close income)?

In that world, the amount of pointless shite - questing to “go viral” - would be reduced to near zero. That is, if the incentive were more quality, and less quantity, we’d be better off, yes?

kasabali

> I'm not quite sure what the answer is.

It's very simple, it's what they've been doing in print media for centuries: contextual advertising.

hedora

The main “problem” with contextualized advertising is that the people producing the content get a larger share of the ad spend.

Targeted ads concentrate control over the market into a few players, which can do things like acquire competitors or run them out of business with loss leaders.

With AI, the supply of ad real estate will go to infinity, so the only thing that will matter is the quality of the places the ads run.

This would be a good time to ban targeted advertising, or for the content producers to form a cartel that only purchases contextual ads.

That cartel will probably be even worse than what we have now, since it’s going to be 2-3 mega conglomerates like Disney, and they already have handed editorial control over to the White House.

Hopefully the invisible hand of capitalism will somehow fix this.

Vinnl

Print media did also include e.g. coupons with discount codes, with which advertisers could learn which lead led to a sale.

Retric

Without any transactions or user tracking it’s difficult to separate ‘legitimate’ content farms from those using bot farms to boost their page views.

Print media was also trying to guarantee their audience was an actual person by charging nominal fees; the difference was how much info was required to do so.

gedy

Yes, seriously - I'm old enough to have enjoyed reading magazines that had ads throughout them. They were fine.

I'd venture to say contextual advertising would be more effective than whatever we've been trying to squeeze out of fingerprinting etc. All this supposed "data" they are gathering feels like a scam perpetuated by ad companies about how important it is to the people who buy ads. It's not.

Even Facebook and Instagram, which pretty much should know you to a tee, are completely ineffectual at advertising to me - like at all.

8bitsrule

Same here. By the time I was old enough to have an income, reading comics had already made it possible for me to -not even see any- advertising. That carried over to newspapers, magazines... all those advertisers were wasting their money.

Later on in life I got pissed at cable-TV advertisers shoved into my favorite movies every 5-10 minutes ... ruining any ambience or artistic merit in them ... so I got rid of cable TV. By the time analog TV went away, I'd got rid of my television set. No return address on an envelope? junk mail, into the garbage unopened.

Now the pollution's ruined the 'net ... it's YouTube (re-routed) and some websites (blocked). So long, boing-boing and wired and your 'native ads'. Sites demand subscription? blocked. How much longer before advertisers realize how much they're getting ripped off?

Neikius

Do you see how the discourse has been shifted here? Some of us have nothing against ads per-se. We care about tracking.

How does tracking me and invading my privacy make ads perform better? In my case it does not. As the tracked ads are usually worse as they will keep advertising me things I don't need anymore. Context based ads worked fine in the past and I don't really see why they cannot.

Also why does every web store need to show me ads? Don't they make money out of selling things? If they really have to, do they have to invade privacy? This is like walking into a physical store and them doing facial recognition, then showing you tailored ads/inventory. That feels creepy to me.

fragmede

> How does tracking me and invading my privacy make ads perform better?

If you don’t want to be tracked, you shouldn’t be, but how could it not? At a very simple level, an ad targeted towards a 50 year old woman isn’t going to be the same ad to show a 14 year old boy. Different people like different things and ads targeting you as an advertising profile are going to be better than ones that aren’t. You may not like the targeting and think it's invasive, because it is, but let's not pretend the tracking doesn't do something.

troupo

A 14-year-old is unlikely to read/look at the same content as a 50-year-old woman. That's how contextual advertising works.

troupo

Showing ads doesn't require invasive and pervasive 24/7 surveillance.

prymitive

> I'm not going to pay $5/month for every blog that I occasionally read

Would you pay per view? Most people (me included) would probably hesitate to say yes, because we’re used to not paying for that. But what if it meant that ad based model is gone and everything you buy is cheaper because the price does not include the cost of running ads?

Terretta

> what if ... everything you buy is cheaper because the price does not include the cost of running ads?

Except in practice we see the opposite.

There's something interesting going on with companies when they want to get paid directly versus by ads: they demand 3x - 4x or more for subscriptions or pay per view versus what they make from ads.

Easiest place to see this is ad supported non-linear TV in the years you could get without ads, or with ads. You pay significantly more to not see the ads, than they make from the ads.

Perhaps this is justified because ad-free subscriptions reduce the audience size for ad buys, but when you look at the numbers watching with ads versus paying, it wouldn't seem like the "no ads" buyers make a dent in whatever pricing tier.

In the 90s when we were young and naive, we imagined a library card model, with a library fee and then you have fractions of a cent cost to read a post, and using (hand waving) technology to uncouple viewing history from payables to content creators. That, or the British TV license model, an Internet license of some kind.

It's curious to me the ad networks haven't gotten together to preemptively offer this. Arguably Brave tried, but from an adversarial (to the ad companies) stance. It would work better from the inside with a simple regulation: if you serve ads for ad-supported content, you have to participate in the library card system at CPM rates no greater than you receive for ads to skip the ads for card holders.

aidenn0

This is price discrimination. Everybody would love to charge more money to rich people and less money to poor people, since that increases the total profit.

The only companies that we directly allow to do this are schools, but having a premium version lets you approximate this.

notatoad

The PPV model has been tried a bunch of times, and it always turns out that the rate people are willing to pay per view is not a rate that is high enough to be a viable revenue source for the content owners.

It takes a lot of $0.10-$0.25 views to make up for the loss of a $5/month recurring revenue stream that might last for years.
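
The arithmetic behind that claim is easy to check. This Python sketch works in whole cents to avoid floating-point rounding. The 36-month subscription lifetime is an assumed figure for illustration, since the comment only says "years".

```python
def break_even_views(subscription_cents: int, price_per_view_cents: int, months: int = 1) -> int:
    """Number of paid views needed to match subscription revenue over `months`."""
    total = subscription_cents * months
    return -(-total // price_per_view_cents)  # ceiling division on integers


for price in (10, 25):
    per_month = break_even_views(500, price)
    three_years = break_even_views(500, price, months=36)  # assumed lifetime
    print(f"${price / 100:.2f}/view: {per_month} views per month, {three_years} over 36 months")
```

So a single reader would need to pay for 20 to 50 articles every month, on that one site, to be worth as much as one subscriber.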

AndrewStephens

I wrote about this exact problem last year. To anyone who disagrees, would you pay me 5 cents to click on the following link?

https://sheep.horse/2024/11/on_micropayments.html

imiric

The fact that advertising is more profitable doesn't mean that the PPV model is not viable. It could certainly be so. Every site could set their own price, or specific tiers, which users can agree to, just like they do with subscription-based content today.

The problem is skewed incentives, of course. Advertising is acceptable to most users and easy to integrate, so why should website authors go out of their way to please a minority of their users who object to it?

myaccountonhn

I would. Or alternatively I'd also pay for a Spotify-style model where my monthly amount gets redistributed amongst the articles I read.

FireBeyond

At the risk of pedantry, though it's still germane to this context, that's more the Tidal model than the Spotify model.

Spotify's model is more that your monthly amount gets disproportionately redistributed to the artists that bring more interest and listens to Spotify, regardless of whether you were one of those listeners. Smaller and niche artists suffer under Spotify's model.
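
The two payout schemes being contrasted are often called "pro-rata" (one pooled pot) and "user-centric" (each subscriber's fee follows their own listening). The Python sketch below is a simplified illustration of that distinction as described in this comment, not the actual formula of any service. The users, artists, and numbers are invented.

```python
def pro_rata(fees: dict[str, float], plays: dict[str, dict[str, int]]) -> dict[str, float]:
    """Pool all fees, then split by each artist's share of total plays."""
    pool = sum(fees.values())
    totals: dict[str, int] = {}
    for user_plays in plays.values():
        for artist, n in user_plays.items():
            totals[artist] = totals.get(artist, 0) + n
    all_plays = sum(totals.values())
    return {artist: pool * n / all_plays for artist, n in totals.items()}


def user_centric(fees: dict[str, float], plays: dict[str, dict[str, int]]) -> dict[str, float]:
    """Split each user's own fee across only the artists that user played."""
    payout: dict[str, float] = {}
    for user, user_plays in plays.items():
        user_total = sum(user_plays.values())
        if user_total == 0:
            continue  # assumption: fees of inactive users are simply not distributed
        for artist, n in user_plays.items():
            payout[artist] = payout.get(artist, 0.0) + fees[user] * n / user_total
    return payout


fees = {"alice": 10.0, "bob": 10.0}
plays = {"alice": {"niche_band": 10}, "bob": {"big_star": 90}}

print(pro_rata(fees, plays))
print(user_centric(fees, plays))
```

With these made-up numbers, Alice only ever listened to the niche band, yet under the pooled scheme most of her fee goes to the artist Bob played heavily. The same logic would apply to readers and articles.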

stackghost

You're presupposing that these blogs are producing content worth paying for. The unfortunate truth is that the overwhelming majority of blogs (99.9%+) are not.

beeflet

The PPV model can at least cover the cost of bandwidth. If you are loading the page, it must be at least some value to the user, say 1/10th of a cent.

Analemma_

Then why is everyone so nostalgic for the old days of the blogosphere to return? If blogs are all worthless, then we shouldn't care that they're disappearing and/or being put behind paywalls; we haven't lost anything.

jcynix

> Would you pay per view?

Yes, but only after viewing, or else I'd pay for "editorial" or AI generated slop which would be generated like link farms pointing to Amazon etc.

And that's the chicken-and-egg problem ...

In theory that could be resolved by registering for free at reputable sites and then paying per view with micropayments. Or by a scheme where one would register and only pay when one actually did read stuff, not with the currently en-vogue monthly fee for each and every site.

echelon_musk

How do you track the views?

imiric

How do you track ad impressions?

morkalork

Hard to say, there's no shortage of enticing looking medium articles that are superficial and worthless. I would not pay per view that trash even though there are good ones buried in the pile.

Terretta

"If you thought click-bait was bad before..."

imiric

Brave Inc. gets a lot of flak, some warranted, but their Basic Attention Token allows for exactly this. Users can add credit to their wallet by either consuming privacy-friendly ads or topping it up manually, which then gets distributed to the sites they visit in the proportion they choose, transparently in the background while they browse.

It is a shame that this feature gets lumped together with claims of crypto scams, and similar nonsense. Yet this is precisely the right model that could work at scale to eliminate the advertising middleman, and make the web a safer and more enjoyable experience for everyone.

Analemma_

Brave strips out the ads that the creators put on their site, puts their own ads there, then gives the creators some of that money if and only if the creator realizes they have to sign up for Brave's cryptoshit. It's straightforwardly the kind of racket that would get your knees broken if you tried to do it to somebody in real life, but "it's ok because it's on computers". All the flak is deserved.

fragmede

It's frustrating that humans are stochastic parrots and go into conniptions the minute you mention crypto, because the rails are basically there. It's not user friendly, but it's possible to build a system where you transfer $0.05 of crypto to someone as you scroll down a web page using a special browser.

beeflet

The Ad model is exactly the problem. If you had anonymous, cheap micropayments where you pay 1 cent per pageview it would not just solve the surveillance problem but it would solve the DDoS problem too (you set up a web server where the price increases with load and clients bid for bandwidth).
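
As a rough sketch of the load-based pricing idea, the Python below raises a floor price with load and admits the highest bidders first. The doubling rule, the function names, and the numbers are all assumptions made for illustration. Nothing here comes from an existing payment protocol.

```python
def price_per_request(base_price_cents: float, load: float, capacity: float) -> float:
    """Hypothetical rule: the price doubles each time load grows by one `capacity`."""
    return base_price_cents * 2 ** (load / capacity)


def admit(bids: dict[str, float], slots: int, floor_price: float) -> list[str]:
    """Admit the highest bidders that meet the current floor, up to `slots`."""
    eligible = [(bid, client) for client, bid in bids.items() if bid >= floor_price]
    eligible.sort(key=lambda bc: (-bc[0], bc[1]))  # highest bid first, name as tie-break
    return [client for _, client in eligible[:slots]]


# Under a flood at three times capacity the floor rises from 1 cent to 8 cents
floor = price_per_request(1.0, load=300, capacity=100)
bids = {"reader": 10.0, "crawler": 9.0, "bot1": 1.0, "bot2": 1.0}
print(floor, admit(bids, slots=2, floor_price=floor))
```

In this toy setup a flood of minimum-price requests pushes the floor above what the flooders bid, so they either pay more or get dropped.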

AndrewStephens

Sadly, I think you are wrong. Micropayments seem attractive but the idea falls apart quickly - there are just too many intractable non-technical problems. It has been tried more than once and each effort has failed.

I wrote a longer post on this[0] but to save you the click I will state the biggest problem from a privacy point of view - if you think privacy is bad now with ads imagine how much worse it would be with a payment processor knowing your every click.

Yes, I know about certain cryptocurrencies that maintain privacy, they are a non-starter for micropayments for different reasons.

Even if a magical technical solution to privacy were to emerge, there is nothing more valuable than information about paying customers, and sites would use browser fingerprinting anyway.

[0] https://sheep.horse/2024/11/on_micropayments.html

beeflet

I think it is a technical problem. If you could integrate payment channels on top of private cryptocurrencies that would be enough. Even without the lightning network and just direct 1-to-1 payment channels, it would work.

The article you link assumes a "conventional" credit card system with chargebacks, massive fees, etc., which makes a micropayment ecosystem impractical in the first place. Proposals for micropayment systems usually describe a way to enable low-fee payments.

The author doesn't take into account modern cryptocurrency tech like payment channels. I really doubt that payments have a natural fixed floor of 10s of cents - Payment providers charge these fees simply because they are in a natural monopoly position, thanks to lock-in and regulation. The need to control fraud is caused by regulatory requirements, which are in turn caused by monopolization.

Despite being technologically less efficient, even traditional cryptocurrency payments are cheaper than bank transfer fees due to competition and low regulation.

Secondly, you assume that no one wants to do micropayments. The infrastructure doesn't exist for it yet. If you don't build it, they will not come.

As for browser fingerprinting, it can be solved on the client side with enough effort. Look at Tor Browser. Just have a system where cookies, WebGL, etc. are opt-in at the browser level in the same way that WebUSB is. Artificially limit the performance of javascript to prevent benchmarking. I think it is possible to solve this architecturally.

Check it out!

https://en.bitcoin.it/wiki/Payment_channels

https://lightning.network/lightning-network-paper.pdf

Also, there are GNU Taler/Chaumian cash type systems that inherit the efficiency of centralized systems with an added privacy benefit.

yegle

https://en.wikipedia.org/wiki/Google_Contributor was the ideal solution IMHO.

airstrike

Pay $5/month to buy credits that let you read content behind that network. Every blog you read gets $0.10. Top up with credits if you run out.

Sending emails costs $0.50.

ako

I read from too many different sources through aggregators like Hacker News. With a network you'd probably still have too many subscriptions.

I also wonder if it will really work out. I open too many articles that turn out to be pretty bad once you start reading them, so I quit after 1 or 2 paragraphs.

Now if you get the first 2 paragraphs for free, content writers will start to optimize for a good first 2 paragraphs, and afterwards quality will drop. Also, many blog posts or news articles don't have more than 2 paragraphs of good content.

CamperBob2

Eh, that's too expensive unless the recipient can authorize refunds for non-spam emails.

But yes, I always thought some form of network syndication would emerge on the Web, where creators could register for their share of aggregated periodic payments made by users.

Still not sure why that's not a thing. I would pay $50/month to a syndicate in return for never having to deal with paywalls on any sites affiliated with them. But only as long as the vast majority of sites participated, and that is probably the showstopper, I guess. We'd end up paying 20 different 'syndicates' for absolutely no good reason, just as we now have to deal with 20 different streaming services.

tetha

It reminds me of a game we played with students of data classification algorithms like ID3: How many yes/no questions do we need to uniquely identify everyone in this room?

With like 12 students, that's 4 bits, and it often ends up with 2-3 questions. It starts off with the obvious ones - man/woman/diverse, but then a realization comes in: An answer usually contains more information than just that one bit. If you have long hair, you're most likely a woman and/or a metalhead for example. That part will get shaken out later on.

And those thoughts make these browser fingerprinting techniques all the more scary: They contain a lot of information and that quickly cuts the number of possible people down. Like, I'm a Linux Firefox user with a screen on the left. I wouldn't be surprised if that put me in a 5-6 digit bucket of people already.

georgefrowny

> An answer usually contains more information than just that one bit.

That means there is less information in the question "do they have long hair?", not more. Asking "long hair?" and then "woman?" is probably, in most groups, roughly the same as just the first or second question alone. So the second question added much less than one bit of information because the answer is probably "yes". "Long hair" and then "metalhead" is the same, except that the answer to the second question is probably "no".

Yes/no questions on average contain the most information each when they partition the remaining possibilities 50:50. Then each answer gives you exactly one more bit. The closer you get to either a 100:0 or 0:100 yes:no split, the smaller the fraction of a bit you encode in the answer.
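
This is the standard binary entropy calculation, sketched here in Python. The function names are illustrative, and the 1-in-16 split is just one example of a lopsided question.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information gained, in bits, from observing an outcome of probability p."""
    return -math.log2(p)


def expected_bits(p_yes: float) -> float:
    """Average information from a yes/no question answered 'yes' with probability p_yes."""
    if p_yes in (0.0, 1.0):
        return 0.0  # the answer is already known, so nothing is learned
    p_no = 1.0 - p_yes
    return p_yes * surprisal_bits(p_yes) + p_no * surprisal_bits(p_no)


print(expected_bits(0.5))               # balanced split
print(surprisal_bits(1 / 16))           # a rare "yes" is very informative...
print(round(expected_bits(1 / 16), 3))  # ...but the question is weak on average
```

A balanced question yields one full bit per answer. A 1-in-16 question yields 4 bits on the rare "yes" but only about a third of a bit on average.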

"Metalhead?" usually gives you lots of bits of information (probably 4 in an "average" group of 16 containing at least one metalhead) if the answer is "yes", but on average that's outweighed by the very high chance that the answer will be "no". If there are no metalheads or only metalheads, it gives you zero information.

tetha

Ah, I flipped it in my head. That happens after 10 years.

In this case, it was often an interesting exercise in bias as well. "Woman?" would usually single out 1-2 persons out of the 15, so it was a terrible question. It was CompSci after all. "Long hair?", lumping women and metal heads into one group would often split it into half and half. That was much better, and then spurred creative thoughts like travel distance, or bus stations.

mathgradthrow

>An answer usually contains more information than just that one bit.

Isn't the point to ask yes or no questions?

zie

Yes, but you can make assumptions based on what you know about humans generally. Take their example of asking whether you have long hair: if you answer yes, you are probably female.

You can think of all sorts of questions and answers like this, and when you combine them with the assumptions and answers from previous questions you can make even more assumptions. They won't always be correct, but you don't have to be "perfect", depending on your use-case. For example, for advertising purposes, assumptions (even if incorrect) can still go a long way.

There is a reason Target got sooo good at identifying pregnant women[0] before the women knew they were pregnant that they creeped out women, and had to pull back what they did with that information. This was like a decade or more ago. It's only gotten more accurate since then.

0: one example from 2012: https://techland.time.com/2012/02/17/how-target-knew-a-high-...

armchairhacker

https://medium.com/@colin.fraser/target-didnt-figure-out-a-t...

https://www.predictiveanalyticsworld.com/machinelearningtime...

codedokode

> Target got sooo good at identifying pregnant women

That's why I pay with cash and do not have a loyalty card (other customers often offer theirs at cash register anyway). And of course I don't even go to Target.

emil-lp

It's still a yes/no question, it's just that the question is "do you have long hair".

The goal of these decision trees is to use as few questions as possible, with each question dividing the remaining group into two balanced halves (and so on recursively).

Imagine a binary tree with a question in each internal node and a person in each leaf. You want to minimize the height of the tree.
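As a rough sketch of what "minimal height" means here, assuming an idealized case where every question can split the remaining group as evenly as possible (real questions rarely do), the worst-case number of questions is the ceiling of log2 of the group size. The function name is made up for illustration.

```python
def worst_case_questions(n_people: int) -> int:
    """Height of a perfectly balanced yes/no decision tree with n_people leaves."""
    if n_people < 1:
        raise ValueError("need at least one person")
    depth = 0
    remaining = n_people
    while remaining > 1:
        # Worst case: the person you want is in the larger half.
        remaining = (remaining + 1) // 2
        depth += 1
    return depth

for n in (12, 15, 16, 17):
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    assert worst_case_questions(n) == (n - 1).bit_length()
    print(n, worst_case_questions(n))
```

For the group sizes mentioned in this thread (12, 15, 16) that gives 4 questions, and a 17th person pushes it to 5.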

tetha

Yes, but multiple yes or no questions in combination can easily yield more information than they should in a real dataset. That's the real educational point.

gweinberg

You seem to be confused about the difference between "less" and "more". In general a yes-no question gives less than 1 bit of information if yes and no are not equally likely. There is no way it can be expected to give more.

throw8484949

[flagged]

542458

I think a plain reading of the post you’re replying to would be “obvious as a way of segmenting people”.

Vinnl

It's obvious in the sense that most people will start out with that as their first question.

ekjhgkejhgk

The core of the problem is that we've made this behavior of "run javascript that pulls more javascript and then run that too" the default. Stallman was right, as always.

boxedemp

The older I get the more I see that RMS was right about so many things.

When I was young I used to think of him as that eccentric, pedantic MIT guy, but now I see him as a true warrior for freedom.

codedokode

The problem is not JS, the problem is useless technologies like WebRTC or WebGL that can run without permission and that, I think, are used in 99% of cases for fingerprinting. And the people who designed them and did nothing to prevent fingerprinting.

beeflet

WebGL and WebRTC are hardly useless, but they allow you to collect way too much fingerprinting data based on the way they've been designed.

binoct

Neither WebRTC nor WebGL is remotely ‘useless’. Very fair though to say that you would prefer to have them disabled and/or whitelisted for certain sites.

gruez

>The core of the problem is that we've made this behavior of "run javascript that pulls more javascript and then run that too" the default. Stallman was right, as always.

It really isn't, because there are plenty of fingerprinting scripts that run on the same domain, especially fingerprinters from security providers like Cloudflare or Akamai.

binaryturtle

A browser basically is like a really dumb trojan, pulling a whole herd of wooden horses into the city.

IshKebab

Does he have a strong stance of JS in the browser? In any case, I don't think many people would agree that the dubious extra privacy you gain from blocking that is really worth breaking half the web. Fingerprinting is not too hard even without JS.

StillBored

I would re-frame "is it really worth breaking half the web" as those sites are not compliant to begin with. Nothing in the web standards stack mandates JavaScript, it's an optional feature! Web developers of yore understood that a fundamental property of a properly written web site was to degrade gracefully if JavaScript wasn't available, but the groupthink of the past decade has chosen weaponized incompetence over doing their jobs, and in the process has not only thrown a load of noncompliant insecure garbage out there, but also broken a load of accessibility standards and other things.

ekjhgkejhgk

> Does he have a strong stance of JS in the browser?

Let's see what he says on the subject.

https://www.gnu.org/philosophy/javascript-trap.html

IshKebab

Ok so his issue is even more obtuse - he doesn't care about fingerprinting; he cares that not all JS code is GPL.

bee_rider

Blocking most JavaScript is fine, it mostly just breaks the silly pointless over-designed sites anyway. Just like everything else, most of the internet is garbage; blocking over-designed JavaScript sites isn’t a perfect filter but it is an ok first heuristic.

delusional

His stance is pretty simple. The JS on most pages is proprietary, and he doesn't like proprietary software.

baq

The real problem: if you can’t be identified, the system assumes you’re a bot, untrustworthy, or both and instead of reading content you get to select squares with buses and traffic lights ad infinitum.

pphysch

Yes, and the conspicuous lack of signal is itself a signal.

"Get me all the individuals in this geo area that have atypical communication patterns..."

Terr_

https://xkcd.com/1105/

gweinberg

For a fingerprint to be useful it must not only be unique but also persistent. If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time, but the me of today can't be linked to the me of tomorrow, right?
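A toy sketch of that idea, assuming a naive fingerprinter that simply hashes every attribute it collects and only matches on the exact hash. The attribute names and font names are hypothetical, and real fingerprinters may well be smarter than this.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a canonical JSON encoding of the collected attributes."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical attributes that stay the same from day to day.
base = {"os": "Linux", "browser": "Firefox", "screen": "1920x1080"}

# Hypothetical font lists that the rotation process changes daily.
today = fingerprint({**base, "fonts": ["DejaVu Sans", "Wacky Font A"]})
tomorrow = fingerprint({**base, "fonts": ["DejaVu Sans", "Wacky Font B"]})

print(today != tomorrow)                          # True: the exact hash changes
print(fingerprint(base) == fingerprint(dict(base)))  # True: stable inputs, stable hash
```

Under that assumption the two days do not link by exact hash, but everything in `base` is still identical across both days, so this only defeats matching on the full hash.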

internetter

Point still taken, however you can only really check if a given font is installed, not obtain a list of all fonts. Thus, installing a wacky font is pointless as the fingerprinter won’t bother to check that particular font. There is queryLocalFonts on Chrome, but this requires a permission popup.

poorman

It's likely that yes, you will end up with an alias that links you because of a cookie somewhere, or a fingerprint of the elliptic curve when you do an SSL handshake, or any number of other ways.

The ironic thing is that because of GDPR and CCPA, ad tech companies got really good at "anonymizing" your data. So even if you were to somehow not have an alias linking your various anonymous profiles, you will still end up quickly bucketed into a persona (and multiple audiences) that resembles you quite well. And it's not multiple days of data we're talking about (although it could be), it's minutes, and in the case of contextual multi-armed bandits, your persona is likely updated "within" a single page load and you are targeted in ~5ms within the request/response lifecycle of that page load.

The good news is that most data platforms don't keep data around for more than 90 days because then they are automatically compliant with "right to be forgotten" without having to service requests for removal of personal data.

gruez

>If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time, but the me of today can't be linked to the me of tomorrow, right?

See: https://xkcd.com/1105/

Services with a large enough fingerprinting database can filter out implausible values and flag you as faking your fingerprint, which is itself fingerprintable.

NewsaHackO

But they still wouldn't be able to confidently connect his different fingerprints to the same individual, just that he is one of a group of individuals who fake their fingerprints.

gruez

It would depend on what your existing fingerprint is. If you're using some sort of rare browser/OS/hardware combination (eg. pale moon/gentoo linux/IBM thinkpad) it might be worth spoofing, but if your configuration is relatively "normie" (eg. firefox/windows/relatively recent intel or amd cpu/igpu) you're probably making yourself stick out more by faking your fingerprint.

nobody42

You could test with this: https://github.com/abrahamjuliot/creepjs Does it store the data? Unknown.

The best browser for protection is https://mullvad.net/en/browser because it makes the connection uniform, to better blend in.

thetyster

> best

I guess that really depends on how you classify "best"

Tor is pretty good for protection. Then there's always i2P as well…

Saying one browser can protect the best is pretty hard to prove.

nobody42

Best among existing. The anti-fingerprinting field is still in its early stages.

I wouldn't say Tor Browser is the best because it requires custom configuration to be usable conveniently, which will make the connection non-uniform (and the user will stand out).

>Tor is pretty good for protection. Then there's always i2P as well…

Tor and i2P do nothing for (anti)fingerprinting - the program which renders the web pages does.

>Saying one browser can protect the best is pretty hard to prove.

Not a proof but things to consider: https://privacytests.org/

sfink

I'd like a "Firefox + uBlock Origin" column on that page. (But then you'd have to consider filter lists enabled...)

lipbetfox

I still haven't found a method that can fingerprint simple Firefox containers. I use automatic temporary containers as a rule, and rules for specific sites where I want to keep persistent sessions.

I don't understand how temporary containers are still not a built-in Firefox feature, it seems like such a no-brainer solution for privacy.

boxedemp

Open question,

If I'm on a VPN and using Firefox containers, is the only way to identify me to look at my mouse movement and correlate it?

mixmastamyk

Isn't the semi-recent per-site cookie jar most of this functionality?

rob_c

How to scream I'm behaving badly online...

stego-tech

Sandboxing in containers and manually exempting specific security tokens is arguably one of the better steps we can take in the immediate term, as are random agent strings and returning fake data for common prompts. Of course that only works in the immediate, because this, like advertising in general, is an arms race at the moment.

This feels like a regulatory question, not a technical one. We've repeatedly proven that with math and code alone, we can fingerprint and identify almost every unique person on the planet, given enough data points. The long-term solution seems like it should be severe consequences for data breaches (as in, corporation-destroying penalties for disclosure of PII, including fingerprint data) such that everyone only collects the data they need to provide the service in question and not a single bit more, deleting it as soon as it's no longer necessary. Right now there's no consequence if Google or Meta disclose huge swaths of user data, and thus no disincentive to collecting as much as they possibly can.

Punish the leaking of data, and suddenly you've raised its cost to the point that casual players will nope out entirely. From there, it's the eternal back and forth of governments waffling between business and electorate interests.

gruez

>We've repeatedly proven that with math and code alone, we can fingerprint and identify almost every unique person on the planet, given enough data points.

I'm very skeptical of this claim, especially in practice. Contrary to what many fingerprinting sites claim ("you're unique of everyone we fingerprinted!!"), browser fingerprinting can't possibly uniquely identify someone. Smartphones are pretty locked down and there are very few customization options that allow for fingerprinting. In the US, Apple has around 50% market share, and there are 30 iPhone models that are still in support. That means if you're an iPhone user in a city of 1 million, there are, on average, approximately 16.6k (500k / 30) other people with the same exact model of iPhone (and therefore fingerprint) as you. As long as you don't do anything to stick out (eg. living in the US but setting Denmark as your locale), you'll be reasonably anonymous.
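The back-of-envelope arithmetic, as a small sketch. It assumes users are spread evenly across the 30 supported models, which is what "on average" is doing above (real model popularity is certainly skewed), and the function name is made up for illustration.

```python
import math

def anonymity_set_size(population: int, market_share: float, n_models: int) -> float:
    """Average number of people sharing one device model, assuming an even spread."""
    return population * market_share / n_models

city = 1_000_000
size = anonymity_set_size(city, market_share=0.5, n_models=30)

# Bits of identifying information carried by "this exact iPhone model"
# relative to the whole city: log2(city / size) = log2(60).
bits = math.log2(city / size)

print(round(size))     # 16667
print(round(bits, 2))  # 5.91
```

That is about 6 bits out of the roughly 20 bits (log2 of 1 million) it would take to single out one resident of the city.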