Keeping our free tier sustainable by preventing abuse
13 comments
February 21, 2025 · gwbas1c
the_bear
This is basically what Google's reCAPTCHA v3 does: https://developers.google.com/recaptcha/docs/v3
The other versions of recaptcha show the annoying captchas, but v3 just monitors various signals and gives a score indicating the likelihood that it's a bot.
We use this to reduce spam in some parts of our app, and I think there's an opportunity to build a better version, but it would be hard to make it enough better that people would pay for it, since Google's solution is decent and free.
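For reference, the server-side half of the v3 flow is small: the page obtains a token via grecaptcha.execute(), your backend posts it to the siteverify endpoint, and you decide what to do with the returned score. A minimal sketch (the secret key and the 0.5 cutoff below are placeholders, not recommendations):

    import requests

    RECAPTCHA_SECRET = "your-secret-key"  # placeholder: load from config/env in practice

    def recaptcha_score(token: str, remote_ip: str | None = None) -> float:
        """Return Google's 0.0-1.0 score (1.0 = very likely human)."""
        resp = requests.post(
            "https://www.google.com/recaptcha/api/siteverify",
            data={"secret": RECAPTCHA_SECRET, "response": token, "remoteip": remote_ip},
            timeout=5,
        )
        body = resp.json()
        if not body.get("success"):
            return 0.0  # verification failed: treat as no evidence of a human
        return body.get("score", 0.0)

    # Example gate on a signup endpoint (0.5 cutoff is a placeholder):
    # if recaptcha_score(request.form["g-recaptcha-response"], request.remote_addr) < 0.5:
    #     flag_signup_for_review()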
AceJohnny2
Thanks for this writeup. Whenever people complain about some service removing or making it harder to try out a free tier, I think they don't realize the amount of abuse that needs to be managed by the service providers.
"Why do things suck?" Because parasites ruined it for the rest of us.
> We have to accept a certain amount of abuse. It is a far better use of our time to use it improving Geocodio for legitimate users rather than trying to squash everyone who might create a handful of accounts
Reminds me of Patrick McKenzie's "The optimal amount of fraud is non-zero" [1] (wrt banking systems)
Also, your abuse-scoring system sounds a bit like Bayesian spam filtering, where you have a bunch of signals (Disposable Email, IP from Risky Source, Rate of signup...) that you correlate, no?
[1] https://www.bitsaboutmoney.com/archive/optimal-amount-of-fra...
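The Bayesian framing maps pretty directly onto code. A toy sketch that combines signals in log-odds space under a naive independence assumption; the signal likelihoods and the 5% base rate are invented purely for illustration:

    import math

    # (P(signal | abuser), P(signal | legit)) -- likelihoods invented for illustration
    SIGNALS = {
        "disposable_email": (0.60, 0.02),
        "risky_ip":         (0.40, 0.05),
        "rapid_signups":    (0.50, 0.10),
    }
    PRIOR_ABUSE = 0.05  # assumed base rate of abusive signups

    def abuse_probability(observed: set[str]) -> float:
        # Start from the prior, then add each signal's log-likelihood ratio.
        log_odds = math.log(PRIOR_ABUSE / (1 - PRIOR_ABUSE))
        for name, (p_abuse, p_legit) in SIGNALS.items():
            if name in observed:
                log_odds += math.log(p_abuse / p_legit)
            else:
                log_odds += math.log((1 - p_abuse) / (1 - p_legit))
        return 1 / (1 + math.exp(-log_odds))

    print(abuse_probability({"disposable_email", "rapid_signups"}))  # ~0.83 with these numbers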
hn_user82179
very cool, I wasn't expecting to find this so interesting. I yesterday for the first time thought about the "abuse the free tier" actors. I was trying to use a batching job service which limited free-tier batch sizes to 5, which was so low that it took away the point from using the automated job in the first place. I think the little info box explained that they keep the limit low to prevent abuse, and I started thinking about other ways they could prevent that abuse. Your post was very topical. thanks for sharing!
EGreg
Where can we get a blocklist of those throwaway email domains?
Or perhaps a really big whitelist of good ones? That would be extremely helpful!
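There are community-maintained disposable-domain lists on GitHub (typically one domain per line). A minimal check against a local copy could look like the following; the file path is just a placeholder:

    def load_blocklist(path: str = "disposable_email_blocklist.conf") -> set[str]:
        # One domain per line; skip blanks and comment lines.
        with open(path) as f:
            return {line.strip().lower() for line in f
                    if line.strip() and not line.startswith("#")}

    def is_disposable(email: str, blocklist: set[str]) -> bool:
        domain = email.rsplit("@", 1)[-1].lower()
        return domain in blocklist

    blocklist = load_blocklist()
    print(is_disposable("someone@mailinator.com", blocklist))  # True if the domain is listed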
prteja11
I get why they don't want to share their detection mechanics for potentially fraudulent signups, but that is a very interesting topic to learn about and discuss.
thecodemonkey
I would love to do a more in-depth talk about this at some point, with some more concrete examples.
oger
Great writeup. Simple heuristics very often work wonders. The fraudsters are out there and will try to poke holes in your shield. Some time ago we were running a mobile service provider and had some issues with fraudulent postpaid subscribers; however, the cost of using background-checking services was substantial. We solved it quite effectively by turning the background checks on only when the level of fraud went over a certain threshold, which made the fraudsters go away for some weeks. We kept this on-and-off pattern going for a very long time with great success, as it lowered the friction to sign up significantly when the checks were turned off.
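That on/off pattern is essentially a hysteresis toggle. A sketch under assumed numbers (the high/low water marks below are invented; the real ones depend on your fraud rate and the cost of the checks):

    ENABLE_AT = 0.03   # turn checks on when >= 3% of recent signups look fraudulent
    DISABLE_AT = 0.01  # turn them off again once it falls to <= 1%

    class BackgroundCheckToggle:
        def __init__(self) -> None:
            self.enabled = False

        def update(self, recent_fraud_rate: float) -> bool:
            # Separate enable/disable thresholds avoid flapping on and off.
            if not self.enabled and recent_fraud_rate >= ENABLE_AT:
                self.enabled = True
            elif self.enabled and recent_fraud_rate <= DISABLE_AT:
                self.enabled = False
            return self.enabled

    # toggle = BackgroundCheckToggle()
    # if toggle.update(fraud_rate_last_30_days):
    #     run_paid_background_check(applicant)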
AutistiCoder
So you implemented some sort of machine learning?
thecodemonkey
Not at this time. Some simple heuristics go a long way and also make it very easy to test and debug the logic.
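To make that concrete, a weighted-rules scorer can be a handful of lines. The signals, weights, and threshold below are illustrative guesses, not Geocodio's actual rules:

    WEIGHTS = {
        "disposable_email_domain": 40,
        "ip_on_risk_list":         25,
        "many_signups_from_ip":    30,
        "hit_free_limit_on_day_1": 15,
    }
    REVIEW_THRESHOLD = 50

    def abuse_score(signals: dict[str, bool]) -> int:
        # Sum the weights of the signals that fired for this signup.
        return sum(w for name, w in WEIGHTS.items() if signals.get(name))

    def needs_review(signals: dict[str, bool]) -> bool:
        return abuse_score(signals) >= REVIEW_THRESHOLD

    # Each rule is a named line item, so it is easy to unit-test and to explain
    # why a particular signup was flagged.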
skissane
I’ve seen fraud detection used in a SaaS product, and the great thing about a weighted rules approach is that professional services can understand it well enough to adjust it without help from engineering or data science. They can explain to customers how it produced the results it did in a particular case, and the tradeoffs of adjusting the weights or thresholds, and the customers can understand it too. A machine learning model, by contrast, is much harder to understand and adjust, so issues are much more likely to be escalated back to engineering.
(This isn’t about protecting the SaaS vendor against abusive signups; it is a feature of the SaaS product that helps its customers detect fraud committed against themselves, within the product’s scope.)
gwbas1c
I once did a machine learning project at Intel. The end result was no better than simple statistics, but the statistics were easier to understand and explain.
I realized the machine learning project was a "solution in search of a problem," and left.
lupusreal
Career hack: skip the machine learning and implement the simple statistics, then call it machine learning and refuse to explain it.
Makes me wonder how easy or hard it would be to turn this kind of feature into a standalone product?
I.e., send the email, IP, browser user agent, and perhaps a few other data points to a service, and get back a "fraudulent" rating?
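Hypothetically, the integration surface could be as small as a single endpoint that takes those data points and returns a score. Everything below (the URL, field names, and response shape) is made up just to show the shape of the idea; no existing service is implied:

    import requests

    def fraud_rating(email: str, ip: str, user_agent: str) -> float:
        # Hypothetical endpoint and schema for illustration only.
        resp = requests.post(
            "https://api.example-fraud-check.com/v1/score",
            json={"email": email, "ip": ip, "user_agent": user_agent},
            timeout=3,
        )
        resp.raise_for_status()
        return resp.json()["score"]  # e.g. 0.0 (clean) .. 1.0 (almost certainly fraud)

    # if fraud_rating(form.email, request.remote_addr, request.headers["User-Agent"]) > 0.8:
    #     reject_or_queue_for_review()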