Skip to content(if available)orjump to list(if available)

Sublinear Time Algorithms

Sublinear Time Algorithms

29 comments

·February 23, 2025

dataflow

> There are problems for which deterministic exact sublinear time algorithms are known.

I can imagine silly examples (like "find the minimum element in this list under the assumption that no more than O(sqrt(n)) elements exceed the minimum"...), but what's an interesting example of this?

_jab

Binary search is the obvious example.

What it and your example have in common is that a significant constraint exists on the input. I can't imagine how a deterministic algorithm with unconstrained input can process that input in sublinear time, but I would love to learn otherwise.

dataflow

I can't imagine that's what they meant? The text very specifically says: "Indeed, it is hard to imagine doing much better than that, since for any nontrivial problem, it would seem that an algorithm must consider all of the input in order to make a decision." For them to be thinking of binary search, they would have to be effectively saying "it is hard to think of binary search", which would be a rather absurd position from a CS professor, especially given binary search is quite literally the first algorithm every programmer learns.

So I took it to mean there's something interesting here where the inputs could literally be anything, not heavily constrained. But I can't imagine what that might be.

kadoban

> especially given binary search is quite literally the first algorithm every programmer learns.

I get what you're saying, and it doesn't change your point, but: no _way_ is binary search the first algorithm people learn. For binary search to even be intelligible you already have to know, at a minumum linear search and the concept of sorting (you probably know a sorting algorithm first). You also learn dozens of other really simple algorithms first in practice.

FreakLegion

They say a little ways down:

> there are classical optimization problems whose values can be approximated in sublinear time

This can actually be quite useful if the approximation is guaranteed, or even if it isn't, as long as it works well in practice.

https://en.wikipedia.org/wiki/Hardness_of_approximation

dzaima

I read that as saying that binary search isn't among those "nontrivial problem"s, along with most other things with known exact deterministic sublinear time algorithms.

And your first quote is followed by "However, for most natural problems", which further indicates that the known exact algorithms are for trivial problems.

null

[deleted]

spoaceman7777

I mean, yeah, binary search is sublinear, but the data has to be ordered into a binary search tree to be able to use it, which has a much more familiar (non-sub-linear) runtime.

I have to assume the reason for the article wasn't to talk about the runtime of algorithms that operate on data that's already in a partially solved state.

qyph

https://en.wikipedia.org/wiki/AKS_primality_test though it's number theory, and concerned with numbers of size n, rather than lists of length n.

Also relevant: https://www.cs.yale.edu/homes/aspnes/pinewiki/Derandomizatio...

dataflow

> https://en.wikipedia.org/wiki/AKS_primality_test though it's number theory, and concerned with numbers of size n, rather than lists of length n.

They were talking about not reading a lot of the input, so that's not it.

minutillo

dataflow

Isn't that O(mn) worst-case run time?

quuxplusone

No, it's O(n) worst case. (The Wikipedia sidebar says "O(mn)," but that's apparently for a maimed version of the algorithm without a key part they're calling "the Galil rule." That's a special usage of the phrase "worst case"! In the absolute worst case, your implementation could have a bug and never terminate at all!)

Anyway, the point is that it's O(n/m) in the usual case. Which remains technically linear, not sub-linear; but at least it's O(n) with a constant factor smaller than 1.

gleenn

Anything probabilistic? There are so many interesting fields where you can assume the distribution of a dataset and the take a sample of data and assert things about it with a high degree of confidence. All of modern AI is built on so much of this. All the Deep Neural Nets are making grand assumptions about the shape of meaning of data, they literally assume convexity of the space and they have clearly very interesting results despite the imprecision of the model. Anything dealing with finance is also dealing in lack of data. So if you had a list of prices of a stock over time, you could probably start making assumptions exactly like that, tgat the probability that it doubles over a short time is so unlikely so you can subsample the data and have it still be super useful to make assumptions exactly, especially when you have intractably large data.

dataflow

>> deterministic exact

> Anything probabilistic?

Are you sure you're answering the same question I'm asking?

z2210558

Assuming shuffled list: estimate of mean, estimate of cardinality etc etc

munchler

I think any sort of estimation is ruled out by the word “exact”.

tbrownaw

Public opinion polling is sub-linear in the size of the population.

dataflow

> Public opinion polling is sub-linear in the size of the population.

How is public opinion polling deterministic and exact?

latency-guy2

I wouldn't call that example silly IMO.

I'd consider all the varieties of B-Tree to be real example, which goes to any DBMS. You can extend this out to any direction you want like logging for concrete examples.

GIS/Mapping/computer vision has tons of algorithms and data structures that all needed to do better than linear time as well.

Stream processing in general is another, but that ends up being probabilistic more often than not, so weak punt into that direction.

If you expand the use case out to sublinear space as well, I'd argue for compression of all kinds.

wellow

[dead]

shae

This sounds like a useful way to mix in statistics and get useful approximations. I'm reading one of the survey links and it's approximately eye opening.

dooglius

Often for problems taking integer input x, formal CS will define the input to be something like 1^x (the character '1' repeated x times, as opposed to binary) so that the time complexity is in terms of x. This class of problems seems amenable to sublinear time since only only needs log(x) steps to determine x.

doormatt

So like HyperLogLog?

qyph

Hyperloglog analyses generally assume access to the full data stream, and so are O(n) at a minimum. Perhaps by running hyperloglog on a sublinear sample of the dataset you'd get an algorithm in this class.

doormatt

That makes sense, thanks for explaining!