The Halting Problem is a terrible example of NP-Harder

123 comments

·April 17, 2025

bob1029

> Most "definitely harder than NP" problems require a nontrivial background in theoretical computer science or mathematics to understand.

One of the simplest examples of this is automatic program synthesis.

Searching for the optimal (e.g., shortest) program that satisfies a given set of inputs/outputs (function) is an undecidable (worse than exponential time) problem similar to the halting problem. The exponent in this case would have been the size of the instruction set, but we still don't know how many cycles we need (i.e., is it even practical to run?).

In applications like genetic programming, we deal with the halting problem by specifying a maximum # of cycles that the program interpreter will run for each time. The ability of the synthesized programs to halt in a bounded time becomes part of selection pressure. Candidates that return poor or no results within the constraints are quickly eliminated. This can be further compounded by making the candidates evolve their own resource limits as part of their genome.

Put differently, we can approximately solve the halting problem for some smaller problems with clever search techniques. I don't think talking about these things in absolute terms is very useful. Everything depends on the problem and the customer's expectations.

pron

I don't know if I would call that "approximately solving the halting problem", and the use of the halting problem is already short-hand for much more general and less binary results. In the 1965 [1] paper by Hartmanis and Stearns that gave birth to computational complexity theory, they generalise the halting theorem to effectively say that it's generally impossible to tell what a program would do (specifically, whether it halts) in it's Nth step without simulating it for N steps. The halting theorem is just the special case where N is any N.

What you're doing isn't approximating or getting around anything. You're simply following the "generalised halting theorem": you're interested in what the program does in N steps, and you're finding that out by simulating it for N steps. You're not making any approximation that shortcuts any computational complexity results. (Such approximations exist in some cases, but that's not what you're doing here)

[1]: On the computational complexity of algorithms: https://www.ams.org/journals/tran/1965-117-00/S0002-9947-196...

bob1029

> You're not making any approximation that shortcuts any computational complexity results.

I completely agree in the default/initial case.

> (Such approximations exist in some cases, but that's not what you're doing here)

However, the entire point of genetic programming is that once you find something that is "on the map" at all, you can begin to exploit the region around that program with some reasonably accurate expectations regarding how certain behaviors will unfold in the local space. The larger the population the more stable this tends to become on average.

So, for pure exploration in GP, you do indeed suffer the full weight of the halting theorem. As you transition into the exploitation regime, this becomes more of a grey area.

pron

> So, for pure exploration in GP, you do indeed suffer the full weight of the halting theorem. As you transition into the exploitation regime, this becomes more of a grey area.

I don't understand what the grey area is. The time hierarchy theorem (that we can consider the "generalised halting theorem") is a theorem; it is absolute. What we have here isn't something that uses some approximation that is true with some probability that we could maybe call a "grey area", but something that bears the full brunt of the theorem.

That there are subsets of problems in some complexity class X that can be solved faster than X's upper limit is the very point of the complexity hierarchy. Yes, there are a subset of problems in EXPTIME that can be solved in polynomial time, hence the existence of the P class, but that doesn't mean that EXPTIME is a "grey area". If you're solving some problem quickly, then we know that what you've solved was not one of the hard problems to begin with. If you're solving some problems in polynomial time, then those problems are in P.

For example, heuristic SAT solvers solve many SAT problems quickly. But their authors don't consider that evidence that P = NP, and understand that the instances that their solvers solve are "easy" (which is not to say they're not useful). In other words, what they're saying is that many useful SAT problems are easy, not that they're able to solve the hard problems efficiently.

frontfor

> I don't think talking about these things in absolute terms is very useful. Everything depends on the problem and the customer's expectations.

Agreed. In the real world, approximate solutions to hard problems are often good enough.

For instance, the TSP might be a hard problem, but solving it exactly is pointless since the distances between nodes might be subject to measurement uncertainty anyway, and actual travel times might not be fixed as they might fall under a distribution.

Having said that, it’s still worth studying the theoretical aspects of these problems.

andrewla

The example given doesn't seem right to me.

> There is one problem, though, that I find easily explainable. Place a token at the bottom left corner of a grid that extends infinitely up and right, call that point (0, 0). You're given list of valid displacement moves for the token, like (+1, +0), (-20, +13), (-5, -6), etc, and a target point like (700, 1). You may make any sequence of moves in any order, as long as no move ever puts the token off the grid. Does any sequence of moves bring you to the target?

If someone gives you such a sequence, it seems trivial to verify it in linear time. Even for arbitrary dimensions, and such witness can be verified in linear time.

hwayne

To be in NP, witness must be verifiable in polynomial time with respect to the size of the original input. In this problem (VAS Reachability), the solution can be `2^2^2^...^K` steps long. Even if that's linear with respect to the witness, it's not polynomial with respect to the set of moves + goal.

andrewla

Hmm.. I'd love to see a more formal statement of this, because it feels unintuitive.

Notably the question "given a number as input, output as many 1's as that number" is exponential in the input size. Is this problem therefore also strictly NP-hard?

hwayne

> Hmm.. I'd love to see a more formal statement of this, because it feels unintuitive.

The problem is called "Vector Addition System Reachability", and the proof that it's Ackermann-complete is here: https://arxiv.org/pdf/2104.13866 (It's actually for "Vector Addition Systems with states, but the two are equivalent formalisms. They're also equivalent to Petri nets, which is what got me interested in this problem in the first place!)

> Notably the question "given a number as input, output as many 1's as that number" is exponential in the input size. Is this problem therefore also strictly NP-hard?

(Speaking off the cuff, so there's probably a mistake or two here, computability is hard and subtle!)

Compression features in a lot of NP-hard problems! For example, it's possible to represent some graphs as "boolean circuits", eg a function `f(a, b)` that's true iff nodes `a,b` have an edge between them. This can use exponentially less space, which means a normal NP-complete problem on the graph becomes NEXPTIME-complete if the graph is encoded as a circuit.

IMO this is cheating, which is why I don't like it as an example.

"Given K as input, output K 1's" is not a decision problem because it's not a boolean "yes/no". "Given K as input, does it output K ones" is a decision problem. But IIRC then for `K=3` your input is `(3, 111)` so it's still linear on the input size. I think.

throwaway81523

In the definition of NP, you get the input in unary, so that gets rid of that exponential. The issue with VAS is that the number of moves required could be extremely large.

There's a similar situation in chess (at least if generalized to N by N). You could assert Black has a forced win from a given position, but verifying the claim requires checking the entire game tree, rather than a succinct certificate. Generalized chess is in fact PSPACE-complete, which is generally believed to be outside of NP.

johntb86

It needs to be a decision problem (or easily recast as a decision problem). "given a number as input, output as many 1's as that number" doesn't have a yes or no answer. You could try to ask a related question like "given a number as input and a list of 1s, are there as many 1s as the number?" but that has a very large input.

null

[deleted]

trixthethird

Linear time in the length of the sequence, yes. But is the sequence length linear in dimension size, or number of moves given? Thats what is interesting.

andrewla

Linear in the size of the witness, however many bits it takes to express it.

trixthethird

This applies to any computable problem though, no? At minimum the verifier has to read the witness. If we ignore PCPs and such. The point here is that the witness grows very fast in terms of vector dimensionality and/or move set size.

bjornsing

It doesn’t need to be linear though. Polynomial is enough.

But I guess it can be proven that the shortest possible sequence grows faster than polynomial.

hwayne

Proving it's nonpolynomial is pretty easy: just make the goal vector `(10, 10, 10)` and the displacement vectors `{(1, 0, 0), (-10, 1, 0), (-10, -10, 1)}`. It takes ~1000 steps to reach the goal, and if we add one more dimension we need 10x more steps.

So it grows, at least, exponentially. Back in 1970ish someone found a problem that's at least double-exponential, proving it was 2-EXPTIME-hard. It was conjectured to be 2-EXPTIME-complete for the longest time, but turned out to be significantly harder.

trixthethird

I think they proved it grows with Ackermann function.

Choco31415

A sequence is easy to verify. Choosing the sequence not so much.

Roughly put that is the certificate definition of being in NP.

andrewla

The goal here was to show that it was strictly NP-hard, i.e. harder than any problem in NP.

Nevermark

Harder to solve, not necessarily harder to verify?

If I am understanding things right.

quickthrower27

[dead]

thaumasiotes

I agree with the issue ("What's bigger than a dog? THE MOON"), but I don't agree with the need to provide an example that is provably distinct from NP. We're fine providing NP-complete problems without proving that they're distinct from P.

There are lots of accessible problems that belong to spaces that probably aren't NP. One that should be familiar to most people studying CS theory is "do these two regular expressions match the same set of strings?".

---

For the Diophantine vector reachability problem... I don't really like the Quanta presentation. ("An easy-sounding problem yields numbers too big for our universe.")

First, the problem, if you describe it accurately, doesn't sound easy at all. Diophantine problems are never easy. That's why we have real numbers.

Second, the article suggests that the problem should be introduced to children by casting in in terms of several rules of exchange ("For Alice, the transition vector (−1, −1, 1, 0) would represent the exchange of an apple and a banana for a cantaloupe."). But that would make the problem trivial: you start at the origin; without a rule of "exchange" in which the other party gives you as much as you want of something for free, you're never going to leave it.

And third, many easy problems generate numbers too big for our universe. That's not unusual at all. Compare the question "how many digits of pi do you need before the potential error in calculating the volume of a sphere the size of the visible universe is less than the volume of one hydrogen atom?". Can you imagine using more digits than that? You just involved a number too big for the universe.

The article passes over this point itself:

> It’s physically impossible to write down all the digits of 2↑↑6 — a liability of living in such a small universe.

mhink

> Second, the article suggests that the problem should be introduced to children by casting in in terms of several rules of exchange ("For Alice, the transition vector (−1, −1, 1, 0) would represent the exchange of an apple and a banana for a cantaloupe."). But that would make the problem trivial: you start at the origin; without a rule of "exchange" in which the other party gives you as much as you want of something for free, you're never going to leave it.

I'm not sure what point you're trying to make here? That specific transition vector would be one such vector among several in a given ruleset. You're correct that for any ruleset, if I don't have at least one vector that can get me away from the origin, I'm stuck there, but then I also have to consider that I still might not be able to get to where I want to go. If I have the ruleset [(2, 0), (-4, 2)] I still can't get to (3, 5).

Leszek

Something I find fascinating is that we know that P != EXPTIME, and that P <= NP <= EXPTIME, but have managed to prove neither P != NP nor NP != EXPTIME. NP has to be somewhere between them but we have no idea where.

macleginn

"NP has to be somewhere between them but we have no idea where" – I guess that this state of affairs won't change much even if we prove P != NP?

dgs_sgd

I think that's unclear. A constructive proof of P != NP might yield insights into the "gap" between them.

the-grump

Technically correct is the best kind of correct, and The Halting Problem is a fun head scratcher for a green computer scientist. I'm glad it was part of my introduction to NP hardness.

That said, you do make a great point OP, and I'll think about it every time the Halting Problem comes up.

edanm

This seems weird to me. I would think the classic way to introduce these topics is to first talk about decidability, introduce the halting problem through that, then introduce complexity. That's certainly how I was introduced to this material.

I wouldn't think that you'd learn about the halting problem as only an example of something harder than NP hard.

sfink

I agree. NP, NP-complete, and NP-hard are all talking about how long things take. If something is uncomputable, then how long it takes doesn't seem very relevant or interesting.

"If Jimmy earns 5 cents a day and a watermelon costs $10, how long will it take for Jimmy to earn enough money to buy the color blue?"

Maybe the connection is that it feels like the halting problem could be solved if you had an infinite amount of time?

thaumasiotes

The halting problem obviously can be solved if you have an infinite amount of time, as long as terminating is guaranteed to take place within a finite amount of time. If it's possible to terminate after an infinite amount of time, you'll need a more infinite amount of time to check for halting.

qsort

The million dollar question here is... a literal million dollar question.

The natural way to go up in complexity from NP is along the polynomial hierarchy, but since it could be the case that P=NP, those problems aren't provably harder.For all we know it could even be the case that P=PSPACE.

You could invoke the nondeterministic variant of the time-hierarchy theorem.

null

[deleted]

ogogmad

Is the problem mentioned in the article equivalent to deciding whether there exists a solution to a system of linear equations in the positive integers?

I think no, because the vector additions are considered in sequence, and at no point are you allowed to leave the positive quadrant.

[Edit] Yeah, it's more than just solving positive integer linear systems: https://en.m.wikipedia.org/wiki/Vector_addition_system

hwayne

IIRC without that restriction the problem is "just" NP-complete.

chad1n

To be honest, checking if there is a path between two nodes is a better example of NP-Hard, because it's obvious why you can't verify a solution in polynomial time. Sure the problem isn't decidable, but it's hard to give problems are decidable and explain why the proof can't be verified in P time. Only problems that involve playing optimally a game (with more than one player) that can have cycles come to mind. These are the "easiest" to grasp.

nmilo

Isn't this NP-complete? The "solution" here would be the steps to take in the path which can be found by brute-force

Wikipedia:

> 2. When the answer is "yes", this can be demonstrated through the existence of a short (polynomial length) solution.

> 3. The correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying all possible solutions.

waynecochran

Halting problem is not even decidable. To mention it in the context of NP is a categorical fouxpas.

JohnKemeny

HALT is a problem that belongs to the complexity class RE (recursively enumerable), which is a class that contains the complexity class NP.

NP: Problems with efficient verifiers — given a candidate solution, you can check correctness in polynomial time.

RE: Problems with semi-algorithms — you can recognize “yes” instances by eventually halting, but you may run forever on “no” instances.

I wouldn't consider it a categorical error.

johanvts

> NP-hard is the set all problems at least as hard as NP-complete

Im a bit confused by this. I thought NP-complete was the intersection of NP-hard and NP. Maybe this is stating the same?

Khoth

NP-complete is indeed the intersection of NP-hard and NP.

If you can solve any NP-hard problem then you can solve any NP-complete problem (because you can convert any instance of an NP-complete problem to an NP-hard problem), so NP-hard problems are "at least as hard" as NP-complete problems.

(But an NP-hard problem might not be in NP, ie given a purported solution to an NP-hard problem you might not be able to verify it in polynomial time)

redleader55

This [1] diagram from Wikipedia represents you are saying in a visual way. NP-hard and NP-complete are defined as basically an equivalence class modulo an algorithm in P which transforms from one problem to the other. In more human terms both NP-hard and NP-complete are reducible with a polynomial algorithm to another problem from their respective class, making them at least as hard.

The difference between the two classes, again in human terms, is that NP-complete problems are decision problems "Does X have property Y?", while NP-hard problems can include more types - search problems, optimization, etc, making the class NP-hard strictly larger than NP-complete in set terms.

The statement in the article means that NP-hard problems require solving a NP-complete problem plus a P problem - hence being at least as hard.

[1] https://en.m.wikipedia.org/wiki/File:P_np_np-complete_np-har...

edanm

I don't believe this is accurate.

The difference between NP-Hard and NP-Complete is that NP-Complete problems are both NP-Hard, and in NP. As opposed to NP-Hard problems that are not NP-Complete, meaning they are not themselves in NP.

zahlman

If NP-hard isn't even a subset of NP, why is it called "NP-hard"?

suzumer

Because a language being NP-hard implies it is at least as hard as the hardest NP problems. For any language that is NP-hard, if one had a turing machine that decided the language, one could construct a polynomial time transformation of any language in NP to the NP-hard language to decide it using the NP-hard turing machine.

sfink

Membership in NP is a statement about how easy a problem is (specifically, that you can verify an answer in no worse than polynomial time). NP-hard is about how hard a problem is (at a minimum).

I wonder if there would be less confusion over this if NP had been called "NP-easy" in the first place. But Wikipedia says that by now that term has already been taken.

P is "very easy": you can solve these in polynomial time. NP is "easy": you can verify a solution in polynomial time. Are there any problems you can verify but not solve in polynomial time? Nobody knows! (But ~everyone thinks so.)

NP-hard is "hard": it takes polynomial time or more to even verify these.

Sharlin

Because naming things is one of the two difficult problems in computer science.

ykonstant

I am confused about the precise statement of the problem that is claimed to be provably NP-hard, decidable and not in NP. Any clarification?

ngkabra

The article talks about a 2-dimensional grid which starts at (0,0) (bottom right) and extends infinitely to the right and the top, so all (x,y) for non-negative integers x,y exist. But x or y negative does not exist. Given a list of possible jumps e.g. (+1,+10) or (-20,+13), and a target destination, e.g. (700,1). Does there exist a series of valid jumps that takes you from (0,0) to (700,1) without ever going off grid (i.e. into negative territory)?

This problem might or might not be NP-Harder. However, now extend the problem to higher dimensional grids. At some number of dimensions, the problem becomes definitely NP-Harder (i.e. NP-hard, decidable, but not in NP)

ykonstant

Which number of dimensions, though? Is there an effective bound? Otherwise it is hard to suggest the problem as an example.

thaumasiotes

The number of dimensions is one of the variables in the problem.

penteract

Given a dimension, n; a collection of n-dimensional vectors of integers v_1,..,v_k; and a target n-dimensional vector u: Is there a sequence of vectors from the collection which sums to u, such that no partial sum(of an initial segment of the sequence) has any negative components?

HN

The Halting Problem is a terrible example of NP-Harder

The Halting Problem is a terrible example of NP-Harder