How to Explain The Halting Problem

    22 Feb, 2024

    How to explains the Halting Problem's unsolvability in computing, referencing Alan Turing's work. Let’s start with a simple Java login function and a example.

    I came across a question on Zhihu, which goes like this: The person wrote a piece of code to control login, and the logic of the program is roughly as follows in Java:

    public boolean handleLogin(String username, String password) {
        if (userDatabase.containsKey(username)) {
            if (userDatabase.get(username).equals(password)) {
                System.out.println("Success! Welcome " + username + " to login");
                return true;
            } else {
                System.out.println("Password Error. Retry it later.");
                return false;
            }
        } else {
            System.out.println("User not found.");
            return false;
        }
        return false; // Why is this 'return false' necessary here?
    }
    

    The question is: Why is there a return statement at the end, which is the last line of the code block, even though all logical branches of the program have already returned results, covering all possible cases? Isn't this final return statement redundant?

    This raises a famous problem in the field of computation: the Halting Problem. The Halting Problem is formulated as: Does there exist a program PP that, given any input program ww, can determine whether ww will terminate within a finite time or enter an infinite loop?

    This concept may seem abstract, but let's explain it with a real-life example: Suppose you're a student, and your math teacher is very strict, often assigning complex math homework.

    One day, your teacher gives you an assignment with a strange requirement: you can only start working on it if you can prove that you can complete the math homework within a finite time, or prove that the math problem is unsolvable.

    However, the problem is difficult, and you're unsure if it's beyond your capabilities, pushing you to your limits. You're stuck trying over and over again on your draft paper, not knowing when you'll find a solution. It could be a few hours, a whole night, a year, two years, or even a decade. Worse yet, you're not even sure if the problem has a solution. Many of us should be familiar with such experiences from our school days when grappling with math problems.

    Alan Turing was the first to provide an unprovable proof. He concluded that the Halting Problem is unsolvable. In other words, you can never write a program PP that, given any input program ww, can determine whether ww will terminate within a finite time or enter an infinite loop. Just like in the example above, you can never take on that math assignment; you just have to give up.

    Next, we give a formal deduction for this problem:

    Suppose there exists a function HH, which takes two parameters: a program pp and input ii, and can determine whether program pp terminates (i.e., halts) when given input ii. H(p,i)H(p, i) returns true if program pp halts on input ii, otherwise false.

    Then, we can construct a new function DD, which takes a program qq as its parameter, and defines it as follows:

    D(q) = {
        if H(q, q) returns false:
            halt
        else:
            loop forever
    }
    

    Now, let's consider the function D(D)D(D), which takes itself as input. According to the definition of DD:

    • If H(D,D)H(D, D) returns false, then according to the definition of DD, D(D)D(D) will halt.
    • If H(D,D)H(D, D) returns true, then according to the definition of DD, D(D)D(D) will enter an infinite loop.

    Now let's observe the two possibilities for H(D,D)H(D, D):

    • If H(D,D)H(D, D) returns true, then according to the definition of DD, D(D)D(D) will enter an infinite loop.
    • If H(D,D)H(D, D) returns false, then according to the definition of DD, D(D)D(D) will halt.

    Regardless of whether H(D,D)H(D, D) returns true or false, it leads to a contradiction with the definition of D(D)D(D).

    Therefore, we arrive at a paradox: If there exists a function HH that can determine whether any program halts, then we can use HH to construct a function DD that leads to a contradiction. Thus, the assumption is untenable, meaning such a function HH does not exist, thereby proving the unsolvability of the Halting Problem.

    Based on the above conclusion, we can deduce that the capacity of computers is limited, meaning there's no universal algorithm that can solve all computational problems. This is a significant conclusion in computational theory.

    Although Alan Turing proved the Halting Problem in the context of "Turing machines," where the problem is unsolvable, this conclusion applies to any modern computing system, even beyond computers to any algorithms or programs themselves.

    Moreover, the Turing machine is an idealized computational model that assumes storage space and computing speed can be arbitrary and infinite. So, we can conclude that there's no modern computer whose performance exceeds that of a Turing machine. This indicates that computers are not omnipotent, and many problems in life cannot be solved by computers, no matter how intelligent or high-performing they are.

    This can lead to another question: Can memory leaks be fundamentally avoided in program design? The answer is also negative. Upon careful consideration, it's understandable: In a program, how can we fundamentally determine the timing of memory reclamation? Or, can we prove that an object will be released within a finite time?

    This problem is logically unsolvable. Languages like Rust provide numerous mechanisms at the language compilation level, but they can only address memory leaks caused by human errors to a small extent. They can only solve a small part of memory leak issues.

    Now, returning to the code mentioned at the beginning of the article, why is it necessary to add a return statement at the end of the function? This is related to the compiler's strategy. Because for the compiler, whether branches in the program's business code will exit the scope is unpredictable, which relates to the halting problem discussed above. Therefore, the compiler assumes that any branch may not exit the function's scope. Consequently, the compiler must explicitly use return outside of the branches to exit the function.