Why is there such a massive difference in compile time between consteval/constexpr and template metafunctions?

  ackermann, c++, c++20, consteval, templates

I was curious how far I could push GCC's compile-time evaluation, so I made it compute the Ackermann function, specifically with the inputs 4 and 1 (anything higher than that is impractical):

consteval unsigned int A(unsigned int x, unsigned int y)
{
    if(x == 0)
        return y+1;
    else if(y == 0)
        return A(x-1, 1);
    else
        return A(x-1, A(x, y-1));
}

unsigned int result = A(4, 1);

(I think the recursion depth is bounded at around 16K, but just to be safe I compiled this with -std=c++20 -fconstexpr-depth=100000 -fconstexpr-ops-limit=12800000000.)

Not surprisingly, this takes an obscene amount of stack space (in fact, it crashes the compiler when run with the default process stack size of 8 MB) and takes several minutes to compute. However, it does eventually finish, so evidently the compiler can handle it.

After that, I tried implementing the Ackermann function with templates, using metafunctions and partial specialization for pattern matching. Amazingly, the following implementation takes only a few seconds to evaluate:

template<unsigned int x, unsigned int y>
struct A {
    static constexpr unsigned int value = A<x-1, A<x, y-1>::value>::value;
};

template<unsigned int y>
struct A<0, y> {
    static constexpr unsigned int value = y+1;
};

template<unsigned int x>
struct A<x, 0> {
    static constexpr unsigned int value = A<x-1, 1>::value;
};

unsigned int result = A<4,1>::value;

(compile with -ftemplate-depth=17000)
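One detail of the metafunction version: the two partial specializations A<0, y> and A<x, 0> both match A<0, 0>, and neither is more specialized than the other, so naming A<0, 0> would be an ambiguity error. Starting from A<4, 1> that pair is never reached (whenever the first argument drops to 0, the second is either 1 or a previously computed value, which is never 0), so the code above still compiles. The sketch below adds an explicit A<0, 0> specialization anyway (an addition for illustration, not part of the original) and checks the metafunction against small values:

```cpp
// Primary template: A(x, y) = A(x - 1, A(x, y - 1)) for x > 0, y > 0.
template<unsigned int x, unsigned int y>
struct A {
    static constexpr unsigned int value = A<x - 1, A<x, y - 1>::value>::value;
};

// A(0, y) = y + 1
template<unsigned int y>
struct A<0, y> {
    static constexpr unsigned int value = y + 1;
};

// A(x, 0) = A(x - 1, 1)
template<unsigned int x>
struct A<x, 0> {
    static constexpr unsigned int value = A<x - 1, 1>::value;
};

// Explicit specialization that resolves the tie between A<0, y> and A<x, 0>.
template<>
struct A<0, 0> {
    static constexpr unsigned int value = 1;
};

static_assert(A<0, 0>::value == 1);
static_assert(A<2, 3>::value == 9);
static_assert(A<3, 3>::value == 61);
```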

Why is there such a dramatic difference in evaluation time? Aren’t these essentially equivalent? I can understand the consteval solution needing somewhat more memory and evaluation time, since semantically it consists of a bunch of function calls, but that doesn’t explain why the exact same function without consteval, computed at run time, takes only slightly longer than the metafunction version (compiled without optimizations).
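The run-time comparison mentioned above can be reproduced by timing the same definition without consteval. This is a minimal sketch; the names ackermann and time_ackermann are hypothetical, and actual timings depend on the machine and optimization level:

```cpp
#include <cassert>
#include <chrono>

// The same recursive definition without consteval: called with run-time
// arguments, it is compiled to machine code and executed at run time.
unsigned int ackermann(unsigned int x, unsigned int y)
{
    if (x == 0)
        return y + 1;
    else if (y == 0)
        return ackermann(x - 1, 1);
    else
        return ackermann(x - 1, ackermann(x, y - 1));
}

// Times one call of ackermann(x, y), stores the wall-clock duration in
// elapsed_ms, and returns the computed value.
unsigned int time_ackermann(unsigned int x, unsigned int y, double& elapsed_ms)
{
    const auto start = std::chrono::steady_clock::now();
    const unsigned int result = ackermann(x, y);
    const auto stop = std::chrono::steady_clock::now();
    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();

    // Every Ackermann value is larger than y, so a smaller result means
    // unsigned int wrapped around. This catches some, not all, overflows.
    assert(result > y);
    return result;
}
```

Calling time_ackermann(4, 1, ms) reproduces the run-time measurement. Computing A(4, 1) naively has to evaluate A(1, 65531), which nests tens of thousands of calls, so on platforms with a small default stack the program may need a larger one.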

Why is consteval so slow? I’m almost tempted to conclude that it’s being evaluated by a GIMPLE interpreter or something like that. And how can the metafunction version be so fast? It’s actually not much slower than optimized machine code.
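One concrete property of the metafunction version is relevant here: the compiler instantiates each specialization such as A<3, 2> at most once per translation unit and reuses its value wherever it is named again, so the template recursion is effectively memoized. A plain recursive function re-evaluates repeated calls at run time; whether GCC's constant evaluator caches repeated consteval calls is not something this sketch can show. The run-time analogue below (hypothetical names naive_ackermann, memo_ackermann and compare_work) counts how many calls the plain recursion makes versus how many distinct (x, y) pairs a memoized version evaluates:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

using Cache = std::map<std::pair<unsigned int, unsigned int>, unsigned int>;

// Plain recursion that also counts every call it makes.
unsigned int naive_ackermann(unsigned int x, unsigned int y, std::uint64_t& calls)
{
    ++calls;
    if (x == 0)
        return y + 1;
    if (y == 0)
        return naive_ackermann(x - 1, 1, calls);
    return naive_ackermann(x - 1, naive_ackermann(x, y - 1, calls), calls);
}

// Memoized recursion: each distinct (x, y) pair is evaluated once and cached,
// similar to how each A<x, y> specialization is instantiated only once.
unsigned int memo_ackermann(unsigned int x, unsigned int y, Cache& cache)
{
    const auto key = std::make_pair(x, y);
    if (const auto it = cache.find(key); it != cache.end())
        return it->second;

    unsigned int result;
    if (x == 0)
        result = y + 1;
    else if (y == 0)
        result = memo_ackermann(x - 1, 1, cache);
    else
        result = memo_ackermann(x - 1, memo_ackermann(x, y - 1, cache), cache);

    cache.emplace(key, result);
    return result;
}

// Runs both strategies on one input; returns {total naive calls, distinct pairs}.
std::pair<std::uint64_t, std::size_t> compare_work(unsigned int x, unsigned int y)
{
    std::uint64_t calls = 0;
    Cache cache;
    const unsigned int naive = naive_ackermann(x, y, calls);
    const unsigned int memo = memo_ackermann(x, y, cache);
    assert(naive == memo);  // both strategies must agree on the value
    return {calls, cache.size()};
}
```

For A(3, 3) the plain recursion makes 2432 calls, while the memoized version evaluates far fewer distinct pairs; the gap grows very quickly with larger inputs.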

Source: Windows Questions C++