What does "be representable in the execution character set" mean?

  c++, char, language-lawyer

The type of a character literal is specified by the following rule:

A character literal that does not begin with u8, u, U, or L is an ordinary character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.

Consider the following example:

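#include <iostream>
#include <typeinfo>  // required for typeid

int main() {
    auto c = '\u0080';  // universal-character-name for U+0080
    std::cout << typeid(c).name();  // GCC prints "i", its mangled name for int
}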

The type of c is int (as reported by GCC). Why is the type of c int?

According to the grammar, c-char is defined as:

c-char:

  • any member of the source character set except the single-quote ', backslash \, or new-line character
  • escape-sequence
  • universal-character-name

In this example, \u0080 is a universal-character-name, which is a single c-char, so the ordinary character literal '\u0080' does not contain more than one c-char. GCC's default execution character set is UTF-8, which means that \u0080 is representable in that character set. Why, then, does GCC give c the type int?

I know that the value of this code point cannot be represented by a single char object, but that is not what the rule above states. Is this a GCC bug, or am I misunderstanding something? How should "representable in the execution character set" be interpreted?

Source: Windows Questions C++
