This routine decodes UTF-8 data into a wstring as it goes.
It assumes the encoding is correct, which is okay for my purposes.
It’s not originally my code; I got it from another SO question (Mark Ransom’s wonderful answer to "UTF8 to/from wide char conversion in STL"), and I intend to adapt it. The problem with this
code is that it is incompatible with the 8-bit processors I am targeting.
I don’t care about the wstring return value; I’ll be replacing it with a 4-byte buffer once the thing works. I just need the bit-twiddling part.
Specifically, I need the code point split into two hard 16-bit numbers, because on an 8-bit target int is only 16 bits wide (I think), so a single unsigned int cannot hold a code point up to 0x10ffff. See the comments in the code.
I just don’t have the head for this conversion. It’s a little beyond me. I might be able to do it with enough time and trial and error, but I was hoping to find someone with a knack for this stuff who could retool the code to use no ints larger than 16 bits.
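To make the request concrete, here is a rough sketch of what the two-halves representation might look like for the shift-and-merge step. The struct `CodepointParts` and the helpers `startCodepoint` and `appendContinuation` are made-up names for illustration, not part of the original answer, and the sketch assumes `<stdint.h>` is available on the target.

```cpp
#include <stdint.h>

// Hypothetical helper type: one code point held as two 16-bit halves.
// hi holds bits 16..20 (at most 0x10 for valid Unicode), lo holds bits 0..15.
struct CodepointParts {
    uint16_t hi;
    uint16_t lo;
};

// 16-bit equivalent of: codepoint = ch & mask;
// Used for an ASCII byte (mask 0x7f) or a lead byte (mask 0x1f, 0x0f, 0x07).
inline void startCodepoint(CodepointParts &cp, uint8_t ch, uint8_t mask) {
    cp.hi = 0;
    cp.lo = static_cast<uint16_t>(ch & mask);
}

// 16-bit equivalent of: codepoint = (codepoint << 6) | (ch & 0x3f);
// The top 6 bits of lo are carried into the bottom of hi before lo is shifted.
inline void appendContinuation(CodepointParts &cp, uint8_t ch) {
    cp.hi = static_cast<uint16_t>((cp.hi << 6) | (cp.lo >> 10));
    cp.lo = static_cast<uint16_t>((cp.lo << 6) | (ch & 0x3f));
}
```

Feeding the four bytes F0 9F 98 80 (U+1F600) through these helpers should leave hi at 0x0001 and lo at 0xF600. The only operation that crosses the 16-bit boundary is the carry of six bits from lo into hi.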
#include <string>

std::wstring UTF8_to_wchar(const char * in)
{
    std::wstring out;
    // this code assumes 32-bit, but unsigned int will be 16-bit on an 8-bit processor, I think.
    // it needs to be split into two uint16_t values like:
    // uint16_t codepointHi = 0;
    // uint16_t codepointLo;
    // something like that
    unsigned int codepoint = 0;
    while (*in != 0)
    {
        unsigned char ch = static_cast<unsigned char>(*in);
        if (ch <= 0x7f)
            codepoint = ch;                             // ASCII byte
        else if (ch <= 0xbf)
            // this is rough for me to convert:
            codepoint = (codepoint << 6) | (ch & 0x3f); // continuation byte: append 6 bits
        else if (ch <= 0xdf)
            codepoint = ch & 0x1f;                      // lead byte of a 2-byte sequence
        else if (ch <= 0xef)
            codepoint = ch & 0x0f;                      // lead byte of a 3-byte sequence
        else
            codepoint = ch & 0x07;                      // lead byte of a 4-byte sequence
        ++in;
        // this is where things get particularly confusing
        // and I start getting really lost:
        // (emit only when the next byte is not another continuation byte)
        if (((*in & 0xc0) != 0x80) && (codepoint <= 0x10ffff))
        {
            // yay an easy part to convert.
            if (codepoint > 0xffff)
            {
                // i'm crying a little
                // UTF-16 surrogates encode the offset above 0xffff, so subtract 0x10000 first
                codepoint -= 0x10000;
                out.append(1, static_cast<wchar_t>(0xd800 + (codepoint >> 10)));
                out.append(1, static_cast<wchar_t>(0xdc00 + (codepoint & 0x03ff)));
            }
            else if (codepoint < 0xd800 || codepoint >= 0xe000) // not so bad
                out.append(1, static_cast<wchar_t>(codepoint));
        }
    }
    return out;
}
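For the emit step at the end of the loop, the range checks and the surrogate arithmetic can also be expressed on the two halves alone. The following is only a sketch under assumptions: `encodeUtf16` is a hypothetical name, `codepointHi` and `codepointLo` follow the comments in the code above, and the two-element `uint16_t` output array stands in for the 4-byte buffer mentioned earlier.

```cpp
#include <stdint.h>

// Hypothetical helper: turns a code point held as two 16-bit halves into
// UTF-16 code units. Writes 1 or 2 units to out and returns how many were
// written; returns 0 when the value is rejected.
inline uint8_t encodeUtf16(uint16_t codepointHi, uint16_t codepointLo, uint16_t out[2]) {
    // Same test as: codepoint <= 0x10ffff (lo can never exceed 0xffff).
    if (codepointHi > 0x10)
        return 0;

    // Same test as: codepoint > 0xffff.
    if (codepointHi != 0) {
        // Subtracting 0x10000 only touches the high half.
        uint16_t h = static_cast<uint16_t>(codepointHi - 1);      // now 0x0..0xF
        // High surrogate: top 10 bits of the 20-bit offset.
        out[0] = static_cast<uint16_t>(0xD800 | (h << 6) | (codepointLo >> 10));
        // Low surrogate: bottom 10 bits of the 20-bit offset.
        out[1] = static_cast<uint16_t>(0xDC00 | (codepointLo & 0x03ff));
        return 2;
    }

    // Basic Multilingual Plane: skip the reserved surrogate range.
    if (codepointLo < 0xD800 || codepointLo >= 0xE000) {
        out[0] = codepointLo;
        return 1;
    }
    return 0;
}
```

With codepointHi = 0x0001 and codepointLo = 0xF600 (U+1F600), this should produce the pair 0xD83D, 0xDE00. No intermediate value here needs more than 16 bits, since the high half never exceeds 0x10.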
Thanks in advance!