UTF-8 string in std::string

In response to discussion in

Cross-platform strings (and Unicode) in C++

How to deal with Unicode strings in C/C++ in a cross-platform friendly way?

I'm trying to assign a UTF-8 string to a std::string variable in a Visual Studio 2010 environment:

std::string msg = "महसुस";

However, when I view the string in the debugger, I only see "?????". I have the file saved as Unicode (UTF-8 with signature) and I'm using the "Use Unicode Character Set" project setting.

"महसुस" is Nepali text; it contains 5 characters and should occupy 15 bytes. But the Visual Studio debugger shows the size of msg as 5.

My question is:

How do I use std::string to just store the UTF-8 without needing to manipulate it?


If you were using C++11 then this would be easy:

std::string msg = u8"महसुस";

But since you are not, you can use escape sequences instead of relying on the source file's charset to manage the encoding for you. This makes your code more portable (in case you accidentally save the file in a non-UTF-8 format):

std::string msg = "\xE0\xA4\xAE\xE0\xA4\xB9\xE0\xA4\xB8\xE0\xA5\x81\xE0\xA4\xB8"; // "महसुस"
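To make the byte count from the question concrete, here is a minimal sketch that stores the escaped bytes and checks that std::string holds 15 bytes for 5 code points. The helper names nepaliSample and countCodePoints are hypothetical, introduced only for this illustration, and the counting loop assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the UTF-8 bytes of "महसुस" written as
// escape sequences, so the result does not depend on the source charset.
std::string nepaliSample()
{
    return "\xE0\xA4\xAE\xE0\xA4\xB9\xE0\xA4\xB8\xE0\xA5\x81\xE0\xA4\xB8";
}

// Hypothetical helper: counts code points by skipping UTF-8 continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8; no validation is done.
std::size_t countCodePoints(const std::string &utf8)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

With these, nepaliSample().size() reports the number of bytes, while countCodePoints(nepaliSample()) reports the number of code points. std::string itself only ever counts bytes.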

Otherwise, you might consider doing a conversion at runtime instead:

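#include <windows.h> // WideCharToMultiByte, CP_UTF8
#include <string>

// Converts a UTF-16 wide string to a UTF-8 encoded std::string (Windows only).
std::string toUtf8(const std::wstring &str)
{
    std::string ret;
    // First call: ask how many bytes the UTF-8 output needs.
    int len = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        ret.resize(len);
        // Second call: write the converted bytes into the buffer.
        WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), &ret[0], len, NULL, NULL);
    }
    return ret;
}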

std::string msg = toUtf8(L"महसुस");

If you have C++11, you can write u8"महसुस". Otherwise, you'll have to write the actual byte sequence, using \xxx for each byte in the UTF-8 sequence.

Typically, you're better off reading such text from a configuration file.
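As a rough sketch of that approach, the following reads a file as raw bytes into a std::string and drops a leading UTF-8 signature (BOM) if the editor wrote one. The function names readUtf8File and stripUtf8Bom are hypothetical, and the sketch assumes the file is already UTF-8 encoded; it does no validation or conversion.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: removes a leading UTF-8 signature (EF BB BF), if any.
std::string stripUtf8Bom(const std::string &text)
{
    if (text.compare(0, 3, "\xEF\xBB\xBF") == 0)
        return text.substr(3);
    return text;
}

// Hypothetical helper: reads the whole file as raw bytes with no transcoding.
// Binary mode keeps the runtime from altering any bytes on Windows.
// Returns an empty string if the file cannot be opened.
std::string readUtf8File(const char *path)
{
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return stripUtf8Bom(bytes);
}
```

Because std::string treats its contents as opaque bytes, the text loaded this way stays valid UTF-8 as long as nothing in between re-encodes it.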


You can write msg.c_str(),s8 in the Watch window to view the UTF-8 string correctly.
