How to open unicode file with ifstream using mingw under Windows?

Please note this is not the same question as "How to open an std::fstream (ofstream or ifstream) with a unicode filename?". That question was about a Unicode filename; this one is about Unicode file contents.

I need to open a UTF-8 unicode file (containing Spanish characters) with an ifstream. Under Linux this is no problem, but under Windows it is.

bool OpenSpanishFile(string filename)
{
    ifstream spanishFile;
    #ifdef WINDOWS
    spanishFile.open(filename.c_str(),ios::binary);
    #endif

    if (!spanishFile.is_open()) return false;
    spanishFile.clear();
    spanishFile.seekg(ios::beg);
    while (spanishFile.tellg()!=-1)
    {
        string line="";
        getline(spanishFile,line);
        //do stuff
        cout << line << endl;
    }
    return true;

}

I compile it under Linux with:

i586-mingw32msvc-g++ -s -fno-rtti test.cpp -o test.exe

And then run it with wineconsole test.exe.

The output contains all kinds of weird characters, so it tries to open the unicode file as something different.

I have searched the internet a lot about how to open a unicode file this way, but I couldn't get it to work.

Does anyone know a solution that does work with mingw? Thank you so much in advance.


Most likely (it's unclear whether the presented code is the real code) the reason that you see garbage is that std::cout in Windows defaults to presenting its result in a non-UTF-8 console window.
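One quick way to separate a reading problem from a display problem is to print the raw bytes instead of the text. The following is only a sketch, and hex_of is a hypothetical helper name, not something from the question's code:

```cpp
#include <string>    // std::string

// Hypothetical helper (not part of the original code): renders each byte of
// `bytes` as two lowercase hex digits, separated by single spaces.
std::string hex_of(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) { out += ' '; }
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Calling cout << hex_of(line) << endl in the reading loop shows what was actually read. If an ñ in the file comes out as c3 b1, the bytes are correct UTF-8 and the garbage is produced on the display side.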

To properly check whether you're reading the UTF-8 file correctly, simply collect all the input in a string, convert it from UTF-8 to a UTF-16 wstring, and display that using MessageBoxW (or wide direct console output).

The following UTF-8 → UTF-16 conversion function works nicely with Visual C++ 12.0:

#include <codecvt>          // std::codecvt_utf8_utf16
#include <locale>           // std::wstring_convert
#include <string>           // std::wstring

auto wstring_from_utf8( char const* const utf8_string )
    -> std::wstring
{
    std::wstring_convert< std::codecvt_utf8_utf16< wchar_t > > converter;
    return converter.from_bytes( utf8_string );
}

Unfortunately, even though it only uses standard C++11 functionality, it fails to compile with MinGW g++ 4.8.2, but hopefully you have Visual C++ (after all it's free).
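If neither <codecvt> nor Visual C++ is available, the conversion can also be written by hand. The following is a minimal sketch, not a validated decoder: utf16_from_utf8 is a hypothetical name, it returns std::u16string rather than wstring so that it behaves the same on every platform, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>      // std::size_t
#include <stdexcept>    // std::runtime_error
#include <string>       // std::string, std::u16string

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Throws on bad lead bytes, bad continuation bytes and truncated input.
std::u16string utf16_from_utf8(const std::string& s)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0;      // Number of continuation bytes expected.
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { throw std::runtime_error("invalid UTF-8 lead byte"); }

        if (i + extra >= s.size()) { throw std::runtime_error("truncated UTF-8"); }
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) { throw std::runtime_error("invalid UTF-8 continuation byte"); }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) { throw std::runtime_error("code point out of range"); }
        if (cp < 0x10000)
        {
            out += static_cast<char16_t>(cp);
        }
        else
        {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the result can be copied into a wstring with std::wstring( w.begin(), w.end() ) and passed to MessageBoxW.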


As an alternative you can code up a conversion function using the Windows API function MultiByteToWideChar.

For example, the following code works nicely with g++ 4.8.2 with -D USE_WINAPI :

#undef UNICODE
#define UNICODE
#include <windows.h>
#include <shellapi.h>       // ShellAbout

#ifndef USE_WINAPI
#   include <codecvt>          // std::codecvt_utf8_utf16
#   include <locale>           // std::wstring_convert
#endif
#include <fstream>          // std::ifstream
#include <iostream>         // std::cerr, std::endl
#include <stdexcept>        // std::runtime_error, std::exception
#include <stdlib.h>         // EXIT_FAILURE, EXIT_SUCCESS
#include <string.h>         // strlen
#include <string>           // std::string, std::wstring

namespace my {
    using std::ifstream;
    using std::ios;
    using std::runtime_error;
    using std::string;
    using std::wstring;

    #ifndef USE_WINAPI
        using std::codecvt_utf8_utf16;
        using std::wstring_convert;
    #endif

    auto hopefully( bool const c ) -> bool { return c; }
    auto fail( string const& s ) -> bool { throw runtime_error( s ); }

    #ifdef USE_WINAPI
        auto wstring_from_utf8( char const* const utf8_string )
            -> wstring
        {
            if( *utf8_string == '\0' )
            {
                return L"";
            }
            // UTF-16 never needs more code units than UTF-8 has bytes.
            int const n_bytes = static_cast<int>( strlen( utf8_string ) );
            wstring result( n_bytes, L'#' );
            int const n_chars = MultiByteToWideChar(
                CP_UTF8,
                0,      // Flags, only alternative is MB_ERR_INVALID_CHARS
                utf8_string,
                n_bytes,    // Explicit length, so no terminating null is written.
                &result[0],
                n_bytes
                );
            hopefully( n_chars > 0 )
                || fail( "MultiByteToWideChar" );
            result.resize( n_chars );
            return result;
        }
    #else
        auto wstring_from_utf8( char const* const utf8_string )
            -> wstring
        {
            wstring_convert< codecvt_utf8_utf16< wchar_t > > converter;
            return converter.from_bytes( utf8_string );
        }
    #endif

    auto text_of_file( string const& filename )
        -> string
    {
        ifstream f( filename, ios::in | ios::binary );
        hopefully( !f.fail() )
            || fail( "file open" );
        string result;
        string s;
        while( getline( f, s ) )
        {
            result += s + '\n';
        }
        return result;
    }

    void cpp_main()
    {
        string const    utf8_text   = text_of_file( "spanish.txt" );
        wstring const   wide_text   = wstring_from_utf8( utf8_text.c_str() );
        //ShellAbout( 0, L"Spanish text", wide_text.c_str(), LoadIcon( 0, IDI_INFORMATION ) );
        MessageBox(
            0,
            wide_text.c_str(),
            L"Spanish text",
            MB_ICONINFORMATION | MB_SETFOREGROUND
            );
    }
}  // namespace my

auto main()
    -> int
{
    using namespace std;
    try
    {
        my::cpp_main();
        return EXIT_SUCCESS;
    }
    catch( exception const& x )
    {
        cerr << "!" << x.what() << endl;
    }
    return EXIT_FAILURE;
}
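
One detail the program above does not handle: UTF-8 files saved by some Windows editors start with a byte order mark (the bytes EF BB BF), which the conversion turns into a U+FEFF character at the start of the text. If that matters for your file, a small helper along these lines could strip it before converting. This is a sketch, and without_utf8_bom is a hypothetical name:

```cpp
#include <string>    // std::string

// Hypothetical helper: returns `text` without a leading UTF-8 byte order
// mark (EF BB BF); text that does not start with one is returned unchanged.
std::string without_utf8_bom(const std::string& text)
{
    if (text.compare(0, 3, "\xEF\xBB\xBF") == 0)
    {
        return text.substr(3);
    }
    return text;
}
```

In cpp_main it would wrap the file contents, as in without_utf8_bom( text_of_file( "spanish.txt" ) ).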
