
The Two-Call Pattern: Reliable UTF-16 to UTF-8 Conversion in Windows C++

Source: isocpp

Windows has used 16-bit wide characters as its native Unicode encoding since the NT days: UCS-2 at first, UTF-16 since Windows 2000. Every wide-character API, every wchar_t, every std::wstring reflects that choice. The rest of the world converged on UTF-8. The result is a conversion problem that shows up constantly in Windows C++ code, and the Win32 API provides two functions to solve it: WideCharToMultiByte and MultiByteToWideChar.

Giovanni Dicanio published a detailed walkthrough of this conversion on ISOCpp in December 2025. It is worth reading even if you have done this conversion before, because the details matter in ways that are easy to miss on first contact with these APIs.

The Two-Call Pattern

Both WideCharToMultiByte and MultiByteToWideChar follow the same convention: call them once with a null output buffer and an output size of 0 to get the required size, allocate a buffer of that size, then call again to perform the actual conversion. This is a common Win32 convention for functions that return variable-length data.

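// UTF-16 to UTF-8. wstr is a std::wstring; handle the empty string before
// this point, because the API rejects a zero-length input.
// First call: null buffer and size 0 return the required size in bytes.
int size = WideCharToMultiByte(
    CP_UTF8, 0,
    wstr.data(), static_cast<int>(wstr.size()),
    nullptr, 0,
    nullptr, nullptr
);
if (size == 0) {
    // failed: inspect GetLastError()
}

// Second call: convert into a buffer of exactly that size.
std::string result(size, '\0');
int written = WideCharToMultiByte(
    CP_UTF8, 0,
    wstr.data(), static_cast<int>(wstr.size()),
    result.data(), size,
    nullptr, nullptr
);
// written == 0 signals failure here as well.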

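Passing the string length explicitly rather than -1 (which tells the API the input is null-terminated) is worth doing. It handles strings with embedded nulls correctly, makes the intent explicit, and keeps the terminator out of the result: with -1 the returned size includes the terminating null, which would leave a stray '\0' at the end of the std::string.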
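The reverse direction follows the same shape. The sketch below is an illustration rather than code from the original walkthrough: the function name Utf8ToUtf16 and the choice to throw on failure are assumptions. It passes MB_ERR_INVALID_CHARS so that malformed UTF-8 makes the call fail instead of being replaced, and it is wrapped in an _WIN32 guard because the API only exists on Windows.

```cpp
#ifdef _WIN32  // Windows-only: MultiByteToWideChar is declared in <windows.h>
#ifndef NOMINMAX
#define NOMINMAX  // keep <windows.h> from defining min/max macros
#endif
#include <windows.h>

#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (name and error policy are assumptions):
// UTF-8 -> UTF-16 using the same size-then-convert sequence.
std::wstring Utf8ToUtf16(const std::string& utf8) {
    if (utf8.empty()) {
        return std::wstring();  // the API rejects zero-length input
    }
    if (utf8.size() > static_cast<size_t>(std::numeric_limits<int>::max())) {
        throw std::length_error("input too long for MultiByteToWideChar");
    }
    const int inputLength = static_cast<int>(utf8.size());

    // First call: no output buffer, returns the size in wchar_t units.
    const int size = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS,
        utf8.data(), inputLength,
        nullptr, 0);
    if (size == 0) {
        throw std::runtime_error("MultiByteToWideChar (sizing) failed");
    }

    // Second call: convert into a buffer of exactly that size.
    std::wstring result(static_cast<size_t>(size), L'\0');
    const int written = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS,
        utf8.data(), inputLength,
        &result[0], size);
    if (written == 0) {
        throw std::runtime_error("MultiByteToWideChar (convert) failed");
    }
    return result;
}
#endif  // _WIN32
```

Note that the sizes are counted in different units in each direction: bytes for the UTF-8 side, wchar_t code units for the UTF-16 side.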

Error Handling

Both functions return 0 on failure and record an error code that GetLastError retrieves. The failure cases to watch for:

  • ERROR_INSUFFICIENT_BUFFER when the output buffer is too small
  • ERROR_INVALID_FLAGS for bad flag values
  • ERROR_INVALID_PARAMETER for invalid parameter values such as unexpected null pointers
  • ERROR_NO_UNICODE_TRANSLATION when WC_ERR_INVALID_CHARS (or MB_ERR_INVALID_CHARS, for MultiByteToWideChar) is set and the input contains invalid sequences

That last one requires passing WC_ERR_INVALID_CHARS explicitly to WideCharToMultiByte. Without it, the function silently replaces invalid input, such as an unpaired surrogate, with the replacement character U+FFFD. For code that sends text over a network or writes to a cross-platform format, silent character substitution is worse than a visible error.
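As a rough illustration of strict mode, the sketch below runs only the sizing call with WC_ERR_INVALID_CHARS and reports what GetLastError says. Both function names are hypothetical, and the helper assumes a non-empty input whose length already fits in an int. ERROR_INVALID_FLAGS appears in the switch because the API documents it as a separate code for bad flag values.

```cpp
#ifdef _WIN32  // Windows-only: uses WideCharToMultiByte and GetLastError
#include <windows.h>

#include <string>

// Hypothetical helper: runs the sizing call in strict mode and returns
// ERROR_SUCCESS, or the code reported by GetLastError on failure.
// Assumes utf16 is non-empty and its length fits in an int.
DWORD CheckUtf16Input(const std::wstring& utf16) {
    const int size = WideCharToMultiByte(
        CP_UTF8, WC_ERR_INVALID_CHARS,  // fail instead of substituting
        utf16.data(), static_cast<int>(utf16.size()),
        nullptr, 0,
        nullptr, nullptr);
    if (size != 0) {
        return ERROR_SUCCESS;
    }
    // Read the code immediately, before another API call overwrites it.
    return GetLastError();
}

// Hypothetical helper: maps the documented failure codes to short text.
const char* DescribeConversionError(DWORD error) {
    switch (error) {
        case ERROR_SUCCESS:                return "ok";
        case ERROR_INSUFFICIENT_BUFFER:    return "output buffer too small";
        case ERROR_INVALID_FLAGS:          return "invalid flags";
        case ERROR_INVALID_PARAMETER:      return "invalid parameter";
        case ERROR_NO_UNICODE_TRANSLATION: return "invalid Unicode in input";
        default:                           return "unexpected error";
    }
}
#endif  // _WIN32
```

A string containing an unpaired surrogate such as 0xD800 is a convenient test input for the ERROR_NO_UNICODE_TRANSLATION path.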

The Size Cast Problem

The API takes int for string lengths, not size_t. That means every call site involves a cast from the string's natural unsigned size type to a signed 32-bit integer. If the input holds more than INT_MAX (2,147,483,647) code units, the cast yields a wrong, possibly negative, length, and the call either converts the wrong amount of data or fails in a way that may not be obvious.

Adding a range check before the cast is the correct approach:

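// Needs <limits>. With <windows.h>, define NOMINMAX first, or its max macro breaks this line.
if (wstr.size() > static_cast<size_t>(std::numeric_limits<int>::max())) {
    // handle error: return or throw so the cast below is never reached
}
int len = static_cast<int>(wstr.size());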
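One way to avoid repeating that check at every call site is a small checked-cast helper. This is a minimal sketch; the name ToIntLength and the std::optional return type are assumptions, not something the original walkthrough prescribes. It is portable C++17 and does not depend on any Windows header.

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical helper: returns the length as an int, or std::nullopt when
// the value does not fit.
std::optional<int> ToIntLength(std::size_t length) {
    // max is parenthesized so that the max macro from <windows.h>, if it is
    // defined, cannot interfere with this expression.
    constexpr std::size_t kMaxInt =
        static_cast<std::size_t>((std::numeric_limits<int>::max)());
    if (length > kMaxInt) {
        return std::nullopt;
    }
    return static_cast<int>(length);
}
```

A caller can then write something like `auto len = ToIntLength(wstr.size());` and bail out when `len` is empty, so the unchecked cast never appears in conversion code.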

Wrapping It

The raw API is verbose. Most codebases wrap these calls into utility functions that return std::optional<std::string> or throw on failure, hiding the two-call pattern and the cast. The wrapper handles the repetition once; every call site stays readable.
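A wrapper along those lines might look like the sketch below. The name TryUtf16ToUtf8, the std::wstring_view parameter, and the decision to collapse every failure into std::nullopt are illustrative assumptions; a real codebase may prefer to throw or to carry the error code. The _WIN32 guard is there because the API only exists on Windows.

```cpp
#ifdef _WIN32  // Windows-only: WideCharToMultiByte is declared in <windows.h>
#ifndef NOMINMAX
#define NOMINMAX  // keep <windows.h> from defining min/max macros
#endif
#include <windows.h>

#include <limits>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical wrapper: returns the UTF-8 text, or std::nullopt on failure.
std::optional<std::string> TryUtf16ToUtf8(std::wstring_view utf16) {
    if (utf16.empty()) {
        return std::string();  // the API rejects zero-length input
    }
    if (utf16.size() > static_cast<size_t>(std::numeric_limits<int>::max())) {
        return std::nullopt;   // length does not fit the API's int parameter
    }
    const int inputLength = static_cast<int>(utf16.size());

    // First call: query the required size in bytes, in strict mode.
    const int size = WideCharToMultiByte(
        CP_UTF8, WC_ERR_INVALID_CHARS,
        utf16.data(), inputLength,
        nullptr, 0,
        nullptr, nullptr);
    if (size == 0) {
        return std::nullopt;
    }

    // Second call: convert into a buffer of exactly that size.
    std::string result(static_cast<size_t>(size), '\0');
    const int written = WideCharToMultiByte(
        CP_UTF8, WC_ERR_INVALID_CHARS,
        utf16.data(), inputLength,
        result.data(), size,
        nullptr, nullptr);
    if (written == 0) {
        return std::nullopt;
    }
    return result;
}
#endif  // _WIN32
```

At a call site this reduces to `if (auto utf8 = TryUtf16ToUtf8(name)) { /* use *utf8 */ }`, with the sizing call, the cast, and the flags out of sight.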

This is not a glamorous problem, but it comes up in any Windows C++ project that touches the network, reads JSON, or interoperates with cross-platform libraries. Getting the conversion right once and encapsulating it means not repeating the same mistakes across a codebase.
