Unicode Standards

From Code::Blocks
Revision as of 23:42, 12 April 2006 by Sethjackson (talk | contribs)

This page is meant to be a location for developers to find all the current Unicode standards, or good practices, when editing and developing the Code::Blocks IDE.

Reference


Macros

__TFILE__ = wxWidgets-provided equivalent of __FILE__
__TDATE__ = wxWidgets-provided equivalent of __DATE__
__TTIME__ = wxWidgets-provided equivalent of __TIME__

_U() = Use it to convert non-literal char* strings to wxString, for example when reading attributes from TiXmlNode objects. If you deal with functions that return char* strings, you must use the _U macro.

Code:

  #if wxUSE_UNICODE
    #define _U(x) wxString((x),wxConvUTF8)
    #define _UU(x,y) wxString((x),y)
  #else
    #define _U(x) (x)
    #define _UU(x,y) (x)
  #endif

For example:

  const char* incompatible = "This is an incompatible string";
  wxString compatible = _U(incompatible);
  // wxString conftype = conf->Attribute("ConfigurationType"); // before
  wxString conftype = _U(conf->Attribute("ConfigurationType")); // after :)

_C() = Converts a wxString to a multibyte C string; see the wxWidgets help (wxMBConv classes overview). Use this one for interacting with APIs that need char const* strings, such as saving things to TinyXML.

The macros are defined in the code as:

  #if wxUSE_UNICODE
    #define _UU(x,y) wxString((x),(y))
    #define _CC(x,y) (x).mb_str((y))
  #else
    #define _UU(x,y) (x)
    #define _CC(x,y) (x)
  #endif
  #define _U(x) _UU((x),wxConvUTF8)
  #define _C(x) _CC((x),wxConvUTF8)

wxT() = fixed text, such as XRC resource object names. It only adds an L before the string, and ONLY in a Unicode build.
wxT() is a macro which can be used with character and string literals (in other words, 'x' or "foo") to automatically convert them to Unicode in Unicode build configuration. Please see the Unicode overview for more information.

This macro simply returns the value passed to it without changes in an ASCII build. In fact, its definition is:

  #ifdef UNICODE
  #define wxT(x) L ## x
  #else // !Unicode
  #define wxT(x) x
  #endif
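
The token-pasting trick behind wxT() can be tried without wxWidgets. The following is only a sketch: DEMO_UNICODE, DEMO_T, demo_char and demo_length are hypothetical names invented for this example, standing in for the real build switch, macro and character type.

```cpp
#include <cstddef>

// Hypothetical stand-in for the build switch; set to 0 to mimic an ANSI build.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
    typedef wchar_t demo_char;
    #define DEMO_T(x) L ## x   // pastes L onto the literal: "foo" becomes L"foo"
#else
    typedef char demo_char;
    #define DEMO_T(x) x        // the literal passes through unchanged
#endif

// Counts the characters of a literal written through DEMO_T().
inline std::size_t demo_length(const demo_char* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// The same source line yields a wide or a narrow literal depending on the switch.
const demo_char* const demo_name = DEMO_T("dlgGenericMultiSelect");
const demo_char demo_letter = DEMO_T('f');
```

With DEMO_UNICODE set to 1, demo_name is a wide string and demo_letter is the wide character L'f'; flipping the switch to 0 makes both narrow without touching the two declarations.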

_T() = fixed text, such as XRC resource object names. It only adds an L before the string, and ONLY in a Unicode build.
This macro is exactly the same as wxT and is defined in wxWidgets simply because it may be more intuitive for Windows programmers as the standard Win32 headers also define it (as well as yet another name for the same macro which is _TEXT()).

Don't confuse this macro with _()!

  wxChar _T(char ch)
  const wxChar * _T(const char * s)

_() = text which might be translated into other user languages
This macro expands into a call to the wxGetTranslation function, so it marks the message for extraction by xgettext just as wxTRANSLATE does, but it also returns the translation of the string for the current locale during execution.

Don't confuse this macro with _T()!

wxPLURAL = This macro is identical to _() but for the plural variant of wxGetTranslation.
const wxChar * wxPLURAL(const char *sing, const char *plur, size_t n)
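
To make the role of the three parameters concrete, here is a minimal sketch that does not use wxWidgets. demo_plural is a hypothetical helper that applies the English-style rule (singular only when n is 1); a real translation catalog may use different plural rules for other languages.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxWidgets: picks the message form by count
// using the English-style rule. No translation lookup is performed here.
inline std::string demo_plural(const std::string& sing,
                               const std::string& plur,
                               std::size_t n) {
    return n == 1 ? sing : plur;
}
```

For instance, demo_plural("%d file", "%d files", 1) selects the singular form, while any other count selects the plural one.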


Guidelines

char & wxChar:
Do not use wxChar for anything that is not a text character, because in a Unicode build wxChar is a wide character (16 bits or more, not 8 bits):

Example for text:

  wxChar im_a_character = _T('f');

Example for non-text data (not a character):

  unsigned char im_a_byte = 254; // 254 does not fit in a signed char

but perhaps better would be to use:

  byte im_a_byte = 254;

so it's clear that it's a byte and not a character.
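
The size difference can be checked with standard C++ alone. In this sketch demo_byte is a hypothetical alias chosen for the example, and wchar_t stands in for wxChar in a Unicode build.

```cpp
#include <climits>

// Hypothetical alias: an explicit 8-bit unsigned type for raw data.
typedef unsigned char demo_byte;

// Raw data goes into the byte type; 254 fits because the type is unsigned.
const demo_byte demo_raw = 254;

// Text goes into the character type, which is wider than one byte.
const wchar_t demo_text = L'f';

// True when a value can be stored in demo_byte without loss.
inline bool demo_fits_in_byte(unsigned value) {
    return value <= UCHAR_MAX;
}
```

Keeping the two types apart means that a later change of character width does not silently change the size of binary data.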


Other:
Problem code:

  // indent code accordingly
  wxString code = it->second;
  code.Replace("\n", '\n' + lineIndent);

Solution: If the input is a const char*, use plain string literals. If the input is a wxChar or wxString, wrap the literals in the _T() macro. For example:

  // indent code accordingly
  wxString code = it->second;
  code.Replace(_T("\n"), _T('\n') + lineIndent);


Some of the strings already converted in C::B use _() when they should use _T().

Example:

  WRONG: wxXmlResource::Get()->LoadDialog(this, parent, _("dlgGenericMultiSelect"));

dlgGenericMultiSelect is a reference to a resource. Therefore it must use _T instead.

  RIGHT: wxXmlResource::Get()->LoadDialog(this, parent, _T("dlgGenericMultiSelect"));

And don't forget to test for single characters, too!



All operations with wxStrings (not char*'s) should have _("string") for strings to be displayed to the user, and _T("string") for strings used internally.


Printf-like functions: pass c_str() for %s arguments. (Some examples on wxwidgets.org pass different arguments in Unicode and non-Unicode builds even though the format string is "%s" in both.) For example:

  tmpkey.Printf(_T("%s/editor/keywords/%d"), key.c_str(), i);


XRCID and XRCCTRL macros:
XRCID and XRCCTRL macros must _NOT_ be converted! They're pre-converted already!

  WRONG:   XRCCTRL(*this, _T("lblLabel"), wxStaticText)->SetLabel(label);


  RIGHT:   XRCCTRL(*this, "lblLabel", wxStaticText)->SetLabel(label);


concatenated strings:
_() is a macro which calls one of wxWidgets' internal functions, so concatenation should look like this:

  _("string 1" "string2" ... )

The _T() macro simply adds an 'L' before the string literal given as a parameter (in a Unicode build, of course; in a non-Unicode build it does nothing to the string), so concatenation should be:

  _T("string1") _T("string2") ...
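
Both concatenation styles rely on the compiler joining adjacent string literals. A minimal sketch in standard C++, where DEMO_WIDE and demo_translate are hypothetical stand-ins for _T() and _() (the stand-in performs no translation):

```cpp
#include <string>

// Hypothetical stand-in for _T() in a Unicode build.
#define DEMO_WIDE(x) L ## x

// Hypothetical stand-in for _(): a function call that returns its input as-is.
inline const char* demo_translate(const char* s) { return s; }

// Each piece gets its own macro, then the compiler joins the wide literals.
const std::wstring demo_joined_wide = DEMO_WIDE("string1") DEMO_WIDE("string2");

// The literals are joined first, then passed as ONE argument to the function.
const std::string demo_joined_call = demo_translate("string 1" "string2");
```

Writing DEMO_WIDE("string1" "string2") instead would paste the L onto the first literal only, which is why each piece is wrapped separately in the wide case.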


Gotchas

_C() can return a proxy, not necessarily a char const*

Don't write code like this:

char const * psz = _C( str ); // formerly str.mb_str(wxConvUTF8);

_C(), in Unicode mode, returns a buffer object, not a raw pointer. This is a good thing because the buffer's destructor takes care of freeing the memory used by the string. The buffer is implicitly convertible to char const* so that it can be used immediately in expressions like strlen( str.mb_str() ), but that convenience also makes the error described here possible.

What actually happens in the code above? wxString::mb_str() returns a temporary buffer object. The buffer's implicit conversion to char const* is invoked, and the result is stored in psz. The temporary buffer then goes out of scope at the end of the statement and its destructor deletes the memory. Windows often appears not to care, but on Linux the memory has often already been reused by the time psz is used again in the code.

Solution:

wxWX2MBbuf psz = str.mb_str(wxConvUTF8);

wxWX2MBbuf takes ownership of the buffer (no, it's not copied; it uses transfer-of-ownership semantics similar to std::auto_ptr). That way you can actually use the memory until psz goes out of scope and deletes it.
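
The lifetime difference can be observed without wxWidgets. This is only a sketch: DemoBuffer, demo_mb_str and the counter demo_live_buffers are hypothetical names that imitate a buffer object with an implicit conversion to char const*, not the real wxWX2MBbuf.

```cpp
#include <cstring>
#include <string>

int demo_live_buffers = 0;  // number of buffers that currently own memory

// Hypothetical stand-in for the buffer object returned by mb_str().
class DemoBuffer {
public:
    explicit DemoBuffer(const std::string& s) : data_(new char[s.size() + 1]) {
        std::memcpy(data_, s.c_str(), s.size() + 1);
        ++demo_live_buffers;
    }
    // Ownership moves to the new object; nothing is copied.
    DemoBuffer(DemoBuffer&& other) noexcept : data_(other.data_) { other.data_ = nullptr; }
    DemoBuffer(const DemoBuffer&) = delete;
    DemoBuffer& operator=(const DemoBuffer&) = delete;
    ~DemoBuffer() {
        if (data_) { delete[] data_; --demo_live_buffers; }
    }
    operator const char*() const { return data_; }  // implicit conversion
private:
    char* data_;
};

DemoBuffer demo_mb_str(const std::string& s) { return DemoBuffer(s); }

// Risky pattern: only the raw pointer is kept, so the temporary buffer is
// destroyed at the end of the statement. The pointer is deliberately never read.
int demo_live_after_raw_pointer() {
    const char* psz = demo_mb_str("hello");
    (void)psz;
    return demo_live_buffers;  // 0: the memory is already gone
}

// Safe pattern: the named buffer keeps the memory alive for the whole scope.
int demo_live_after_named_buffer() {
    DemoBuffer buf = demo_mb_str("hello");
    return std::strlen(buf) == 5 ? demo_live_buffers : -1;  // 1: still owned
}
```

The counter makes the point visible without ever dereferencing the dangling pointer, which would be undefined behaviour.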

Printf uses wxChars

When using wxString::Printf, %s expects a wxChar const*, so just use .c_str(). This is important to watch out for because Printf uses varargs, which aren't typesafe, so the compiler doesn't catch the error. If, for example, Mandrav uses .mb_str(), the compiler won't say anything, because in non-Unicode mode mb_str() is the same as c_str() and returns a char const*. However, when me22 runs it in Unicode mode, mb_str() returns a proxy (see above), which can't be passed through a vararg, and the program crashes at runtime.
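
The same rule applies to any printf-style function: hand %s a real character pointer. A minimal sketch using std::snprintf, where DemoProxy and demo_format_key are hypothetical names and the format string mirrors the Printf example earlier on this page:

```cpp
#include <cstdio>
#include <string>

// Hypothetical proxy object that owns a string, similar in spirit to a buffer.
struct DemoProxy {
    std::string data;
    const char* c_ptr() const { return data.c_str(); }
};

// Builds a key through a variadic formatting call.
std::string demo_format_key(const DemoProxy& key, int i) {
    char out[128];
    // The variadic call cannot check that an argument matches %s, so the
    // pointer is extracted explicitly instead of passing the proxy object.
    std::snprintf(out, sizeof out, "%s/editor/keywords/%d", key.c_ptr(), i);
    return std::string(out);
}
```

Extracting the pointer at the call site keeps the proxy alive for the duration of the call and gives the format string exactly the type it expects.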

Streaming a plain char fails silently

This one was the source of the mysterious bug that replaced all the )'s in the class browser with 41's :P

Problem: In a Unicode build, the following compiles fine and doesn't crash:

my_stream << '\n';

However, my_stream traffics in wxChars, and '\n' is a plain char. This means that integer promotion is applied to '\n', so the effect is as if the code were:

my_stream << static_cast<int>('\n');

This prints a number instead of the character.

Solution: Don't forget the _T()s! Alternatively, use "\n" instead, which the compiler doesn't silently convert, so an easy-to-spot (and easy-to-fix) compile error will occur.
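
The overload choice can be reproduced in standard C++. DemoStream below is a hypothetical class, not a wxWidgets type; it is assumed to offer a wide-character overload and an integer overload but none for plain char, which is the situation described above.

```cpp
#include <string>

// Hypothetical stream-like class with no overload for plain char.
struct DemoStream {
    std::wstring text;
    DemoStream& operator<<(wchar_t c) { text += c; return *this; }
    DemoStream& operator<<(int n) { text += std::to_wstring(n); return *this; }
};

// A plain char is promoted to int, so the integer overload is selected.
std::wstring demo_stream_narrow() {
    DemoStream s;
    s << ')';
    return s.text;  // L"41" on ASCII systems
}

// A wide literal matches the wchar_t overload exactly.
std::wstring demo_stream_wide() {
    DemoStream s;
    s << L')';
    return s.text;  // L")"
}
```

Promotion from char to int ranks higher in overload resolution than conversion from char to wchar_t, so the compiler picks the integer overload without any warning.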