Difference between revisions of "Unicode Standards"

From Code::Blocks
(Place for the Unicode information)
 
Revision as of 23:21, 5 September 2005

This page is meant to be a place where developers can find the current Unicode standards and good practices to follow when editing and developing the Code::Blocks program. I am going to try to summarize the discussions I was pointed to here, but I am leaving out the original authors, sorry. Feel free to edit this page to improve it or keep it up to date. I am new to using wikis, so please excuse the bad design. This is a VERY rough draft with no clear organizational pattern.
-- Joe M.

Actually, it's quite good. I did a quick read-through and it looks fine to me. All I did was add some things to the bottom. ~ me22

reference: http://www.wxwidgets.org/manuals/2.4.2/wx458.htm#unicode


Macros

{NOTE: bullet list would look better here, but bold is used for now}
__TFILE__ = wxWidgets-provided equivalent of __FILE__
__TDATE__ = wxWidgets-provided equivalent of __DATE__
__TTIME__ = wxWidgets-provided equivalent of __TIME__

_U() = Use it to convert non-literal char* strings to wxString, for example when reading attributes from TiXmlNodes. If you deal with functions that return char* strings, you must use our _U macro.

Code:

  #if wxUSE_UNICODE
    #define _U(x) wxString((x),wxConvUTF8)
    #define _UU(x,y) wxString((x),y)
  #else
    #define _U(x) (x)
    #define _UU(x,y) (x)
  #endif

For example:

  const char* incompatible = "This is an incompatible string";
  wxString compatible = _U(incompatible);
  // wxString conftype = conf->Attribute("ConfigurationType"); // before
  wxString conftype = _U(conf->Attribute("ConfigurationType")); // after :)
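To make the need for _U() more concrete: a char* is only a sequence of bytes, and turning it into Unicode characters means decoding those bytes under some encoding (UTF-8 in the case of _U). The block below is a minimal standalone sketch of that decoding step, not the wxWidgets implementation. The names demo_from_utf8 and demo_from_utf8_check are hypothetical, and the sketch assumes well-formed input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of wxWidgets or Code::Blocks): decodes a
// UTF-8 byte string into Unicode code points. Assumes well-formed input
// and does no validation.
inline std::u32string demo_from_utf8(const char* s) {
    std::u32string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p) {
        char32_t cp;
        int extra; // number of continuation bytes that follow the lead byte
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else                          { cp = *p & 0x07; extra = 3; }
        ++p;
        while (extra-- > 0 && *p) {
            cp = (cp << 6) | (*p & 0x3F); // append 6 payload bits
            ++p;
        }
        out.push_back(cp);
    }
    return out;
}

inline void demo_from_utf8_check() {
    // "cafe" with an acute accent on the e: 5 bytes in UTF-8, 4 characters.
    std::u32string text = demo_from_utf8("caf\xC3\xA9");
    assert(text.size() == 4);
    assert(text[3] == 0xE9); // U+00E9
}
```

The point of the sketch is only that the byte count and the character count differ, which is why a plain assignment from char* is not enough in a Unicode build.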

_C() = Converts a wxString to a multibyte C string; see the wx help (wxMBConv classes overview). Use this one for interacting with APIs that need char const*s, such as saving things to TinyXML.

It is defined in the code as:

  #if wxUSE_UNICODE
     #define _UU(x,y) wxString((x),(y))
     #define _CC(x,y) (x).mb_str((y))
  #else
      #define _UU(x,y) (x)
      #define _CC(x,y) (x)
  #endif
  #define _U(x) _UU((x),wxConvUTF8)
  #define _C(x) _CC((x),wxConvUTF8)

wxT() = fixed texts, like XRC resource object names. It only adds an L before the string, and ONLY if you're in a Unicode build.
wxT() is a macro which can be used with character and string literals (in other words, 'x' or "foo") to automatically convert them to Unicode in a Unicode build configuration. Please see the Unicode overview for more information.

This macro simply returns the value passed to it without changes in an ASCII build. In fact, its definition is:

  #ifdef UNICODE
  #define wxT(x) L ## x
  #else // !Unicode
  #define wxT(x) x
  #endif
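The token-pasting trick can be tried without wxWidgets. The following is a small standalone sketch under stated assumptions: DEMO_UNICODE, DEMO_T, demo_char and demo_strlen are hypothetical names that mimic wxUSE_UNICODE, wxT(), wxChar and a string-length call; they are not the wxWidgets definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// DEMO_UNICODE, DEMO_T and demo_char are hypothetical names that mimic
// wxUSE_UNICODE, wxT() and wxChar; they are not the wxWidgets definitions.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
  #define DEMO_T(x) L ## x      // "foo" becomes L"foo", 'f' becomes L'f'
  typedef wchar_t demo_char;
#else
  #define DEMO_T(x) x           // literal is passed through unchanged
  typedef char demo_char;
#endif

// Length of a demo_char string, whichever build is selected.
inline std::size_t demo_strlen(const demo_char* s) {
#if DEMO_UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}

// The same source lines compile in both builds.
static const demo_char* const kDialogName = DEMO_T("dlgGenericMultiSelect");
static const demo_char kLetter = DEMO_T('f');
```

Setting DEMO_UNICODE to 0 should leave the two declarations at the bottom compiling unchanged, which is the whole purpose of wrapping literals in such a macro.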

_T() = fixed texts, like XRC resource object names. It only adds an L before the string, and ONLY if you're in a Unicode build.
This macro is exactly the same as wxT() and is defined in wxWidgets simply because it may be more intuitive for Windows programmers, as the standard Win32 headers also define it (as well as yet another name for the same macro, _TEXT()).

Don't confuse this macro with _()!

  wxChar _T(char ch)
  const wxChar * _T(const char *s)

_() = texts which might be translated into other user languages.
This macro expands into a call to the wxGetTranslation function, so it marks the message for extraction by xgettext just as wxTRANSLATE does, but it also returns the translation of the string for the current locale during execution.

Don't confuse this macro with _T()!

wxPLURAL = This macro is identical to _() but for the plural variant of wxGetTranslation.
const wxChar * wxPLURAL(const char *sing, const char *plur, size_t n)


Guidelines

char & wxChar:
Do not use wxChar for data that is not a text character, because in a Unicode build a wxChar is a wide character (wchar_t, 16 bits on Windows), not an 8-bit byte:

Example for text:

  wxChar im_a_character = _T('f');

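Example for non-text data (not a character); unsigned is used because a plain char may be signed and then cannot hold 254:

  unsigned char im_a_byte = 254;

but perhaps better would be to use a byte typedef (for example, one defined as unsigned char):

  byte im_a_byte = 254;

so it's clear that it's a byte and not a character.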


Other:
Problem code:

  // indent code accordingly
  wxString code = it->second;
  code.Replace("\n", '\n' + lineIndent);

Solution: If the input is a const char*, use "normal strings". If the input is a wxChar or wxString, use the _T("macros"). For example:

  // indent code accordingly
  wxString code = it->second;
  code.Replace(_T("\n"), _T('\n') + lineIndent);


Some of the strings already converted in C::B use _() when they should use _T().

Example:

  WRONG: wxXmlResource::Get()->LoadDialog(this, parent, _("dlgGenericMultiSelect"));

dlgGenericMultiSelect is a reference to a resource. Therefore it must use _T instead.

  RIGHT: wxXmlResource::Get()->LoadDialog(this, parent, _T("dlgGenericMultiSelect"));

And don't forget to test for single characters, too!



All operations with wxStrings (not char*'s) should have _("string") for strings to be displayed to the user, and _T("string") for strings used internally.


Printf-like functions: use c_str() for string arguments. (The examples on wxwidgets.org use different arguments for the Unicode and non-Unicode versions, even though the format string is "%s" in both.) For example:

  tmpkey.Printf(_T("%s/editor/keywords/%d"), key.c_str(), i);


XRCID and XRCCTRL macros:
XRCID and XRCCTRL macros must _NOT_ be converted! They're pre-converted already!

  WRONG:   XRCCTRL(*this, _T("lblLabel"), wxStaticText)->SetLabel(label);


  RIGHT:   XRCCTRL(*this, "lblLabel", wxStaticText)->SetLabel(label);


concatenated strings:
_() is a macro which calls one of wxWidgets' internal functions, so concatenation should look like this:

  _("string 1" "string2" ... )

The _T() macro simply adds an 'L' before the string given as a parameter (in a Unicode build, of course; in a normal build it does nothing with the string), so concatenation should be:

  _T("string1") _T("string2") ...
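Both concatenation forms can be checked in isolation. This is a minimal sketch with hypothetical stand-ins: DEMO_T mimics _T() in a Unicode build, and demo_translate/DEMO_TR mimic _() but return the message unchanged instead of looking up a translation.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical stand-ins: DEMO_T mimics _T() in a Unicode build, and
// demo_translate/DEMO_TR mimic _() without doing any real translation.
#define DEMO_T(x) L ## x
inline std::string demo_translate(const char* msg) { return std::string(msg); }
#define DEMO_TR(s) demo_translate(s)

// Each piece gets its own DEMO_T, so every literal is already wide when
// the compiler joins the adjacent literals.
static const wchar_t* const kWide = DEMO_T("string1") DEMO_T("string2");

// The adjacent literals form one macro argument; the compiler joins them
// into a single string before the function behind the macro is called.
inline std::string demo_translated() {
    return DEMO_TR("string 1" "string2");
}
```

Writing DEMO_T("string1" "string2") instead would prefix only the first literal with L, which is why each piece needs its own macro call.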


Gotchas

_C() can return a proxy, not necessarily a char const*

Don't write code like this:

char const * psz = _C( str ); // formerly str.mb_str(wxConvUTF8);

_C(), in Unicode mode, returns a buffer, not a raw pointer. This is a good thing because the buffer's destructor takes care of freeing the memory used by the string. This buffer is implicitly convertible to a char const* so that it can be used immediately in things like strlen( str.mb_str() ), but that opens up the error I'm warning about here.

What actually happens in the above code? wxString::mb_str() returns a temporary buffer object. The buffer's implicit conversion to char const* is invoked, and the result is stored in psz. At the end of the statement the temporary buffer GOES OUT OF SCOPE and ITS DESTRUCTOR DELETES THE MEMORY, leaving psz dangling. It seems that Windows doesn't care, but Linux has often already reused the memory by the time psz is used again in the code.

Solution:

wxWX2MBbuf psz = str.mb_str(wxConvUTF8);

wxWX2MBbuf takes ownership of the buffer ( no, it's not copied -- it's transfer of ownership semantics similar to std::auto_ptr ). That way you can actually use the memory until psz goes out of scope and deletes it.
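The lifetime problem does not depend on wxWidgets, so it can be modelled with a small standalone class. In this sketch DemoBuffer and demo_mb_str are hypothetical stand-ins for the buffer type and for mb_str(). One deliberate simplification: the sketch deep-copies the buffer, whereas the real one is described above as transferring ownership.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the buffer object returned by mb_str().
// It owns its memory and converts implicitly to char const*.
class DemoBuffer {
public:
    explicit DemoBuffer(const std::string& s) : data_(new char[s.size() + 1]) {
        std::memcpy(data_, s.c_str(), s.size() + 1);
    }
    // Simplification: this sketch deep-copies instead of transferring
    // ownership.
    DemoBuffer(const DemoBuffer& other)
        : data_(new char[std::strlen(other.data_) + 1]) {
        std::strcpy(data_, other.data_);
    }
    DemoBuffer& operator=(const DemoBuffer&) = delete;
    ~DemoBuffer() { delete[] data_; }
    operator const char*() const { return data_; }
private:
    char* data_;
};

inline DemoBuffer demo_mb_str(const std::string& s) { return DemoBuffer(s); }

inline std::size_t demo_safe_length(const std::string& s) {
    // WRONG: char const* p = demo_mb_str(s);
    // The temporary is destroyed at the end of that statement, so p dangles.
    DemoBuffer buf = demo_mb_str(s); // RIGHT: the named buffer keeps the memory alive
    char const* p = buf;
    return std::strlen(p);
}

inline std::size_t demo_immediate_length(const std::string& s) {
    // Also fine: the temporary lives until the end of the full expression.
    return std::strlen(demo_mb_str(s));
}
```

The rule of thumb the sketch illustrates: either consume the pointer within the same expression, or store the buffer object itself in a named variable.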

Printf uses wxChars

When using wxString::Printf, %s wants a wxChar const*, so just use .c_str(). This is important to watch out for because Printf uses varargs, which aren't typesafe, so the compiler doesn't catch the error. If, for example, Mandrav uses .mb_str(), the compiler won't say anything, because in non-Unicode mode mb_str() is the same as c_str() and returns a char const*. However, when me22 runs it in Unicode mode, mb_str() returns a proxy (see above), which can't be passed through a vararg, and the program crashes at runtime.
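The same rule applies to any printf-style function, so it can be shown with the standard library alone. This is a narrow-character analogue only, not wxString::Printf itself, and demo_format_key is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds a key path the way the Printf example does,
// but with narrow strings and the standard library.
inline std::string demo_format_key(const std::string& key, int i) {
    char out[128];
    // %s expects a raw char const*. Passing the std::string object itself
    // (or any other class object) through "..." is not portable and is not
    // reliably diagnosed, so hand over key.c_str().
    std::snprintf(out, sizeof out, "%s/editor/keywords/%d", key.c_str(), i);
    return std::string(out);
}
```

In wxWidgets code the equivalent choice is between c_str(), which matches what %s expects, and mb_str(), which may hand a class object to the variadic call.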


This needs to be rewritten. If nobody else improves on this, I will try to rewrite it once I have used these macros more. Joe M.