Creating a custom lexer for Code::Blocks editor

Revision as of 06:32, 22 January 2006 by Takeshi Miya

The files that add syntax highlighting support for specific file types are found under sdk/resources/lexers. They're simple XML files named lexer_*.xml.

Let's take lexer_cpp.xml as an example and dissect it.

Lexers

<Lexer name="C/C++"
       index="3"
       filemasks="*.c,*.cpp,*.cc,*.cxx,*.h,*.hpp,*.hh,*.hxx,*.inl">

Pretty much self-explanatory, except for the "magic" index number (we'll come to it in a sec).

  • name is the lexer's configuration name. It appears in the editor's configuration dialog, in the languages drop-down box (on the colors editing page).
  • filemasks is a comma-separated list of the file masks (typically extensions) that this lexer should be used for. Matching is case-insensitive.
  • index corresponds to one of the wxSCI_LEX_* constants found in sdk/wxscintilla/include/wx/wxscintilla.h. In this example, index 3 matches wxSCI_LEX_CPP, which is the lexer id for C/C++ syntax highlighting.

If we were building a lexer configuration for, let's say, XML (random choice), we would look up the constant wxSCI_LEX_XML, which is defined as 5. So index=5. Simple.
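
As a sketch of the XML case just described, the opening tag might look like the following. The index value 5 comes from wxSCI_LEX_XML as noted above; the name and the filemasks are illustrative assumptions, not copied from an actual lexer file.

```xml
<Lexer name="XML"
       index="5"
       filemasks="*.xml,*.xsl">
```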

Styles

Next come a number of <Style> tags defining the different styles:

       <Style name="Default"
              index="0"
              fg="0,0,0"
              bg="255,255,255"
              bold="0"
              italics="0"
              underlined="0" />
  • name is the style's name. It appears in the editor's configuration dialog, on the colors editing page.
  • fg is the foreground color: a comma-separated list of three numbers from 0 to 255, in the order red, green, blue (RGB).
  • bg is the background color, in the same format as fg.
  • bold is "0" for disabled, "1" for enabled.
  • italics is "0" for disabled, "1" for enabled.
  • underlined is "0" for disabled, "1" for enabled.

You don't have to define all of these attributes. It's good to define them all for the "default" style (all lexers have a default style), but only the attributes needed should be defined for the rest of the styles.

  • The index number in the <Style> tags comes from a different set of constants, also defined in sdk/wxscintilla/include/wx/wxscintilla.h.

For each language supported by Scintilla, there is a set of styles (lexical states) defined. These are what we're trying to configure with these files.

For example, for C/C++ files (wxSCI_LEX_CPP, remember?) the styles are defined as wxSCI_C_*.

For the "default" style shown above, this would be wxSCI_C_DEFAULT which is defined to be 0. Hence index=0 for "default".

       <Style name="Comment (normal)"
              index="1,2"
              fg="160,160,160" />

This is the style definition for normal comments. As you can see, a single <Style> tag can cover more than one style index, in this case two: 1 and 2 (always comma-separated).

1 is for wxSCI_C_COMMENT (the C comment /* */) and 2 is for wxSCI_C_COMMENTLINE (the C++ comment to end of line // ).
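
The same lookup works for any other style. As a minimal sketch, here is how a keyword style might be written, assuming wxSCI_C_WORD is defined as 5 in your copy of wxscintilla.h (worth double-checking there). The style name and the color are illustrative choices, and only the attributes that differ from the default are set:

```xml
<!-- index 5 is assumed to be wxSCI_C_WORD (language keywords) -->
<Style name="Keyword"
       index="5"
       fg="0,0,160"
       bold="1" />
```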


Some special styles are defined by Code::Blocks itself and are available to all lexers:

  • index -99 is the selected text style.
  • index -98 is the active line style (the line the caret is on).
  • index -2 is the breakpoint line style.
  • index -3 is the debugger active line style (while stepping the debugger).
  • index -4 is the compiler warning/error line style. (Note: this index was removed completely?)
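
These special indices are used in ordinary <Style> tags, just like the lexer-specific ones. A rough sketch, in which the names and colors are illustrative assumptions rather than values copied from lexer_cpp.xml:

```xml
<!-- Selected text: only the background is changed -->
<Style name="Selection"
       index="-99"
       bg="192,192,192" />
<!-- The line the caret is on -->
<Style name="Active line"
       index="-98"
       bg="255,255,160" />
<!-- A line with a breakpoint set -->
<Style name="Breakpoint line"
       index="-2"
       bg="255,160,160" />
```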

Keywords

Now on to the keywords.

       <Keywords>
              <Language index="0"
                        value="if int long try while and-so-on" />
              <User index="1" />
              <Documentation index="2"
                             value="param remarks return $ @ \ &amp; &lt; &gt; # { } and-so-on" />
       </Keywords>

If the language you're defining a lexer configuration for has keywords, they should be added inside the <Keywords> tag. This tag can contain the following tags:

<Language>, <User> and <Documentation>.

  • Language contains the language keywords, separated by spaces. These are usually at index 0.
  • User is not used right now but might be in the future.
  • Documentation contains the documentation keywords (if any). If you look at the lexer_cpp.xml file, you'll see that the documentation keywords defined are those of Doxygen.
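
For a language without documentation keywords, the block could be as small as the sketch below. The keyword list is for a hypothetical Pascal-like language, and leaving out the <Documentation> tag is an assumption based on the "(if any)" wording above, so test it against your Code::Blocks build.

```xml
<Keywords>
       <!-- Language keywords, separated by spaces -->
       <Language index="0"
                 value="begin end if then else while do" />
       <!-- Kept as in lexer_cpp.xml; currently unused -->
       <User index="1" />
</Keywords>
```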