Creating a custom lexer for Code::Blocks editor

From Code::Blocks

The files that add syntax highlighting support for specific files are found under sdk/resources/lexers. They're simple XML files named as lexer_*.xml.

Let's take lexer_cpp.xml as an example and disect it.

Lexers

<Lexer name="C/C++"
       index="3"
       filemasks="*.c,*.cpp,*.cc,*.cxx,*.h,*.hpp,*.hh,*.hxx,*.inl">

Pretty much self explanatory, except for the "magic" index number (we'll come to it in a sec).

  • name is the lexer's configuration name. This will appear in the editor's configuration dialog, in the languages drop down box (in colors editing page).
  • filemasks is a comma separated list of the extensions that this lexer should be used for. This is case-insensitive.
  • index corresponds with the wxSCI_LEX_* constants, found in sdk/wxscintilla/include/wx/wxscintilla.h. In this example, if you look in sdk/wxscintilla/include/wx/wxscintilla.h, you'll see that index 3 matches wxSCI_LEX_CPP. That is the lexer id for C/C++ syntax highlighting.

If we were building a lexer configuration for let's say, XML (random choice) we would look up the constant wxSCI_LEX_XML which is defined to be number 5. So index=5. Simple.

Styles

Next follows many <Style> tags defining the different styles:

       <Style name="Default"
              index="0"
              fg="0,0,0"
              bg="255,255,255"
              bold="0"
              italics="0"
              underlined="0" />
  • name is the style's name. It appears in the editor's configuration dialog, in the colors editing page.
  • fg is the foreground color. Comma separated list of three numbers from 0 to 255. In order: red, green and blue (RGB).
  • bg is the background color.
  • bold is "0" for disabled, "1" for enabled.
  • italics is "0" for disabled, "1" for enabled.
  • underlined is "0" for disabled, "1" for enabled.

You don't have to define all of these attributes. It's good to define them all for the "default" style (all lexers have a default style), but only the attributes needed should be defined for the rest of the styles.

  • The index number in the <Style> tags, comes from a different set of constants defined in sdk/wxscintilla/include/wx/wxscintilla.h.

For each language supported by scintilla, there is a set of styles (lexical states) defined (these are what we're trying to configure with these files).

For example, for C/C++ files (wxSCI_LEX_CPP, remember?) the styles are defined as wxSCI_C_*.

For the "default" style shown above, this would be wxSCI_C_DEFAULT which is defined to be 0. Hence index=0 for "default".

       <Style name="Comment (normal)"
              index="1,2"
              fg="160,160,160" />

This is the style definition for normal comments. As you can see you can define a single style for more than one style index, in this case two: 1 and 2 (always comma separated).

1 is for wxSCI_C_COMMENT (the C comment /* */) and 2 is for wxSCI_C_COMMENTLINE (the C++ comment to end of line // ).


There are some special styles defined by Code::Blocks and are available to all lexers:

  • index -99 is the selected text style.
  • index -98 is the active line style (the line the caret is on).
  • index -2 is the breakpoint line style.
  • index -3 is the debugger active line style (while stepping the debugger).
  • index -4 is the compiler warning/error line style. (Note: this index was removed completely?)

Keywords

Now on to the keywords.

       <Keywords>
              <Language index="0"
                        value="if int long try while and-so-on" />
              <User index="1" />
              <Documentation index="2"
                             value="param remarks return $ @ \ & < > # { } and-so-on" />
       </Keywords>

If the language you're defining a lexer configuration for, has keywords they should be added in the <Keywords> tag. This tag can contain the following tags:

<Language>, <User> and <Documentation>.

  • Language contains the language keywords. These are usually at index 0.
  • User is not used right now but might be in the future.
  • Documentation contains the documentation keywords (if any). If you look at the lexer_cpp.xml file, you'll see that the documentation keywords defined are those of Doxygen.


Sample Code

The tag left is SampleCode. This is much pretty self explanatory:

       <SampleCode value="lexer_cpp.sample"
                   breakpoint_line="20"
                   debug_line="22"
                   error_line="23"/>
  • value is the filename of the code that will be shown in the Preview window.

When creating a lexer_*.sample try to do it with simple and concise sample code (like the ones found in a typical "Hello world!"), yet include all the keywords of the lexer.

There are other optional options (Note: this index was removed completely?)

  • breakpoint_line is the number of the line in which a breakpoint line will be previewed.
  • debug_line is the number of the line in which a debug line will be previewed.
  • error_line is the number of the line in which an error line will be previewed.


Adding support for code-folding

Note: support for code-folding can't be done from the lexer files. It must be done right in the C::B code.

See the XML or C/C++ code for code-folding for an example of how to do it.

This might help also: http://sphere.sourceforge.net/flik/docs/scintilla-folding.html


Adding support for a lexer not supported in Scintilla

This is out of the scope of Code::Blocks.

Here are some instructions: http://scintilla.sourceforge.net/Lexer.txt

After you've written the Scintilla lexer, submit the files to the Scintilla tracker, following their own coding style.

After that, make any necesary change to wxScintilla and sumbit the files to the wxScintilla tracker or send a mail to the autor wyo@users.sourceforge.net (Otto Wyss).

And finishing, be sure to send all to the Code::Blocks tracker, and announcing at the [/index.php?board=7.0 forum] and the Wiki announcement for plugins/patches.