Creating a custom lexer for Code::Blocks editor
The files that add syntax highlighting support for specific file types are found under sdk/resources/lexers. They are simple XML files named lexer_*.xml.
Let's take lexer_cpp.xml as an example and dissect it.
Lexers
<Lexer name="C/C++" index="3" filemasks="*.c,*.cpp,*.cc,*.cxx,*.h,*.hpp,*.hh,*.hxx,*.inl">
Pretty much self-explanatory, except for the "magic" index number (we'll come to it in a moment).
- name is the lexer's configuration name. This will appear in the editor's configuration dialog, in the languages drop down box (in colors editing page).
- filemasks is a comma-separated list of the file masks (extensions) that this lexer should be used for. Matching is case-insensitive.
- index corresponds to one of the wxSCI_LEX_* constants defined in sdk/wxscintilla/include/wx/wxscintilla.h. In this example, index 3 matches wxSCI_LEX_CPP, which is the lexer id for C/C++ syntax highlighting.
If we were building a lexer configuration for, let's say, XML (a random choice), we would look up the constant wxSCI_LEX_XML, which is defined as 5. So index="5". Simple.
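As a rough sketch of that XML case, the opening tag might look like the fragment below. Only index="5" comes from the text above. The name and filemasks values are illustrative assumptions, and the placeholder comment stands in for the tags described in the following sections.

```xml
<!-- Sketch only: name and filemasks are assumed values, not taken from a shipped lexer file -->
<Lexer name="XML" index="5" filemasks="*.xml">
    <!-- Style, Keywords and SampleCode tags would go here -->
</Lexer>
```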
Styles
Next follow many <Style> tags defining the different styles:
<Style name="Default" index="0" fg="0,0,0" bg="255,255,255" bold="0" italics="0" underlined="0" />
- name is the style's name. It appears in the editor's configuration dialog, in the colors editing page.
- fg is the foreground color: a comma-separated list of three numbers from 0 to 255, in the order red, green, blue (RGB).
- bg is the background color, in the same format.
- bold is "0" for disabled, "1" for enabled.
- italics is "0" for disabled, "1" for enabled.
- underlined is "0" for disabled, "1" for enabled.
You don't have to define all of these attributes. It's good to define them all for the "default" style (all lexers have a default style), but only the attributes needed should be defined for the rest of the styles.
- The index number in the <Style> tags comes from a different set of constants defined in sdk/wxscintilla/include/wx/wxscintilla.h.
For each language supported by Scintilla, there is a set of styles (lexical states) defined (these are what we're trying to configure with these files).
For example, for C/C++ files (wxSCI_LEX_CPP, remember?) the styles are defined as wxSCI_C_*.
For the "default" style shown above, this would be wxSCI_C_DEFAULT which is defined to be 0. Hence index=0 for "default".
<Style name="Comment (normal)" index="1,2" fg="160,160,160" />
This is the style definition for normal comments. As you can see, a single style definition can cover more than one style index, in this case two: 1 and 2 (always comma-separated).
1 is for wxSCI_C_COMMENT (the C comment /* */) and 2 is for wxSCI_C_COMMENTLINE (the C++ comment to end of line // ).
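Other styles follow the same pattern, setting only the attributes they need. The fragment below is a minimal sketch, assuming wxSCI_C_WORD is 5 and wxSCI_C_STRING is 6 (check wxscintilla.h for the values in your tree). The style names and colors are arbitrary choices for illustration.

```xml
<!-- Assumed: index 5 = wxSCI_C_WORD, index 6 = wxSCI_C_STRING; verify in wxscintilla.h -->
<!-- Keywords: blue and bold; every other attribute is left undefined -->
<Style name="Keyword" index="5" fg="0,0,160" bold="1" />
<!-- Strings: only the foreground color is set -->
<Style name="String" index="6" fg="0,0,255" />
```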
Some special styles are defined by Code::Blocks and are available to all lexers:
- index -99 is the selected text style.
- index -98 is the active line style (the line the caret is on).
- index -2 is the breakpoint line style.
- index -3 is the debugger active line style (while stepping the debugger).
- index -4 is the compiler warning/error line style. (Note: this index was removed completely?)
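The special indices are used in ordinary <Style> tags. The sketch below shows how they might be written. The names and colors are illustrative, and using bg as the attribute that matters for line styles is an assumption, not something stated above.

```xml
<!-- Sketch only: names, colors and the choice of attributes are assumptions -->
<Style name="Selection" index="-99" bg="192,192,192" />
<Style name="Active line" index="-98" bg="255,255,160" />
<Style name="Breakpoint line" index="-2" bg="255,160,160" />
<Style name="Debugger active line" index="-3" bg="160,160,255" />
```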
Keywords
Now on to the keywords.
<Keywords>
    <Language index="0" value="if int long try while and-so-on" />
    <User index="1" />
    <Documentation index="2" value="param remarks return $ @ \ &amp; &lt; &gt; # { } and-so-on" />
</Keywords>
If the language you're defining a lexer configuration for has keywords, they should be added in the <Keywords> tag. This tag can contain the following tags:
<Language>, <User> and <Documentation>.
- Language contains the language keywords. These are usually at index 0.
- User is not used right now but might be in the future.
- Documentation contains the documentation keywords (if any). If you look at the lexer_cpp.xml file, you'll see that the documentation keywords defined are those of Doxygen.
Sample Code
The remaining tag is SampleCode. It is pretty much self-explanatory:
<SampleCode value="lexer_cpp.sample" breakpoint_line="20" debug_line="22" error_line="23"/>
- value is the filename of the code that will be shown in the Preview window.
When creating a lexer_*.sample file, keep the code simple (like a typical "Hello world!" program), yet include all the keywords of the lexer, so the user can preview how the styles will be applied.
There are also some optional attributes (Note: this index was removed completely?)
- breakpoint_line is the number of the line in which a breakpoint line will be previewed.
- debug_line is the number of the line in which a debug line will be previewed.
- error_line is the number of the line in which an error line will be previewed.
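Putting the pieces together, a trimmed-down lexer file could look like the sketch below. It is assembled from the fragments shown on this page and assumes that the Style, Keywords and SampleCode tags nest inside the Lexer tag. The filemasks and keyword lists are shortened, and any XML declaration or enclosing root element that the real lexer_cpp.xml uses is left out.

```xml
<!-- Sketch only: a shortened C/C++ lexer configuration built from the fragments above -->
<Lexer name="C/C++" index="3" filemasks="*.c,*.cpp,*.h,*.hpp">
    <!-- Default style: all attributes defined -->
    <Style name="Default" index="0" fg="0,0,0" bg="255,255,255" bold="0" italics="0" underlined="0" />
    <!-- One definition shared by two style indices -->
    <Style name="Comment (normal)" index="1,2" fg="160,160,160" />
    <Keywords>
        <Language index="0" value="if int long try while" />
        <User index="1" />
        <Documentation index="2" value="param remarks return" />
    </Keywords>
    <!-- Preview file plus the optional preview line numbers -->
    <SampleCode value="lexer_cpp.sample" breakpoint_line="20" debug_line="22" error_line="23" />
</Lexer>
```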