Creating a custom lexer for Code::Blocks editor

Latest revision as of 19:02, 19 February 2017

Code::Blocks can provide syntax highlighting for any language supported by Scintilla (http://www.scintilla.org), the rich editing API that C::B uses for syntax highlighting, code folding and so on. Each language that C::B highlights has a C::B-specific XML file under sdk/resources/lexers, named lexer_*.xml, that specifies the styles and keywords of the language. Adding a new language that Scintilla already supports is as simple as adding an XML file for it. At a minimum, the XML file must list the available styles for the lexical elements of the language and provide its keywords. It also names a file of sample code, stored separately in sdk/resources/lexers, which Code::Blocks needs in order to display the preview when lexer styles are edited.

Users can edit the actual styles and keywords for each language within Code::Blocks under Settings->Editors->Syntax Highlighting and see the results in a preview window. User-customized styles and keywords are kept in the user's default.conf file. Code::Blocks never changes the XML file itself, so users can easily revert to the settings it defines.

To illustrate how you might construct a new lexer XML file, let's look at the pre-installed C/C++ lexer, whose styles and keywords are stored in lexer_cpp.xml.

XML

<?xml version="1.0"?>

This is the XML declaration: it says "I am an XML file". It is very important, so do not leave it out.

DOCTYPE

<!DOCTYPE CodeBlocks_lexer_properties>

This needs to be in every lexer file. Code::Blocks will not load the lexer if this is not present.

Lexers

<Lexer name="C/C++"
       index="3"
       filemasks="*.c,*.cpp,*.cc,*.cxx,*.h,*.hpp,*.hh,*.hxx,*.inl">

Most of this is self-explanatory, except for the "magic" index number, which is explained below.

  • name is the lexer's configuration name. It appears in the editor's configuration dialog, in the languages drop-down box on the colors editing page.
  • filemasks is a comma-separated list of the file masks that this lexer should be used for. It is case-insensitive.
  • index corresponds to one of the wxSCI_LEX_* constants defined in sdk/wxscintilla/include/wx/wxscintilla.h. If you look in that file, you'll see that index 3 matches wxSCI_LEX_CPP, the lexer id for C/C++ syntax highlighting.

If we were building a lexer configuration for, let's say, XML (a random choice), we would look up the constant wxSCI_LEX_XML, which is defined to be 5. So index=5. Simple.
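
As a sketch of that lookup, the Lexer tag for an XML lexer configuration might look like this. The index value 5 is the one given above for wxSCI_LEX_XML; the name and the filemasks list are illustrative assumptions, not copied from a shipped file.

```xml
<!-- index="5" is wxSCI_LEX_XML; name and filemasks are illustrative choices -->
<Lexer name="XML"
       index="5"
       filemasks="*.xml,*.xsl">
    <!-- Style, Keywords and SampleCode tags go here -->
</Lexer>
```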

Styles

Next come a number of <Style> tags, each defining one style:

       <Style name="Default"
              index="0"
              fg="0,0,0"
              bg="255,255,255"
              bold="0"
              italics="0"
              underlined="0" />

  • name is the style's name. It appears in the editor's configuration dialog, on the colors editing page.
  • fg is the foreground color: a comma-separated list of three numbers from 0 to 255, giving red, green and blue (RGB) in that order.
  • bg is the background color, written the same way.
  • bold is "0" for disabled, "1" for enabled.
  • italics is "0" for disabled, "1" for enabled.
  • underlined is "0" for disabled, "1" for enabled.
  • index selects the Scintilla style that this entry configures. It comes from a different set of constants than the lexer index, also defined in sdk/wxscintilla/include/wx/wxscintilla.h.

You don't have to define all of these attributes. It's good to define them all for the "default" style (all lexers have a default style), but for the other styles you should define only the attributes you need.

For each language supported by Scintilla, there is a set of styles (lexical states) defined. These are what we're trying to configure with these files.

For example, for C/C++ files (wxSCI_LEX_CPP, remember?) the styles are defined as wxSCI_C_*.

For the "default" style shown above, this would be wxSCI_C_DEFAULT, which is defined to be 0. Hence index=0 for "default".

       <Style name="Comment (normal)"
              index="1,2"
              fg="160,160,160" />

This is the style definition for normal comments. As you can see, a single style can be applied to more than one style index, in this case two: 1 and 2 (always comma-separated).

1 is for wxSCI_C_COMMENT (the C comment /* */) and 2 is for wxSCI_C_COMMENTLINE (the C++ comment to end of line // ).
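
Following the same pattern, a few more C/C++ styles could be sketched as below. The index values are the wxSCI_C_* constants as they appear in the Scintilla headers we are aware of (wxSCI_C_NUMBER = 4, wxSCI_C_WORD = 5, wxSCI_C_STRING = 6, wxSCI_C_CHARACTER = 7, wxSCI_C_PREPROCESSOR = 9); check them against your own copy of wxscintilla.h. The style names and colors are purely illustrative.

```xml
<!-- wxSCI_C_WORD (5): primary keywords, shown in bold -->
<Style name="Keyword"
       index="5"
       fg="0,0,160"
       bold="1" />
<!-- wxSCI_C_STRING (6) and wxSCI_C_CHARACTER (7) share one style -->
<Style name="String"
       index="6,7"
       fg="0,0,255" />
<!-- wxSCI_C_NUMBER (4) -->
<Style name="Number"
       index="4"
       fg="240,0,240" />
<!-- wxSCI_C_PREPROCESSOR (9) -->
<Style name="Preprocessor"
       index="9"
       fg="0,160,0" />
```

Only the attributes that differ from the default style are given here, in line with the advice above.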


Code::Blocks also defines some special styles that are available to all lexers:

  • index -99 is the selected text style.
  • index -98 is the active line style (the line the caret is on).
  • index -2 is the breakpoint line style.
  • index -3 is the debugger active line style (used while stepping in the debugger).
  • index -4 is the compiler warning/error line style. (Note: it is unclear whether this index is still supported; it may have been removed completely.)
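
These special styles are declared like any other style, only with a negative index. A minimal sketch, with illustrative names and colors (index -4 is left out because of the uncertainty noted above):

```xml
<!-- Special Code::Blocks styles; names and colors are illustrative -->
<Style name="Selection"
       index="-99"
       bg="192,192,192" />
<Style name="Active line"
       index="-98"
       bg="255,255,160" />
<Style name="Breakpoint line"
       index="-2"
       bg="255,160,160" />
<Style name="Debugger active line"
       index="-3"
       bg="160,160,255" />
```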

Keywords

Now on to the keywords.

       <Keywords>
              <Set index="0"
                   value="if int long try while and-so-on" />
              <Set index="1" />
              <Set index="2"
                   value="param remarks return $ @ \ &amp; &lt; &gt; # { } and-so-on" />
       </Keywords>

If you are defining a lexer configuration for a language that has keywords, they should be added in the <Keywords> tag. In the Scintilla version used by Code::Blocks, you can set up as many as 9 keyword sets, each identified by its "index" attribute. What each set does, and which sets a lexer actually uses, is defined by the lexer itself.

If you open LexCPP.cxx (from the scintilla sources), you will find the following:

static const char * const cppWordLists[] = {
    "Primary keywords and identifiers",
    "Secondary keywords and identifiers",
    "Documentation comment keywords",
    "Unused",
    "Global classes and typedefs",
    0,
};

This tells us which indices are valid and what each one represents:

  • Index 0, "Primary keywords and identifiers"
  • Index 1, "Secondary keywords and identifiers"
  • Index 2, "Documentation comment keywords"
  • Index 3 is not used
  • Index 4, "Global classes and typedefs"
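
To make the connection between keyword sets and styles concrete, here is a minimal sketch that fills sets 0 and 1 and gives each its own style. It assumes that the C/C++ lexer colors words from set 0 with wxSCI_C_WORD (5) and words from set 1 with wxSCI_C_WORD2 (16); verify both values in wxscintilla.h. The keyword lists are shortened, and the names in set 1 are hypothetical.

```xml
<Style name="Keyword" index="5" fg="0,0,160" bold="1" />
<Style name="User keyword" index="16" fg="0,160,0" bold="1" />

<Keywords>
    <!-- Set 0: primary keywords and identifiers -->
    <Set index="0"
         value="if else for while return int long" />
    <!-- Set 1: secondary keywords and identifiers (hypothetical project-specific names) -->
    <Set index="1"
         value="my_uint32 my_string" />
</Keywords>
```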

Sample Code

The remaining tag is <SampleCode>, which is pretty much self-explanatory:

       <SampleCode value="lexer_cpp.sample"
                   breakpoint_line="20"
                   debug_line="22"
                   error_line="23"/>

  • value is the filename of the code that will be shown in the Preview window.

When creating a lexer_*.sample file, keep the sample code simple and concise (like a typical "Hello world!" program), yet make sure it exercises all the styles of the lexer.

There are also some optional attributes. (Note: it is unclear whether these are still supported; they may have been removed completely.)

  • breakpoint_line is the number of the line on which a breakpoint line will be previewed.
  • debug_line is the number of the line on which a debug line will be previewed.
  • error_line is the number of the line on which an error line will be previewed.
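
Putting the pieces together, a complete lexer file for a new language might be laid out as follows. This is a sketch under several assumptions: the root element name is taken from the DOCTYPE, the style indices are the wxSCI_H_* values we believe the XML lexer uses (wxSCI_H_DEFAULT = 0, wxSCI_H_TAG = 1, wxSCI_H_COMMENT = 9), and the file name lexer_xml.sample is hypothetical. Compare it with lexer_cpp.xml and wxscintilla.h before relying on it.

```xml
<?xml version="1.0"?>
<!DOCTYPE CodeBlocks_lexer_properties>
<CodeBlocks_lexer_properties>
    <Lexer name="XML"
           index="5"
           filemasks="*.xml,*.xsl">
        <!-- The default style defines every attribute -->
        <Style name="Default"
               index="0"
               fg="0,0,0"
               bg="255,255,255"
               bold="0"
               italics="0"
               underlined="0" />
        <!-- Other styles define only what they need -->
        <Style name="Tag"
               index="1"
               fg="0,0,160"
               bold="1" />
        <Style name="Comment"
               index="9"
               fg="160,160,160" />
        <!-- Hypothetical sample file, stored next to this XML file -->
        <SampleCode value="lexer_xml.sample" />
    </Lexer>
</CodeBlocks_lexer_properties>
```

No <Keywords> tag is included because this example language has no keyword sets to fill.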

Tips

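When writing a lexer, these can be useful:

  • .properties files of SciTE, the official Scintilla-based text editor (http://scintilla.sourceforge.net/SciTE.html)
  • .lang files of GtkSourceView, used by Gedit (http://git.gnome.org/browse/gtksourceview/tree/data/language-specs)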

Adding support for code-folding

Note: support for code-folding cannot be added through the lexer files. It must be implemented in the Scintilla code itself.

See here: http://sphere.sourceforge.net/flik/docs/scintilla-folding.html

Adding support for a lexer not supported in Scintilla

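Note: adding a lexer for a language that Scintilla does not support is outside the scope of Code::Blocks. It must be done in the Scintilla code itself.

Here are some instructions: http://scintilla.sourceforge.net/Lexer.txt. To be able to use your newly created lexer in C::B, adjust the validity check of "lexer" in EditorColourSet::AddHighlightLanguage().

  • After you've written the Scintilla lexer, submit the files to the Scintilla tracker (http://sourceforge.net/tracker/?group_id=2439), following their coding style (http://scintilla.sourceforge.net/SciCoding.html).
  • After that, make any necessary changes to wxScintilla and submit the files to the wxScintilla tracker (http://sourceforge.net/tracker/?group_id=51305&atid=462818), or send a mail to the author, Otto Wyss (wyo@users.sourceforge.net).
  • Finally, be sure to send everything to the Code::Blocks tracker (http://developer.berlios.de/patch/?func=addpatch&group_id=5358, category: Lexer), and announce it on the forum and on the wiki page "Announcement for plugins/patches".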

See also

  • Languages supported by Code::Blocks lexers