Code Completion Design

From Code::Blocks
Revision as of 07:23, 23 February 2009 by Ollydbg (talk | contribs) (→‎Low level parser: introduce tokenzier)

How to build

Get the source code

When you check out the Code::Blocks source code from svn (see Installing_Code::Blocks_from_source_on_Windows#Code::Blocks_sources), the source code of this plugin is already included. The screenshot below shows the plugin sources opened in Code::Blocks under Windows.

Code completion source tree opened in code::blocks

Build the code completion plugin

Code completion build target option in code::blocks

Note: use "update.bat" to copy the newly generated DLL to its destination and strip the debug information. See Installing_Code::Blocks_from_source_on_Windows.

Low level parser

If you have not come across the terms "token" and "tokenize", read the Wikibooks article "A brief explain of what does a parser do" and the Tokenize article on Wikipedia. In short, a parser treats your C or C++ source as one large array of characters and divides this big string into small atomic strings (tokens), ignoring whitespace and comments along the way.

For a simple C++ program like the one below

int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}

the tokenizer should produce these 19 tokens:

1 = string "int"
2 = string "main"
3 = opening parenthesis
4 = closing parenthesis
5 = opening brace
6 = string "std"
7 = namespace operator
8 = string "cout"
9 = << operator
10 = string ""hello world""
11 = << operator
12 = string "std"
13 = namespace operator
14 = string "endl"
15 = semicolon
16 = string "return"
17 = number 0
18 = semicolon
19 = closing brace

Tokenizer class

Running the Tokenizer class involves several steps:

  • A thread is created to parse a source file.
  • The source file is opened and the file buffer is converted to Unicode (all current builds of Code::Blocks are Unicode builds; ANSI mode is outdated).
  • The class keeps a pointer to the current character position, so you can Get or Peek the current character.
  • A nesting value is kept to track which brace pair you are currently in.
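
The last two points can be sketched as a small cursor class. This is only an illustration: the class name CharCursor and its member names are hypothetical, a std::wstring stands in for the Unicode buffer, and thread creation and file loading are left out.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch; the names are illustrative, not the plugin's API.
class CharCursor
{
public:
    explicit CharCursor(std::wstring buffer) : m_Buffer(std::move(buffer)) {}

    bool IsEOF() const { return m_Index >= m_Buffer.size(); }

    // Look at the current character without consuming it.
    wchar_t Peek() const { return IsEOF() ? L'\0' : m_Buffer[m_Index]; }

    // Consume the current character and update the brace nesting level.
    wchar_t Get()
    {
        if (IsEOF())
            return L'\0';
        const wchar_t c = m_Buffer[m_Index++];
        if (c == L'{')
            ++m_NestLevel;
        else if (c == L'}' && m_NestLevel > 0)
            --m_NestLevel;
        return c;
    }

    unsigned NestLevel() const { return m_NestLevel; }

private:
    std::wstring m_Buffer;     // whole file, already converted to wide characters
    std::size_t m_Index = 0;   // position of the current character
    unsigned m_NestLevel = 0;  // number of currently open '{'
};
```

Note that this sketch counts every brace it sees, including braces inside string literals and comments, which a real tokenizer would have to skip first.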

Return a correct token

Some special tokens must be replaced for the parser to work correctly. For example, the standard C++ headers shipped with MinGW contain the string "_GLIBCXX_STD", which should be replaced with "std". See the dialog below.

Cc std replacement.png

Before returning a token, an inline function in the Tokenizer class checks whether it should be replaced.

   // s_Replacements is a map: look the token up as a key and return the mapped string if there is one
   inline const wxString& ThisOrReplacement(const wxString& str) const
   {
       ConfigManagerContainer::StringToStringMap::const_iterator it = s_Replacements.find(str);
       if (it != s_Replacements.end())
           return it->second;
       return str;
   }


Setting the replacement mapping. Note that too many replacement mappings will slow down parsing.
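
The same lookup can be tried outside Code::Blocks with standard containers. The sketch below is a standalone approximation: it assumes std::string and std::map in place of wxString and ConfigManagerContainer::StringToStringMap, and it passes the map as a parameter instead of using the static s_Replacements member.

```cpp
#include <map>
#include <string>

// Stand-in for ConfigManagerContainer::StringToStringMap (assumption).
using StringToStringMap = std::map<std::string, std::string>;

// Returns the registered replacement for a token, or the token itself when
// no replacement exists. The returned reference refers either to the map
// entry or to the argument, so both must outlive its use.
const std::string& ThisOrReplacement(const StringToStringMap& replacements,
                                     const std::string& token)
{
    const auto it = replacements.find(token);
    return it != replacements.end() ? it->second : token;
}
```

With a map containing the pair ("_GLIBCXX_STD", "std"), the call ThisOrReplacement(map, "_GLIBCXX_STD") yields "std", while any token without an entry comes back unchanged. Every token goes through one map lookup, which is where the cost of the replacement step comes from.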

High level parser


To speed up the allocation of Tokens, the "new" and "delete" operators are overloaded in the base class of Token: "class Token : public BlockAllocated<Token, 10000>". The BlockAllocated class has a single static member, "static BlockAllocator<T, pool_size, debug> allocator;", which holds the pre-allocated memory for the derived class. The value 10000 means that memory for 10000 Tokens is pre-allocated.
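
The pattern can be illustrated with a much simplified pool. This is a sketch under stated assumptions, not the Code::Blocks implementation: the debug template parameter is dropped, the member names Allocate, Release and Grow are hypothetical, the pool is assumed to grow by pool_size slots whenever it runs empty, and Token is reduced to a single field.

```cpp
#include <cstddef>
#include <vector>

// Simplified pool (assumption): hands out fixed-size slots for T and
// grows by pool_size slots whenever the free list is empty.
template <class T, unsigned int pool_size>
class BlockAllocator
{
    union Slot
    {
        Slot* next;                                  // used while the slot is free
        alignas(T) unsigned char storage[sizeof(T)]; // used while the slot holds a T
    };

    std::vector<Slot*> m_Blocks; // every block obtained from the heap
    Slot* m_Free = nullptr;      // head of the free list

    void Grow()
    {
        Slot* block = new Slot[pool_size];
        m_Blocks.push_back(block);
        for (unsigned int i = 0; i < pool_size; ++i)
        {
            block[i].next = m_Free;
            m_Free = &block[i];
        }
    }

public:
    ~BlockAllocator()
    {
        for (Slot* block : m_Blocks)
            delete[] block;
    }

    void* Allocate()
    {
        if (!m_Free)
            Grow();
        Slot* slot = m_Free;
        m_Free = slot->next;
        return slot;
    }

    void Release(void* p)
    {
        Slot* slot = static_cast<Slot*>(p);
        slot->next = m_Free;
        m_Free = slot;
    }
};

// Base class that routes new/delete of the derived class T to a shared pool.
template <class T, unsigned int pool_size>
class BlockAllocated
{
    static BlockAllocator<T, pool_size> allocator;

public:
    static void* operator new(std::size_t) { return allocator.Allocate(); }
    static void operator delete(void* p) { if (p) allocator.Release(p); }
};

template <class T, unsigned int pool_size>
BlockAllocator<T, pool_size> BlockAllocated<T, pool_size>::allocator;

// Reduced stand-in for the real Token class.
struct Token : public BlockAllocated<Token, 10000>
{
    int line = 0;
};
```

With these definitions, "new Token" takes a slot from the pool instead of calling the global allocator, and "delete" puts the slot back on the free list for reuse. The sketch ignores array new and delete and classes derived from Token, whose size could exceed the slot size.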

Operator new overloading for fast allocation on the heap


Each identifier will be recorded in the TokenTree for later usage.

UI issue

Debug Log output

If you want to debug your plugin, you may need to log debug information. The usual call is Manager::Get()->GetLogManager()->DebugLog(_("XXXXX "));

You also need to start Code::Blocks with the following command-line argument. For example, on Windows:

codeblocks.exe --debug-log

A Code::Blocks debug panel will then be shown to display the log.

Debug Log output panel