Difference between revisions of "User:Quorn/SyntaxHighlight"


Latest revision as of 03:36, Friday 29 June 2018

Syntax Highlighting

Committed to gsoc-2008-quorn branch: [rev. 15837]

Design

Recalculating syntax highlighting for the whole document on every keystroke is wasteful. Since Python is line-based, only the current line needs reparsing, with added consideration for multi-line statements.

A line affects other lines only in the following cases:

  • Triple quoted strings:
       """Some text in here...
          ...
       """
  • Explicit line joining in other strings:
       "This is a string \
       over multiple \
       lines. " + 'So \
       is this.'

The function render_string(st, str) populates temp_char_buf with the string str, rendered with spaces in place of tabs. Currently, each line's format string is the same length as this rendered string.

I propose keeping linep->format one char longer than temp_char_buf so that a flag can be stored at the end:

1 0 0 " S o m e   t e x t   i n   t h e   l i n e "\0
n n n l l l l l l l l l l l l l l l l l l l l l l l\0\x

Here x is a line continuation flag giving the starting state of the line that follows. Lines that leave a multi-line string open therefore end with a non-zero flag, whose value indicates the type of string that is open:

0x00    No strings open
0x01    Single-quoted string open ('...')
0x02    Double-quoted string open ("...")
0x05    Triple single-quoted string open ('''...''')
0x06    Triple double-quoted string open ("""...""")
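
As a rough illustration of how such a flag could be computed, the sketch below scans one line, given the flag it starts with, and returns the flag it ends with. It is written in Python for brevity rather than in the C of the actual implementation. The names end_state, closing_delimiter and TRIPLE_BIT are hypothetical, and reading bit 0x04 as "triple-quoted" is an assumption that merely happens to fit the values in the table.

```python
NO_STRING = 0x00
SINGLE = 0x01
DOUBLE = 0x02
TRIPLE_BIT = 0x04  # assumption: 0x05 == 0x04 | 0x01 and 0x06 == 0x04 | 0x02


def closing_delimiter(flag):
    """Return the text that closes the string type encoded in `flag`."""
    quote = "'" if flag & SINGLE else '"'
    return quote * 3 if flag & TRIPLE_BIT else quote


def end_state(line, start_flag=NO_STRING):
    """Scan one line (without its newline) and return its continuation flag."""
    state = start_flag
    continued = False  # True if the line ends with a backslash inside a string
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if state == NO_STRING:
            if ch == "#":
                break  # a comment runs to the end of the line
            if ch in "'\"":
                base = SINGLE if ch == "'" else DOUBLE
                if line.startswith(ch * 3, i):
                    state = base | TRIPLE_BIT
                    i += 3
                else:
                    state = base
                    i += 1
                continue
            i += 1
        elif ch == "\\":
            continued = i == n - 1  # explicit line joining inside a string
            i += 2                  # skip the escaped character, if any
        else:
            delim = closing_delimiter(state)
            if line.startswith(delim, i):
                state = NO_STRING
                i += len(delim)
            else:
                i += 1
    # A '...' or "..." string only carries over via a trailing backslash.
    if state in (SINGLE, DOUBLE) and not continued:
        state = NO_STRING
    return state
```

Under these assumptions, end_state('"This is a string \\') returns 0x02, and feeding that flag into the next line of the explicit line joining example carries the open string forward until its closing quote is found.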

Example:

Line:   1 0 0 " " " M u l t i - l i n e\0
Format: n n n l l l l l l l l l l l l l\0\6
Line:   s t a t e m e n t s\0
Format: l l l l l l l l l l\0\6
Line:   g o   l i k e   t h i s . " " "\0
Format: l l l l l l l l l l l l l l l l\0\0
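
The layout in this example can be mimicked with a byte string: the format chars, a NUL terminator, then one flag byte. The following is a minimal sketch, assuming Python bytes stand in for the C buffer; pack_format, continuation_flag and start_flag are hypothetical helper names, not functions from the branch.

```python
def pack_format(chars: str, flag: int) -> bytes:
    """Format chars, then the NUL terminator, then the continuation flag."""
    return chars.encode("ascii") + b"\0" + bytes([flag])


def continuation_flag(fmt: bytes) -> int:
    """The flag is always the very last byte, after the terminator."""
    return fmt[-1]


def start_flag(formats, index: int) -> int:
    """A line starts in the state recorded at the end of the previous line."""
    return continuation_flag(formats[index - 1]) if index > 0 else 0x00


# The three lines of the example, packed with their flags.
formats = [
    pack_format("nnn" + "l" * 13, 0x06),  # 100"""Multi-line
    pack_format("l" * 10, 0x06),          # statements
    pack_format("l" * 16, 0x00),          # go like this."""
]
```

Here start_flag(formats, 2) is 0x06, so the third line knows it begins inside a triple double-quoted string without looking at anything but the end of the second line's format.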

With this system, a line being edited need only look at the flag at the end of the previous line to work out how to format itself. If the edit changes the line's own continuation flag, the next line must be reformatted as well, and so on until a line is reached whose flag does not change.
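
The propagation rule described above can be sketched as follows. This is an illustration only: reformat_from is a hypothetical name, the per-line flags are kept in a plain list rather than at the end of each format string, and toy_scan is a deliberately simplistic stand-in for the real formatter that tracks nothing but triple double-quoted strings.

```python
def reformat_from(lines, flags, edited, scan):
    """Re-scan from line `edited`; stop at the first line whose flag is unchanged.

    flags[i] holds the continuation flag stored at the end of line i.
    Returns the number of lines that were re-scanned.
    """
    rescanned = 0
    for i in range(edited, len(lines)):
        start = flags[i - 1] if i > 0 else 0x00
        new_flag = scan(lines[i], start)
        rescanned += 1
        unchanged = new_flag == flags[i]
        flags[i] = new_flag
        if unchanged:
            break  # the following lines still start in the same state
    return rescanned


def toy_scan(line, start):
    # Toy scanner: every triple double quote toggles between 0x00 and 0x06.
    state = start
    for _ in range(line.count('"' * 3)):
        state = 0x00 if state == 0x06 else 0x06
    return state


lines = ['s = """doc', 'string"""', 'x = 1', 'y = 2']
flags = [0x06, 0x00, 0x00, 0x00]

lines[2] = 'x = 100'                           # edit that leaves the flag alone
n1 = reformat_from(lines, flags, 2, toy_scan)  # only the edited line is scanned

lines[1] = 'string'                            # edit that removes the closing quotes
n2 = reformat_from(lines, flags, 1, toy_scan)  # every later line changes state
```

In the first edit a single line is re-scanned; in the second, the string is left open, so the change ripples through to the end of the document.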

This vastly reduces the need for parsing and will speed up editing of large files.

Results

The original get_format_string function was unnecessarily large and difficult to understand. After rewriting it, I now have two well-commented functions that are easy to follow: txt_format_line(...) and txt_format_text(...). The latter achieves the same result as get_format_string(...) in less than half the time.

The numbers speak for themselves...

Timing results for 100,000 iterations of the whole document (all lines reparsed):

Timing syntax highlighting...    (small script)
    New system took: 2.209049
    Old system took: 5.150172
    New system took: 2.207988
    Old system took: 5.115926
Timing syntax highlighting...    (large script)
    New system took: 34.184102
    Old system took: 81.666011
    New system took: 34.200491
    Old system took: 80.773203
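
To put a number on the comparison, the ratio of old to new timings can be computed directly from the figures above. This small sketch only averages the two runs of each system; the helper name speedup is hypothetical.

```python
from statistics import mean

# Timings copied from the results above, two runs per system.
timings = {
    "small script": {"new": [2.209049, 2.207988], "old": [5.150172, 5.115926]},
    "large script": {"new": [34.184102, 34.200491], "old": [81.666011, 80.773203]},
}


def speedup(runs):
    """Ratio of the mean old timing to the mean new timing."""
    return mean(runs["old"]) / mean(runs["new"])


for name, runs in timings.items():
    print(f"{name}: old/new = {speedup(runs):.2f}")
```

Both ratios come out between 2.3 and 2.4, so the new code is somewhat more than twice as fast on a full reparse.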

Since the new system also minimizes the number of lines parsed, these figures represent only its worst case. Memory is now also allocated in one place, reducing the chance of errors.

Appendix

New format chars:

'_'   Whitespace
'#'   Comment text
'!'   Punctuation and other symbols
'n'   Numerals
'l'   String letters
'v'   Special variables (class, def)
'b'   Built-in names (print, for, etc.)
'q'   Other text (identifiers, etc.)
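
To show how these format chars map onto a line of source, here is a minimal single-line sketch in Python. It is not the logic of txt_format_line: the regular expression, the names TOKEN, CHAR_FOR and format_line, and the contents of the BUILTIN set (a small assumed subset) are all illustrative, and strings that continue across lines are not handled.

```python
import re

SPECIAL = {"class", "def"}
BUILTIN = {"print", "for", "if", "else", "while", "return", "import", "in"}  # assumed subset

# One alternative per token kind; every character matches some alternative.
TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>\#.*)
  | (?P<string>"(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d*)?)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<other>.)
""", re.VERBOSE)

CHAR_FOR = {"ws": "_", "comment": "#", "string": "l", "number": "n", "other": "!"}


def format_line(line):
    """Return one format char per character of `line`."""
    out = []
    for m in TOKEN.finditer(line):
        kind, text = m.lastgroup, m.group()
        if kind == "name":
            ch = "v" if text in SPECIAL else "b" if text in BUILTIN else "q"
        else:
            ch = CHAR_FOR[kind]
        out.append(ch * len(text))
    return "".join(out)


print(format_line("def f(x):  # hi"))  # prints vvv_q!q!!__####
```

The result always has the same length as the input, which matches the requirement that the format string line up character for character with the rendered text.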

Format chars in prior versions:

'l'   Letters (strings)
'b'   Built-in functions (keywords)
'#'   Comments
'v'   Special variables ("class" and "def"), except
      that these are covered by 'b' above, so 'v' is
      never actually used.
'n'   Numbers
'q'   Other text, symbols, etc
' '   Whitespace