The nanobind bindings
How _lexilla is built and what it exposes.
How it fits together
- src/lexilla_vendor/ -- vendored Lexilla release source (see docs/auditing.md for how it's verified against upstream).
- src/scintilla_interface/ -- the handful of Scintilla headers (ILexer.h, Sci_Position.h, Scintilla.h) that ILexer5's declaration and the lexer implementations need but that Lexilla's own tarball doesn't ship -- see docs/auditing.md and docs/specs/mission.md for why.
- src/lexilla_core/CMakeLists.txt -- builds the vendored Lexilla sources (Lexilla.cxx, lexlib/*.cxx, lexers/Lex*.cxx) as a static library, analogous to pyside6-scintilla's src/scintilla_qt/.
- src/lexilla/bindings/_binding.cpp -- nanobind module source. Exposes create_lexer, get_lexer_count, get_lexer_name, and a Lexer class wrapping ILexer5 (name/identifier, property get/set/introspection, word_list_set, and the raw pointer for SCI_SETILEXER).
- src/lexilla/bindings/CMakeLists.txt -- compiles _binding.cpp into _lexilla.{pyd,so,dylib} via nanobind_add_module, linked against lexilla_core, add_subdirectory'd from the top-level CMakeLists.txt.
- src/lexilla/__init__.py -- re-exports the compiled extension's public API as the lexilla package.
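As a rough illustration of the module-level functions listed above, the sketch below enumerates lexer names by index. It assumes get_lexer_name takes a zero-based index and returns a str, mirroring Lexilla's C-level GetLexerName; that Python signature is an assumption, not something confirmed here. fake_api is a hypothetical stand-in so the snippet runs without the compiled extension.

```python
from types import SimpleNamespace


def lexer_names(api) -> list[str]:
    """Collect every lexer name exposed by `api`.

    Assumption: api.get_lexer_name(i) accepts a zero-based index below
    api.get_lexer_count() and returns the name as a str.
    """
    return [api.get_lexer_name(i) for i in range(api.get_lexer_count())]


# Hypothetical stand-in for the compiled module, so this runs anywhere.
_NAMES = ["python", "cpp"]
fake_api = SimpleNamespace(
    get_lexer_count=lambda: len(_NAMES),
    get_lexer_name=lambda index: _NAMES[index],
)

names = lexer_names(fake_api)
```

With the extension installed, lexer_names(lexilla) would be the equivalent call, provided the assumption about get_lexer_name holds.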
Lexer lifetime and ownership
create_lexer(name) returns a Lexer wrapping a fresh ILexer5*, or None
if no lexer has that name. The wrapper owns the pointer, calling Release()
when it is garbage collected or immediately via .release(), until
.detach() is called. .detach() hands ownership to the caller, typically
because the pointer is about to be passed to a Scintilla editor via
SCI_SETILEXER; the editor then manages and eventually releases it itself.
Calling any other method after .detach()/.release() raises RuntimeError.
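The ownership rules can be modelled in plain Python. The sketch below is not the binding's Lexer class: ModelLexer and its on_release callback (standing in for ILexer5::Release()) are hypothetical, and it assumes that a second .release() or .detach() raises as well. How the raw pointer itself is obtained is left out.

```python
from typing import Callable


class ModelLexer:
    """Pure-Python model of the ownership rules; not the real Lexer class."""

    def __init__(self, name: str, on_release: Callable[[], None]) -> None:
        self._name = name
        self._on_release = on_release  # stands in for ILexer5::Release()
        self._state = "owned"          # "owned", "released" or "detached"

    def _require_owned(self) -> None:
        if self._state != "owned":
            raise RuntimeError(f"lexer already {self._state}")

    @property
    def name(self) -> str:
        self._require_owned()
        return self._name

    def release(self) -> None:
        """Release the underlying lexer now instead of waiting for GC."""
        self._require_owned()  # assumption: a second release() also raises
        self._state = "released"
        self._on_release()

    def detach(self) -> None:
        """Give up ownership; the new owner is responsible for Release()."""
        self._require_owned()
        self._state = "detached"

    def __del__(self) -> None:
        # Only a wrapper that still owns the pointer releases it on collection.
        if getattr(self, "_state", None) == "owned":
            self._state = "released"
            self._on_release()


log: list[str] = []

handed_off = ModelLexer("python", lambda: log.append("released python"))
handed_off.detach()  # e.g. just before the pointer goes to SCI_SETILEXER
del handed_off       # no Release() call: the editor owns the lexer now

kept = ModelLexer("cpp", lambda: log.append("released cpp"))
kept.release()       # explicit, immediate release
```

In this model only the lexer that was never detached triggers the release callback, which is the distinction the wrapper has to preserve.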
Deferred: Lex/Fold
ILexer5::Lex/Fold each take an IDocument*. In normal use Scintilla
calls them itself once a lexer is wired up via SCI_SETILEXER -- the
binding never needs to call them. Making them callable from Python would
also mean binding IDocument as a trampoline class that Python code can
implement, which is a much bigger surface. See borco/lexilla-py#6 for the
follow-up to investigate whether something like Pygments or tree-sitter
could usefully back an IDocument, or whether exposing it is worth doing at
all.
Type stubs (_lexilla.pyi)
Generated with make stubs (python -m nanobind.stubgen), checked into
git for IDE/type-checker support -- same rationale as pyside6-scintilla's
_pyside6_scintilla.pyi.
Excluded from ruff formatting (extend-exclude in pyproject.toml) since
it's machine-generated, not hand-edited.
Regenerate it after rebuilding the extension (make install or
uv sync --reinstall-package lexilla), and whenever _binding.cpp changes
the public API surface.
Typed enums instead of bare ints/strings (PropertyType, LanguageIdentifier, Language)
See docs/specs/mission.md's "No bare ints/strings for
'magic' values" decision for the full rationale. Three enums replace values
that were previously plain int/str:
- PropertyType -- Lexer.property_type()'s return value (Scintilla's SC_TYPE_*). Small (3 values), hand-written directly in _binding.cpp.
- LanguageIdentifier -- Lexer.identifier's return value (Scintilla's SCLEX_*, ~142 values, declared in Lexilla's own vendored include/SciLexer.h). Generated, not hand-typed, by tools/generate_language_enums.py (a top-level tools/, matching the sibling pyside6-scintilla project's convention, and mirroring Lexilla's own "//++Autogenerated -- run scripts/LexillaGen.py to regenerate" convention). The script reuses the vendored src/lexilla_vendor/scripts/LexillaData.py and splices the result into _binding.cpp between // ++Autogenerated and // --Autogenerated markers. Both the enum class LanguageIdentifier definition and its nb::enum_<...> registration are spliced, as two separate marker-delimited blocks: the enum must be visible to Lexer::Identifier(), while its registration needs m and can only run inside NB_MODULE. Run manually via make generate-language-enums; it is not part of the normal build, so re-vendoring Lexilla is the trigger to re-run it (see docs/auditing.md's re-vendoring checklist).
- Language -- the ~139 lexer-name strings create_lexer() accepts. A Python enum.StrEnum in a generated, checked-in src/lexilla/_languages.py (also produced by generate_language_enums.py), since StrEnum members are real str instances and need no C++-side change to work with create_lexer().
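The marker-based splice described for LanguageIdentifier can be sketched as follows. This is a minimal illustration of the technique, not the code in tools/generate_language_enums.py: splice_blocks is a hypothetical helper, and it assumes the blocks are matched to their replacements purely by order of appearance.

```python
BEGIN = "// ++Autogenerated"
END = "// --Autogenerated"


def splice_blocks(source: str, bodies: list[str]) -> str:
    """Replace the contents of each marker-delimited block, in file order.

    The marker lines themselves are kept; everything between them is
    replaced by the matching entry of `bodies`.
    """
    out: list[str] = []
    used = 0
    inside = False
    for line in source.splitlines(keepends=True):
        marker = line.strip()
        if inside:
            if marker.startswith(END):
                inside = False
                out.append(line)
            # Old generated lines between the markers are dropped.
            continue
        out.append(line)
        if marker.startswith(BEGIN):
            if used == len(bodies):
                raise ValueError("more marker blocks than replacement bodies")
            body = bodies[used]
            used += 1
            if body and not body.endswith("\n"):
                body += "\n"
            out.append(body)
            inside = True
    if inside:
        raise ValueError("unterminated autogenerated block")
    if used != len(bodies):
        raise ValueError("fewer marker blocks than replacement bodies")
    return "".join(out)


# Two blocks: one at file scope, one inside NB_MODULE (placeholder contents).
template = (
    "// ++Autogenerated\n"
    "stale definition\n"
    "// --Autogenerated\n"
    "NB_MODULE(_lexilla, m) {\n"
    "    // ++Autogenerated\n"
    "    stale registration\n"
    "    // --Autogenerated\n"
    "}\n"
)
updated = splice_blocks(
    template,
    ["// enum definition goes here", "    // enum registration goes here"],
)
```

Because the markers survive each run, splicing the same bodies again leaves the file unchanged, which is what makes a checked-in generated region safe to regenerate.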
Every value in all three enums gets its own short docstring (for
LanguageIdentifier/Language, auto-derived from the same lexer-name data
already being parsed), so IDE hover works uniformly regardless of how many
values an enum has.
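A small sketch of why a StrEnum needs no C++-side change: its members are str instances, so any function that accepts a str accepts them. The two-member Language class and create_lexer_stub below are hypothetical stand-ins, and the string literal under each member is one common way to attach a per-value docstring that IDEs can show; whether the generated _languages.py uses that form is an assumption.

```python
import enum
import sys

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:  # minimal fallback for interpreters without enum.StrEnum
    class StrEnum(str, enum.Enum):
        def __str__(self) -> str:
            return str(self.value)


class Language(StrEnum):
    """Hypothetical two-member excerpt of the generated enum."""

    PYTHON = "python"
    """The "python" lexer."""

    CPP = "cpp"
    """The "cpp" lexer."""


def create_lexer_stub(name: str) -> str:
    """Stand-in for create_lexer(): relies only on plain str behaviour."""
    if not isinstance(name, str):
        raise TypeError("name must be a str")
    return "lexer:" + name


# The enum member and the bare string are interchangeable for the callee.
from_enum = create_lexer_stub(Language.CPP)
from_str = create_lexer_stub("cpp")
```

Both calls produce the same result, while call sites that use the enum gain completion and typo checking.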