The nanobind bindings

How _lexilla is built and what it exposes.

How it fits together

  • src/lexilla_vendor/ -- vendored Lexilla release source (see docs/auditing.md for how it's verified against upstream).
  • src/scintilla_interface/ -- the handful of Scintilla headers (ILexer.h, Sci_Position.h, Scintilla.h) that ILexer5's declaration and the lexer implementations need but that Lexilla's own tarball doesn't ship -- see docs/auditing.md and docs/specs/mission.md for why.
  • src/lexilla_core/CMakeLists.txt -- builds the vendored Lexilla sources (Lexilla.cxx, lexlib/*.cxx, lexers/Lex*.cxx) as a static library, analogous to pyside6-scintilla's src/scintilla_qt/.
  • src/lexilla/bindings/_binding.cpp -- nanobind module source. Exposes create_lexer, get_lexer_count, get_lexer_name, and a Lexer class wrapping ILexer5 (name/identifier, property get/set/introspection, word_list_set, and the raw pointer for SCI_SETILEXER).
  • src/lexilla/bindings/CMakeLists.txt -- compiles _binding.cpp into _lexilla.{pyd,so,dylib} via nanobind_add_module and links it against lexilla_core. Pulled in via add_subdirectory from the top-level CMakeLists.txt.
  • src/lexilla/__init__.py -- re-exports the compiled extension's public API as the lexilla package.

Lexer lifetime and ownership

create_lexer(name) returns a Lexer wrapping a fresh ILexer5*, or None if no lexer has that name. The wrapper owns the pointer: it calls Release() when garbage collected, or immediately via .release(). Calling .detach() hands ownership to the caller instead -- e.g. when the pointer is about to be passed to a Scintilla editor via SCI_SETILEXER, which will manage and eventually release it itself. Calling any other method after .detach()/.release() raises RuntimeError.
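The ownership rules above can be sketched as a small pure-Python model. This is not the real binding: ModelLexer, release_calls, and the placeholder value returned by detach() are hypothetical names used only for illustration. The text also does not say whether a second .release()/.detach() raises, so the model treats that as an error purely as an assumption.

```python
class ModelLexer:
    """Pure-Python stand-in that models who must call Release().

    Hypothetical: the real Lexer wraps an ILexer5* in C++; this class
    only mirrors the state transitions described in the text.
    """

    def __init__(self, name: str) -> None:
        self._name = name
        self._valid = True      # becomes False after release() or detach()
        self.release_calls = 0  # how often the wrapper itself "released"

    def _check(self) -> None:
        if not self._valid:
            raise RuntimeError("lexer has been released or detached")

    @property
    def name(self) -> str:
        self._check()
        return self._name

    def release(self) -> None:
        """Wrapper gives up the pointer by releasing it itself."""
        self._check()  # assumption: a second call is treated as an error
        self.release_calls += 1
        self._valid = False

    def detach(self) -> int:
        """Hand ownership to the caller; the wrapper will not release."""
        self._check()  # assumption: a second call is treated as an error
        self._valid = False
        return 0  # placeholder standing in for the raw pointer value


owned = ModelLexer("python")
owned.release()              # wrapper released it: one Release() call

handed_over = ModelLexer("cpp")
handed_over.detach()         # caller (e.g. the editor) now owns it
```

In the model, the released wrapper records exactly one release and the detached wrapper records none, which is the point of .detach(): whoever receives the pointer becomes responsible for releasing it.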

Deferred: Lex/Fold

ILexer5::Lex/Fold each take an IDocument*. In normal use Scintilla calls them itself once a lexer is wired up via SCI_SETILEXER -- the binding never needs to call them. Making them callable from Python would also mean binding IDocument as a trampoline class that Python code can implement, a much larger surface. See borco/lexilla-py#6 for the follow-up to investigate whether something like Pygments or tree-sitter could usefully back an IDocument, or whether exposing it is worth doing at all.

Type stubs (_lexilla.pyi)

Generated with make stubs (python -m nanobind.stubgen), checked into git for IDE/type-checker support -- same rationale as pyside6-scintilla's _pyside6_scintilla.pyi. Excluded from ruff formatting (extend-exclude in pyproject.toml) since it's machine-generated, not hand-edited.

Regenerate it after rebuilding the extension (make install or uv sync --reinstall-package lexilla), and whenever _binding.cpp changes the public API surface.

Typed enums instead of bare ints/strings (PropertyType, LanguageIdentifier, Language)

See docs/specs/mission.md's "No bare ints/strings for 'magic' values" decision for the full rationale. Three enums replace values that were previously plain int/str:

  • PropertyType -- Lexer.property_type()'s return value (Scintilla's SC_TYPE_*). Small (3 values), hand-written directly in _binding.cpp.
  • LanguageIdentifier -- Lexer.identifier's return value (Scintilla's SCLEX_*, ~142 values, declared in Lexilla's own vendored include/SciLexer.h). Generated, not hand-typed, by tools/generate_language_enums.py. The script lives in a top-level tools/ directory, matching the sibling pyside6-scintilla project's convention, and mirrors Lexilla's own "//++Autogenerated -- run scripts/LexillaGen.py to regenerate" convention. It reuses the vendored src/lexilla_vendor/scripts/LexillaData.py and splices the result into _binding.cpp between // ++Autogenerated and // --Autogenerated markers. Two separate marker-delimited blocks are spliced: the enum class LanguageIdentifier definition and its nb::enum_<...> registration. They are separate because the enum must be visible to Lexer::Identifier(), while its registration needs m and can only run inside NB_MODULE. Run it manually via make generate-language-enums; it is not part of the normal build, so re-vendoring Lexilla is the trigger to re-run it (see docs/auditing.md's re-vendoring checklist).
  • Language -- the ~139 lexer-name strings create_lexer() accepts. A Python enum.StrEnum in a generated, checked-in src/lexilla/_languages.py (also produced by generate_language_enums.py), since StrEnum members are real str instances and need no C++-side change to work with create_lexer().
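The marker-delimited splicing described for LanguageIdentifier can be sketched roughly as follows. This is a minimal illustration, not the actual tools/generate_language_enums.py: splice_block and its block_index parameter are hypothetical, and selecting a block by its position in the file is an assumption about how the two blocks might be told apart.

```python
BEGIN = "// ++Autogenerated"
END = "// --Autogenerated"


def splice_block(source: str, generated: str, block_index: int) -> str:
    """Replace the text between the block_index-th marker pair.

    The marker lines themselves are kept; only the lines between them
    are swapped for the generated text.
    """
    lines = source.splitlines(keepends=True)
    begins = [i for i, line in enumerate(lines)
              if line.strip().startswith(BEGIN)]
    if block_index >= len(begins):
        raise ValueError(f"no marker block #{block_index}")
    begin = begins[block_index]

    # Find the first end marker after the chosen begin marker.
    end = next((i for i in range(begin + 1, len(lines))
                if lines[i].strip().startswith(END)), None)
    if end is None:
        raise ValueError("begin marker without matching end marker")

    body = generated if generated.endswith("\n") else generated + "\n"
    return "".join(lines[:begin + 1]) + body + "".join(lines[end:])


example = (
    "// ++Autogenerated\n"
    "old definition\n"
    "// --Autogenerated\n"
    "NB_MODULE(...) {\n"
    "// ++Autogenerated\n"
    "old registration\n"
    "// --Autogenerated\n"
    "}\n"
)
updated = splice_block(example, "new registration", block_index=1)
```

Because everything outside the markers is left untouched, hand-written code around the generated blocks survives regeneration.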

Every value in all three enums gets its own short docstring (for LanguageIdentifier/Language, auto-derived from the same lexer-name data already being parsed), so IDE hover works uniformly regardless of how many values an enum has.