DSL API Reference¶

Module: doctk.dsl
Version: 0.2.0
Status: Stable

Overview¶

The doctk DSL (Domain-Specific Language) provides a pipeline-based syntax for document manipulation. The DSL supports:

Pipeline Operations: Chain operations with the | operator
REPL: Interactive command-line interface
Script Execution: Run .tk script files
Code Block Execution: Execute DSL code blocks in Markdown files
Variable Assignment: Store intermediate results

DSL Syntax¶

Basic Pipeline¶

doc | promote h2-0

Chained Operations¶

doc | promote h2-0 | demote h3-1 | nest h2-2 h1-0

Variable Assignment¶

let updated = doc | promote h2-0
updated | move_up h2-1

Operations¶

Operation	Syntax	Description
`promote`	`promote <node_id>`	Decrease heading level (h3 → h2)
`demote`	`demote <node_id>`	Increase heading level (h2 → h3)
`move_up`	`move_up <node_id>`	Move section up among siblings
`move_down`	`move_down <node_id>`	Move section down among siblings
`nest`	`nest <node_id> <under_id>`	Nest section under another
`unnest`	`unnest <node_id>`	Un-nest section to parent level
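
The table suggests that a node ID such as h2-0 pairs a heading level with an index, and that promote and demote shift that level by one. The sketch below illustrates those semantics only. The ID layout (level, then zero-based index), the `parse_node_id` helper, and the clamping to levels 1 through 6 are assumptions made for illustration, not documented doctk behavior.

```python
import re

# Assumed ID layout: h<level>-<index>, e.g. "h2-0"
NODE_ID = re.compile(r"^h([1-6])-(\d+)$")


def parse_node_id(node_id: str) -> tuple[int, int]:
    """Split an ID such as 'h2-0' into (heading level, index)."""
    match = NODE_ID.match(node_id)
    if match is None:
        raise ValueError(f"not a heading node ID: {node_id!r}")
    return int(match.group(1)), int(match.group(2))


def promote(level: int) -> int:
    """Decrease the heading level (h3 -> h2); h1 stays h1 in this sketch."""
    return max(1, level - 1)


def demote(level: int) -> int:
    """Increase the heading level (h2 -> h3); h6 stays h6 in this sketch."""
    return min(6, level + 1)


level, index = parse_node_id("h3-1")
print(level, index)    # 3 1
print(promote(level))  # 2
print(demote(level))   # 4
```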

Module Structure¶

src/doctk/dsl/
├── __init__.py       # Public API exports
├── lexer.py          # Tokenization
├── parser.py         # AST generation
├── executor.py       # DSL execution
├── repl.py           # Interactive REPL
└── codeblock.py      # Markdown code block execution

Core Classes¶

Lexer¶

Location: doctk.dsl.lexer.Lexer

Tokenizes DSL source code into tokens.

Constructor¶

Lexer(source: str)

Parameters:

source: DSL source code to tokenize

Methods¶

`tokenize() -> list[Token]`¶

Tokenize the source code.

Returns: List of Token objects

Raises: LexerError if invalid syntax is encountered

Example:

from doctk.dsl.lexer import Lexer

lexer = Lexer("doc | promote h2-0")
tokens = lexer.tokenize()

for token in tokens:
    print(f"{token.type.name}: {token.value}")

Token Types¶

class TokenType(Enum):
    # Literals
    IDENTIFIER = auto()    # Variable or operation name
    STRING = auto()        # "text" or 'text'
    NUMBER = auto()        # 123, 3.14
    BOOLEAN = auto()       # true, false

    # Operators
    PIPE = auto()          # |
    EQUALS = auto()        # =
    NOT_EQUALS = auto()    # !=
    GREATER = auto()       # >
    LESS = auto()          # <
    GREATER_EQUAL = auto() # >=
    LESS_EQUAL = auto()    # <=
    TILDE_EQUALS = auto()  # ~= (regex match)
    CARET_EQUALS = auto()  # ^= (starts with)
    DOLLAR_EQUALS = auto() # $= (ends with)
    STAR_EQUALS = auto()   # *= (contains)

    # Keywords
    LET = auto()           # let
    DOC = auto()           # doc
    WHERE = auto()         # where
    SELECT = auto()        # select
    TRUE = auto()          # true
    FALSE = auto()         # false

    # Delimiters
    LPAREN = auto()        # (
    RPAREN = auto()        # )
    COMMA = auto()         # ,
    NEWLINE = auto()       # Line break
    EOF = auto()           # End of file
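
To make the token categories concrete, here is a small regex-based tokenizer for a subset of them (keywords, identifiers, numbers, `|`, `=`, newlines, EOF). It is a sketch and not the doctk `Lexer`: the `Tok` tuple, the `tokenize` function, and the use of plain strings for token types are hypothetical stand-ins, and it raises `ValueError` where the real lexer is documented to raise `LexerError`.

```python
import re
from typing import NamedTuple


class Tok(NamedTuple):
    type: str
    value: str
    line: int    # 1-indexed
    column: int  # 1-indexed


KEYWORDS = {"let": "LET", "doc": "DOC", "where": "WHERE", "select": "SELECT"}

TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n)"
    r"|(?P<SKIP>[ \t]+)"
    r"|(?P<PIPE>\|)"
    r"|(?P<EQUALS>=)"
    r"|(?P<NUMBER>\d+(?:\.\d+)?\b)"
    r"|(?P<IDENTIFIER>[A-Za-z][A-Za-z0-9_-]*)"
)


def tokenize(source: str) -> list[Tok]:
    """Scan the source left to right, tracking 1-indexed line and column."""
    tokens = []
    pos, line, line_start = 0, 1, 0
    while pos < len(source):
        column = pos - line_start + 1
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(
                f"unexpected {source[pos]!r} at line {line}, column {column}"
            )
        kind, text = match.lastgroup, match.group()
        if kind == "IDENTIFIER":
            kind = KEYWORDS.get(text, kind)  # keywords take priority
        if kind != "SKIP":
            tokens.append(Tok(kind, text, line, column))
        pos = match.end()
        if kind == "NEWLINE":
            line, line_start = line + 1, pos
    tokens.append(Tok("EOF", "", line, pos - line_start + 1))
    return tokens


for tok in tokenize("doc | promote h2-0"):
    print(f"{tok.type}: {tok.value}")
```

Because the identifier pattern allows digits and hyphens after the first letter, a node ID such as h2-0 comes out as a single IDENTIFIER token in this sketch.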

Parser¶

Location: doctk.dsl.parser.Parser

Parses tokens into an Abstract Syntax Tree (AST).

Constructor¶

Parser(tokens: list[Token])

Parameters:

tokens: List of tokens from Lexer

Methods¶

`parse() -> list[ASTNode]`¶

Parse tokens into AST.

Returns: List of AST nodes (statements)

Raises: ParseError if syntax is invalid

Example:

from doctk.dsl.lexer import Lexer
from doctk.dsl.parser import Parser

lexer = Lexer("doc | promote h2-0")
tokens = lexer.tokenize()

parser = Parser(tokens)
ast = parser.parse()

for node in ast:
    print(f"Statement: {type(node).__name__}")

AST Node Types¶

@dataclass
class Pipeline:
    """Pipeline expression: source | op1 | op2"""
    source: str
    operations: list[FunctionCall]

@dataclass
class FunctionCall:
    """Function call: name(arg1, arg2, key1=val1)"""
    name: str
    args: list[Any]  # Positional arguments
    kwargs: dict[str, Any]  # Keyword arguments

@dataclass
class Assignment:
    """Variable assignment: let var = pipeline"""
    variable: str
    pipeline: Pipeline
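
As an illustration of how these node types fit together, the following sketch builds by hand the AST that the two statements from the Variable Assignment section might produce. The dataclasses are local copies of the definitions above so that the example runs on its own; the default values for `args` and `kwargs`, and the choice to store node IDs as plain strings, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FunctionCall:
    name: str
    args: list[Any] = field(default_factory=list)        # positional arguments
    kwargs: dict[str, Any] = field(default_factory=dict)  # keyword arguments


@dataclass
class Pipeline:
    source: str
    operations: list[FunctionCall]


@dataclass
class Assignment:
    variable: str
    pipeline: Pipeline


# let updated = doc | promote h2-0
stmt = Assignment(
    variable="updated",
    pipeline=Pipeline(
        source="doc",
        operations=[FunctionCall(name="promote", args=["h2-0"])],
    ),
)

# updated | move_up h2-1
follow_up = Pipeline(
    source="updated",
    operations=[FunctionCall(name="move_up", args=["h2-1"])],
)

print(stmt.pipeline.operations[0].name)  # promote
print(follow_up.source)                  # updated
```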

Executor¶

Location: doctk.dsl.executor.Executor

Executes AST against documents.

Constructor¶

Executor(document: Document[Any])

Parameters:

document: Initial document to operate on

Methods¶

`execute(ast: list[ASTNode]) -> Document[Any]`¶

Execute AST statements.

Parameters:

ast: List of AST nodes from Parser

Returns: Resulting document after executing all statements

Raises: ExecutionError if execution fails

Example:

from doctk import Document
from doctk.dsl.lexer import Lexer
from doctk.dsl.parser import Parser
from doctk.dsl.executor import Executor

# Load document
doc = Document.from_file("example.md")

# Parse DSL
lexer = Lexer("doc | promote h2-0 | demote h3-1")
tokens = lexer.tokenize()
parser = Parser(tokens)
ast = parser.parse()

# Execute
executor = Executor(doc)
result_doc = executor.execute(ast)

# Save result
result_doc.to_file("output.md")

REPL¶

Location: doctk.dsl.repl.DoctkREPL

Interactive Read-Eval-Print Loop for DSL.

Constructor¶

DoctkREPL()

Methods¶

`run() -> None`¶

Start the REPL (blocks until user exits).

Example:

from doctk.dsl.repl import DoctkREPL

repl = DoctkREPL()
repl.run()  # Interactive session

REPL Commands¶

Command	Description	Example
`load <file>`	Load a document	`load example.md`
`save <file>`	Save current document	`save output.md`
`tree`	Show document tree	`tree`
`list`	List all nodes	`list`
`help`	Show help	`help`
`exit`	Exit REPL	`exit`
`<operation>`	Execute a DSL operation	`promote h2-0`

Session Example:

doctk> load example.md
Loaded document with 45 nodes

doctk> tree
Document
├── h1-0: Introduction
│   ├── h2-0: Overview
│   └── h2-1: Goals

doctk> promote h2-0
Operation successful

doctk> save output.md
Saved to output.md

doctk> exit
Goodbye!
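
The command table implies that the REPL separates its built-in commands from DSL operations. One way to sketch that split is shown below; `classify` and the `BUILTINS` arity map are hypothetical names, and the rule that anything other than a built-in is treated as an operation is an assumption based on the last table row.

```python
# Built-in commands from the table above, with the number of arguments each takes
BUILTINS = {"load": 1, "save": 1, "tree": 0, "list": 0, "help": 0, "exit": 0}


def classify(line: str) -> tuple[str, str, list[str]]:
    """Return (kind, name, args), where kind is 'empty', 'command', or 'operation'."""
    words = line.split()
    if not words:
        return ("empty", "", [])
    name, args = words[0], words[1:]
    if name in BUILTINS:
        if len(args) != BUILTINS[name]:
            raise ValueError(f"{name} expects {BUILTINS[name]} argument(s)")
        return ("command", name, args)
    # Anything else is handed to the DSL as an operation in this sketch
    return ("operation", name, args)


print(classify("load example.md"))  # ('command', 'load', ['example.md'])
print(classify("promote h2-0"))     # ('operation', 'promote', ['h2-0'])
```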

ScriptExecutor¶

Location: doctk.dsl.executor.ScriptExecutor

Execute DSL scripts from files.

Methods¶

`execute_file(script_path: str | Path, document_path: str | Path) -> Document[Any]`¶

Execute a script file on a document.

Note: This is an instance method - you must create a ScriptExecutor first.

Parameters:

script_path: Path to .tk script file
document_path: Path to document to transform

Returns: Resulting document

Raises: ExecutionError if execution fails

Example:

from pathlib import Path
from doctk import Document
from doctk.dsl.executor import ScriptExecutor

# Create executor (needs a document)
doc = Document.from_file("example.md")
executor = ScriptExecutor(doc)

# Execute script file on a document
result = executor.execute_file(
    script_path=Path("script.tk"),
    document_path=Path("example.md")
)

result.to_file("output.md")

Script File Example (script.tk):

# Promote all h2 headings
doc | promote h2-0 | promote h2-1 | promote h2-2

# Nest a section
doc | nest h2-3 h1-0

`execute_file_and_save(script_path, doc_path, output_path) -> None`¶

Execute script and save result (convenience method).

Parameters:

script_path: Path to script file
doc_path: Path to input document
output_path: Path to save result

Example:

ScriptExecutor.execute_file_and_save(
    script_path="transform.tk",
    doc_path="input.md",
    output_path="output.md"
)

CodeBlockExecutor¶

Location: doctk.dsl.codeblock.CodeBlockExecutor

Execute DSL code blocks in Markdown documents.

Constructor¶

CodeBlockExecutor(document: Document[Any])

Methods¶

`find_code_blocks(markdown_text: str) -> list[CodeBlock]`¶

Find all doctk code blocks in Markdown.

Parameters:

markdown_text: Markdown source text

Returns: List of CodeBlock objects

Example:

from doctk import Document
from doctk.dsl.codeblock import CodeBlockExecutor

markdown = '''
# Document

Some text.

```doctk
doc | promote h2-0
```

More text.

```doctk
doc | nest h2-1 h1-0
```
'''

doc = Document.from_file("document.md")
executor = CodeBlockExecutor(doc)
blocks = executor.find_code_blocks(markdown)
print(f"Found {len(blocks)} doctk code blocks")

`execute_block(code_block: CodeBlock) -> Document[Any]`¶

Execute a single code block.

Parameters:

code_block: CodeBlock to execute

Returns: Resulting document

`execute_all_blocks(markdown_text: str) -> Document[Any]`¶

Execute all code blocks sequentially.

Parameters:

markdown_text: Markdown source with doctk code blocks

Returns: Document after executing all blocks

Example:

from doctk import Document
from doctk.dsl.codeblock import CodeBlockExecutor

# Execute all blocks in a Markdown file
with open("document.md") as f:
    markdown = f.read()

doc = Document.from_file("input.md")
executor = CodeBlockExecutor(doc)
result = executor.execute_all_blocks(markdown)
result.to_file("output.md")
`execute_file(markdown_path, block_index=0) -> Document`¶

Execute a specific code block from a Markdown file.

Note: This is an instance method - you must create a CodeBlockExecutor with a document first.

Parameters:

markdown_path: Path to Markdown file with code blocks
block_index: Index of code block to execute (0-based, default: 0)

Returns: Resulting document

Example:

from pathlib import Path
from doctk import Document
from doctk.dsl.codeblock import CodeBlockExecutor

# Load the document to transform
doc = Document.from_file("document.md")

# Create executor
executor = CodeBlockExecutor(doc)

# Execute specific block (index 0)
result = executor.execute_file(
    markdown_path=Path("instructions.md"),
    block_index=0
)

# Execute all blocks sequentially (using execute_all_blocks)
markdown_text = Path("instructions.md").read_text()
result = executor.execute_all_blocks(markdown_text)

Data Types¶

Token¶

Location: doctk.dsl.lexer.Token

A token produced by the lexer.

Fields:

type: TokenType - Token type
value: str - Token value
line: int - Line number (1-indexed)
column: int - Column number (1-indexed)

CodeBlock¶

Location: doctk.dsl.codeblock.CodeBlock

A doctk code block found in Markdown.

Fields:

code: str - DSL code content
line_number: int - Starting line in Markdown
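
For intuition about how a `code` and `line_number` pair could be recovered from Markdown text, here is a regex-based sketch. It is not the doctk implementation: `Block` and `find_blocks` are hypothetical stand-ins for `CodeBlock` and `find_code_blocks`, and treating `line_number` as the 1-indexed line of the opening fence is an assumption.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # built programmatically so no line of this example starts with a fence


@dataclass
class Block:  # hypothetical stand-in for CodeBlock
    code: str
    line_number: int  # assumed: 1-indexed line of the opening fence


# An opening fence tagged "doctk", the body, then a bare closing fence
BLOCK_RE = re.compile(
    rf"^{FENCE}doctk[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_blocks(markdown_text: str) -> list[Block]:
    blocks = []
    for match in BLOCK_RE.finditer(markdown_text):
        # Count the newlines before the match to get its line number
        line_number = markdown_text.count("\n", 0, match.start()) + 1
        blocks.append(Block(code=match.group(1).strip(), line_number=line_number))
    return blocks


markdown = "\n".join([
    "# Document",
    "",
    f"{FENCE}doctk",
    "doc | promote h2-0",
    FENCE,
    "",
    "More text.",
    "",
    f"{FENCE}doctk",
    "doc | nest h2-1 h1-0",
    FENCE,
])

for block in find_blocks(markdown):
    print(block.line_number, block.code)
# 3 doc | promote h2-0
# 9 doc | nest h2-1 h1-0
```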

Error Handling¶

LexerError¶

Raised when tokenization fails due to invalid characters or syntax.

Attributes:

message: str - Error description
line: int | None - Line number
column: int | None - Column number

Example:

from doctk.dsl.lexer import Lexer, LexerError

try:
    lexer = Lexer("doc | @invalid")
    tokens = lexer.tokenize()
except LexerError as e:
    print(f"Lexer error at line {e.line}, column {e.column}: {e.message}")

ParseError¶

Raised when parsing fails due to invalid syntax.

Attributes:

message: str - Error description
token: Token - Token where error occurred

Example:

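from doctk.dsl.lexer import Lexer
from doctk.dsl.parser import Parser, ParseError

# A pipeline with a missing operation, assumed here to be a syntax error
tokens = Lexer("doc | | promote h2-0").tokenize()

try:
    parser = Parser(tokens)
    ast = parser.parse()
except ParseError as e:
    print(f"Parse error at line {e.token.line}: {e.message}")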

ExecutionError¶

Raised when DSL execution fails.

Attributes:

message: str - Error description
line: int | None - Line number in script
column: int | None - Column number in script

Example:

from doctk.dsl.executor import Executor, ExecutionError

# doc and ast come from the earlier steps: Document.from_file(...) and Parser(tokens).parse()
try:
    executor = Executor(doc)
    result = executor.execute(ast)
except ExecutionError as e:
    print(f"Execution error at line {e.line}: {e.message}")

CLI Integration¶

The DSL can be executed via the doctk CLI:

Execute Script File¶

uv run doctk execute script.tk document.md -o output.md

Execute Code Block¶

uv run doctk execute-block instructions.md document.md --block 0

Start REPL¶

uv run doctk repl

Usage Examples¶

End-to-End Pipeline¶

from doctk import Document
from doctk.dsl.lexer import Lexer
from doctk.dsl.parser import Parser
from doctk.dsl.executor import Executor

# 1. Load document
doc = Document.from_file("example.md")

# 2. Define DSL script
script = """
doc | promote h2-0 | promote h2-1
doc | nest h2-2 h1-0
"""

# 3. Lex + Parse + Execute
lexer = Lexer(script)
tokens = lexer.tokenize()

parser = Parser(tokens)
ast = parser.parse()

executor = Executor(doc)
result = executor.execute(ast)

# 4. Save result
result.to_file("output.md")

REPL Usage¶

from doctk.dsl.repl import DoctkREPL

# Start interactive session
repl = DoctkREPL()
repl.run()

Script File Execution¶

from pathlib import Path
from doctk import Document
from doctk.dsl.executor import ScriptExecutor

# Create executor
doc = Document.from_file("input.md")
executor = ScriptExecutor(doc)

# Execute script file
result = executor.execute_file(
    script_path=Path("transform.tk"),
    document_path=Path("input.md")
)
result.to_file("output.md")

Code Block Execution¶

from pathlib import Path
from doctk import Document
from doctk.dsl.codeblock import CodeBlockExecutor

# Load document
doc = Document.from_file("document.md")

# Create executor and execute code blocks
executor = CodeBlockExecutor(doc)

# Execute specific block
result = executor.execute_file(Path("instructions.md"), block_index=0)
result.to_file("transformed.md")

# Or execute all blocks sequentially
markdown_text = Path("instructions.md").read_text()
result = executor.execute_all_blocks(markdown_text)
result.to_file("transformed.md")

Performance Characteristics¶

Lexer¶

Time Complexity: O(n) where n = source length
Space Complexity: O(t) where t = number of tokens

Parser¶

Time Complexity: O(t) where t = number of tokens
Space Complexity: O(a) where a = AST size

Executor¶

Time Complexity: O(a × d) where a = AST size, d = document size
Space Complexity: O(d) for document copies (immutability)
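
The O(d) space figure follows from immutability: each operation returns a new document instead of editing the old one in place. The sketch below shows that pattern with frozen dataclasses. `Doc` and `Heading` are hypothetical stand-ins, not the real `Document` type, and the clamping at level 1 is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Heading:
    node_id: str
    level: int
    text: str


@dataclass(frozen=True)
class Doc:  # hypothetical stand-in for Document
    headings: tuple[Heading, ...]


def promote(doc: Doc, node_id: str) -> Doc:
    """Return a new Doc with one heading promoted; the input is left untouched."""
    updated = tuple(
        replace(h, level=max(1, h.level - 1)) if h.node_id == node_id else h
        for h in doc.headings
    )
    return Doc(headings=updated)


original = Doc(
    headings=(
        Heading("h1-0", 1, "Introduction"),
        Heading("h2-0", 2, "Overview"),
    )
)
result = promote(original, "h2-0")

print(original.headings[1].level)  # 2
print(result.headings[1].level)    # 1
```

Each call walks every heading once and builds a new tuple, so one operation costs time and space proportional to the document size, which is consistent with the O(a × d) bound stated for the executor.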

Language Grammar¶

EBNF Grammar¶

program        = { statement } ;
statement      = assignment | pipeline ;
assignment     = "let" identifier "=" pipeline ;
pipeline       = source { "|" operation } ;
source         = "doc" | identifier ;
operation      = identifier { argument } ;
argument       = identifier | number | string ;
identifier     = letter { letter | digit | "_" | "-" } ;
number         = digit { digit } ;
string         = '"' { character } '"' ;
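
The `statement`, `assignment`, `pipeline`, and `operation` rules can be read as a small recursive-descent parser. The sketch below follows them over whitespace-separated words and returns plain tuples. It is not the doctk `Parser`: the function names and tuple shapes are hypothetical, it assumes spaces around `|` and `=`, it does not handle quoted strings containing spaces, and it raises `SyntaxError` where the real parser is documented to raise `ParseError`.

```python
def parse_pipeline(words: list[str]) -> tuple[str, list[tuple[str, list[str]]]]:
    """pipeline = source { "|" operation } ; operation = identifier { argument } ;"""
    if not words or words[0] == "|":
        raise SyntaxError("a pipeline must start with a source")
    source, rest = words[0], words[1:]
    operations = []
    while rest:
        if rest[0] != "|":
            raise SyntaxError(f"expected '|', found {rest[0]!r}")
        rest = rest[1:]
        if not rest or rest[0] == "|":
            raise SyntaxError("expected an operation after '|'")
        name, rest = rest[0], rest[1:]
        args = []
        # Arguments run until the next pipe or the end of the statement
        while rest and rest[0] != "|":
            args.append(rest[0])
            rest = rest[1:]
        operations.append((name, args))
    return source, operations


def parse_statement(line: str):
    """statement = assignment | pipeline ;"""
    words = line.split()
    if words[:1] == ["let"]:
        if len(words) < 4 or words[2] != "=":
            raise SyntaxError("expected: let <identifier> = <pipeline>")
        return ("let", words[1], parse_pipeline(words[3:]))
    return ("pipeline", parse_pipeline(words))


print(parse_statement("doc | promote h2-0 | nest h2-2 h1-0"))
print(parse_statement("let updated = doc | promote h2-0"))
```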

Precedence Rules¶

let bindings (lowest)
| pipeline operator
Function calls (highest)

Extension Points¶

Adding New Operations¶

To add a new DSL operation:

Add to StructureOperations in integration/operations.py
Add to Executor dispatch table in dsl/executor.py
Update this documentation

Example:

# In Executor.__init__
self._operation_dispatch = {
    # ... existing operations ...
    "my_operation": (self._exec_my_operation, 1),  # 1 argument
}

def _exec_my_operation(self, node_id: str) -> Document[Any]:
    """Execute my_operation."""
    result = self.operations.my_operation(self.document, node_id)
    if not result.success:
        raise ExecutionError(result.error or "Operation failed")
    return Document.from_string(result.document)
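
The dispatch-table pattern in the example above can be tried in isolation. The following self-contained sketch maps each operation name to a handler and an expected argument count, then checks both before calling the handler. `MiniExecutor`, its `run` method, and the `log` list are hypothetical; the real `Executor` works on documents and raises `ExecutionError`.

```python
from typing import Callable


class MiniExecutor:
    """Toy executor that records calls instead of transforming a document."""

    def __init__(self) -> None:
        self.log: list[str] = []
        # name -> (handler, number of arguments), mirroring the table above
        self._operation_dispatch: dict[str, tuple[Callable[..., None], int]] = {
            "promote": (self._exec_promote, 1),
            "nest": (self._exec_nest, 2),
        }

    def run(self, name: str, args: list[str]) -> None:
        if name not in self._operation_dispatch:
            raise ValueError(f"unknown operation: {name}")
        handler, arity = self._operation_dispatch[name]
        if len(args) != arity:
            raise ValueError(f"{name} expects {arity} argument(s), got {len(args)}")
        handler(*args)

    def _exec_promote(self, node_id: str) -> None:
        self.log.append(f"promote {node_id}")

    def _exec_nest(self, node_id: str, under_id: str) -> None:
        self.log.append(f"nest {node_id} under {under_id}")


executor = MiniExecutor()
executor.run("promote", ["h2-0"])
executor.run("nest", ["h2-2", "h1-0"])
print(executor.log)  # ['promote h2-0', 'nest h2-2 under h1-0']
```

Keeping the argument count next to the handler lets the executor reject a malformed call before any handler runs, which is one plausible reason for the tuple form used in the dispatch table.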

Testing¶

See tests/unit/test_dsl_*.py for comprehensive test examples:

Lexer tests: Token generation, error handling
Parser tests: AST generation, syntax errors
Executor tests: Operation execution, variable binding
REPL tests: Command handling, state management
Script execution tests: File execution, error reporting
Code block tests: Block detection, sequential execution

Run tests:

uv run pytest tests/unit/test_dsl*.py     # All DSL tests
uv run pytest tests/e2e/test_script*.py   # E2E script tests
