Architecture and Design
Modular Documentation System
Documentation is structured using parent files that include shared components. Each manual is a parent file that pulls in reusable sections, keeping content DRY while allowing manual-specific overrides. A fix to a shared section propagates to every manual automatically.
Config-Driven Content
A single set of JSON configuration files serves as the source of truth across UI tests, documentation, and integration tests.
Benefits of this approach:
- Single source of truth: one config change updates tests and docs simultaneously
- Consistency: UI tests, documentation, and integration tests all reference the same definitions
- Scalability: adding a new element means updating one config file
- Auditability: config diffs show exactly what changed and when
- Automation-friendly: structured data is easy to parse programmatically, reducing errors in automated workflows
- Reduced drift: documentation and tests read the same definitions, so the docs stay in sync with what the tests actually verify
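As a minimal sketch of the idea, the snippet below feeds one definition into both a documentation fragment and a value a UI test could assert against. The JSON shape (`elements`, `id`, `label`, `type`) and every function name are hypothetical; the real schema is not shown here.

```python
import json

# Hypothetical config content; the real schema is not shown in this document.
CONFIG_JSON = """
{
  "elements": [
    {"id": "orders_table", "label": "Orders", "type": "table"},
    {"id": "export_button", "label": "Export", "type": "action"}
  ]
}
"""


def load_elements(raw: str) -> dict[str, dict]:
    """Index element definitions by their id."""
    data = json.loads(raw)
    return {element["id"]: element for element in data["elements"]}


def doc_sentence(element: dict) -> str:
    """Render a documentation fragment from a definition."""
    return f"The {element['label']} {element['type']} is available on this page."


def expected_ui_label(element: dict) -> str:
    """Return the label a UI test would assert against."""
    return element["label"]


elements = load_elements(CONFIG_JSON)
```

Because both helpers read the same entry, renaming `Orders` in the config changes the rendered sentence and the test expectation together.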
Safeguards
A few custom configurations were needed to keep the LLM agent on track:
File lock system. A file-lock mechanism prevents the agent from modifying any files in the lock list. The agent can add files to the lock list but cannot remove them. A hook makes it harder for the agent to bypass the protection.
Commit guidelines. The agent follows established guidelines for committing, ensuring consistent commit messages and preventing accidental commits of work-in-progress files.
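The file-lock check described above could look roughly like the following. This is a sketch under assumptions, not the actual hook: the function names are hypothetical, and the lock list is modeled as a plain set of repo-relative paths.

```python
from pathlib import PurePosixPath


def normalize(path: str) -> str:
    """Normalize a repo-relative path so './a/b' and 'a/b' compare equal."""
    return str(PurePosixPath(path))


def locked_violations(changed: list[str], lock_list: set[str]) -> list[str]:
    """Return changed paths that appear in the lock list, sorted."""
    locked = {normalize(p) for p in lock_list}
    return sorted(p for p in map(normalize, changed) if p in locked)


def add_lock(lock_list: set[str], path: str) -> set[str]:
    """Return a new lock list with `path` added; there is deliberately no remove."""
    return lock_list | {normalize(path)}
```

A hook could call `locked_violations` with the list of staged paths and reject the change whenever the result is non-empty. Exposing only `add_lock` mirrors the add-but-never-remove rule.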
Semantic Roles
The custom Sphinx theme includes 47+ semantic roles — inline markers for UI elements, navigation targets, and status indicators. Every semantic role renders with consistent styling, generates accessible markup, and becomes searchable and validatable. A script can verify that every reference points to a real element in the config. Plain text can’t do that.
The roles fall into three broad categories: UI elements (tables, columns, tabs, actions), navigation (pages, sections, categories), and status indicators (color-coded state markers).
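A validation script of the kind mentioned above might be sketched as follows. It relies on the MyST inline role syntax, which is `{role-name}` followed by a backtick-quoted target. The role names (`ui-table`, `nav-page`) and the config ids are hypothetical, chosen only for illustration.

```python
import re

# MyST inline role syntax: {role-name}`target`
ROLE_PATTERN = re.compile(r"\{([A-Za-z][\w:-]*)\}`([^`]+)`")

# Hypothetical role names and config ids, for illustration only.
KNOWN_TARGETS = {
    "ui-table": {"orders_table"},
    "nav-page": {"settings"},
}


def find_broken_references(text: str, known: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (role, target) pairs whose target is missing from the config."""
    broken = []
    for role, target in ROLE_PATTERN.findall(text):
        # Roles that are not in `known` (for example built-in ones) are skipped.
        if role in known and target not in known[role]:
            broken.append((role, target))
    return broken


SAMPLE = "Open {nav-page}`settings`, then check {ui-table}`invoices_table`."
```

Running `find_broken_references(SAMPLE, KNOWN_TARGETS)` flags `invoices_table`, because no such id exists in the sample config.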
Custom Directives
Every element definition lives in a JSON config file. The documentation references these configs through custom directives that render structured output (styled HTML tables, lists, and semantic markup) directly from the config data.
When something changes in the application, one JSON entry is updated. Every page that references it updates automatically on the next build — no grep-and-replace across dozens of files.
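The rendering step inside such a directive can be illustrated with a plain function. This is only a sketch: a real directive would wrap logic like this in a Sphinx/docutils directive class, and the entry fields used here (`name`, `type`) are assumed rather than taken from the actual configs.

```python
from html import escape


def render_table(rows: list[dict], columns: list[str]) -> str:
    """Render config entries as an HTML table, escaping all cell text."""
    head = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
```

Since the table is rebuilt from the config on every build, editing one JSON entry is enough to change every page that renders it.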
PDF Layout Algorithm
HTML documentation is forgiving — content flows, pages scroll, nothing breaks. PDF is different. A heading at the bottom of a page with its content on the next page looks unprofessional. An image split across pages is worse.
A custom scoring algorithm evaluates each section based on content weight (text length, images, admonitions) and inserts page breaks when a section’s score exceeds a budget threshold. The algorithm includes a soft margin to avoid break-thrashing — where adding a break pushes content to a new page, which changes the score, which removes the break. Combined with widow/orphan penalties and automatic spacing directives before headings, the result is PDF output that looks hand-tuned without anyone hand-tuning it.
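The budget-and-soft-margin idea can be sketched as below. This is one possible reading of the description, not the actual algorithm: the weights, the running-total model, and all names are assumptions, and widow/orphan penalties are left out.

```python
from dataclasses import dataclass

# All weights and thresholds below are illustrative assumptions.
CHARS_PER_UNIT = 100
IMAGE_WEIGHT = 12.0
ADMONITION_WEIGHT = 4.0


@dataclass
class Section:
    title: str
    text_chars: int
    images: int = 0
    admonitions: int = 0


def score(section: Section) -> float:
    """Estimate how much page space a section consumes."""
    return (
        section.text_chars / CHARS_PER_UNIT
        + section.images * IMAGE_WEIGHT
        + section.admonitions * ADMONITION_WEIGHT
    )


def plan_breaks(sections: list[Section], budget: float, soft_margin: float) -> list[str]:
    """Return titles of sections that should start on a new page."""
    breaks = []
    used = 0.0
    for section in sections:
        cost = score(section)
        # Break only when the overflow is larger than the soft margin, so a
        # section that barely overflows does not flip between break and no break.
        if used > 0 and used + cost > budget + soft_margin:
            breaks.append(section.title)
            used = 0.0
        used += cost
    return breaks
```

With a soft margin of zero, a section that overflows the budget by a small amount forces a break; a non-zero margin tolerates that small overflow, which is the hysteresis that keeps breaks stable between builds.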
Image Optimization
A documentation site with thousands of screenshots can balloon in size. The optimization pipeline converts, resizes, caches (via content hashing), and tracks images across builds. The result: consistent image quality at a fraction of the storage cost, with builds that skip unchanged assets.
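The content-hash cache can be sketched as follows. The manifest format (a mapping from path to hash) and the function names are assumptions. SHA-256 is used here for illustration only, since the pipeline's actual hash is not stated.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash image bytes so the cache key depends on content, not timestamps."""
    return hashlib.sha256(data).hexdigest()


def images_to_process(images: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the cached manifest."""
    return sorted(
        path for path, data in images.items()
        if manifest.get(path) != content_hash(data)
    )


def update_manifest(images: dict[str, bytes], manifest: dict[str, str]) -> dict[str, str]:
    """Return a new manifest recording the current hash for every image."""
    return {**manifest, **{path: content_hash(data) for path, data in images.items()}}
```

A build would run the conversion and resize steps only for the paths returned by `images_to_process`, then persist the updated manifest for the next run.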
Deployment
All manuals are built in parallel using matrix-based CI workflows and deployed to managed static hosting. Each manual is defined as a matrix entry, so onboarding a new manual usually requires configuration updates rather than pipeline redesign.
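One way to keep onboarding a configuration-only change is to generate the matrix from a list of manual definitions. The sketch below assumes a hypothetical definition shape (`name`, `source_dir`) and a JSON payload with an `include` list. The CI system's real matrix format is not specified here, so treat the output shape as illustrative.

```python
import json


def build_matrix(manuals: list[dict]) -> str:
    """Serialize manual definitions into a CI matrix payload (JSON)."""
    include = [
        {"manual": manual["name"], "source_dir": manual["source_dir"]}
        for manual in manuals
    ]
    return json.dumps({"include": include}, sort_keys=True)
```

Adding a manual then means appending one definition to the list; the build job itself is unchanged.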
Platform Components
| Layer | Implementation |
|---|---|
| Documentation engine | Sphinx + MyST (Markdown) |
| Theme | Custom theme layer with shared components |
| Browser automation | End-to-end test automation framework |
| PDF generation | Structured print build pipeline |
| CI/CD | Matrix-based workflow automation |
| Hosting | Managed static hosting platform |
| AI assistance | LLM-assisted pipeline orchestration and drafting |
| Image processing | Automated optimization scripts |
Key Takeaways
- The first manual is the hardest
It required establishing all patterns, templates, configurations, and the pipeline infrastructure. Subsequent manuals took a fraction of the time.
- Safeguards are necessary
File locks and hooks prevent unwanted modifications. Without them, the agent occasionally modifies files it shouldn’t or commits work-in-progress changes.
- Anti-patterns compound
Each correction improves future runs. The agent maintains an anti-pattern list that grows over time, so the same mistake is rarely repeated.