Architecture and Design

Modular Documentation System

Each manual is a parent file that pulls in shared, reusable sections. This keeps content DRY while still allowing manual-specific overrides, and a fix to a shared section propagates to every manual automatically.

Config-Driven Content

A single set of JSON configuration files serves as the source of truth across UI tests, documentation, and integration tests.

Benefits of this approach:

  1. Single source of truth — one config change updates tests and docs simultaneously

  2. Consistency — UI tests, documentation, and integration tests all reference the same definitions

  3. Scalability — adding a new element means updating one config file

  4. Auditability — config diffs show exactly what changed and when

  5. Automation-friendly — structured data is easy to parse programmatically, reducing errors in automated workflows

  6. Reduced drift — documentation can never fall out of sync with what the tests actually verify
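The idea behind these benefits can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the config schema (`label`, `kind`, `columns`), the `orders_table` key, and both helper functions are assumptions made for the example. It shows a UI test expectation and a documentation sentence being derived from the same JSON entry.

```python
import json

# Hypothetical config entry; the real schema is not shown in the source.
CONFIG_JSON = """
{
  "orders_table": {
    "label": "Orders",
    "kind": "table",
    "columns": ["Order ID", "Customer", "Status"]
  }
}
"""


def load_elements(raw: str) -> dict:
    """Parse the shared element definitions."""
    return json.loads(raw)


def expected_columns(elements: dict, key: str) -> list[str]:
    """What a UI test would compare against the rendered table header."""
    return list(elements[key]["columns"])


def doc_sentence(elements: dict, key: str) -> str:
    """What the documentation build would render for the same element."""
    element = elements[key]
    columns = ", ".join(element["columns"])
    return f"The {element['label']} {element['kind']} shows these columns: {columns}."


elements = load_elements(CONFIG_JSON)
print(expected_columns(elements, "orders_table"))
print(doc_sentence(elements, "orders_table"))
```

Renaming a column in the JSON changes both outputs at once, which is the property the list above describes.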

Safeguards

A few custom configurations were needed to keep the LLM agent on track:

File lock system. A file-lock mechanism prevents the agent from modifying any files in the lock list. The agent can add files to the lock list but cannot remove them. A hook makes it harder for the agent to bypass the protection.
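A check of this kind could look roughly like the sketch below. It assumes the lock list is a set of repo-relative paths and that a hook can supply the list of changed files plus the lock list before and after the change. The function names and error messages are hypothetical, and the source does not describe how the real hook is wired up.

```python
import posixpath


def normalize(path: str) -> str:
    """Collapse './' and '../' segments so equivalent paths compare equal."""
    return posixpath.normpath(path)


def locked_files_touched(changed: list[str], lock_list: set[str]) -> list[str]:
    """Return every changed file that appears in the lock list."""
    locked = {normalize(p) for p in lock_list}
    return sorted(p for p in {normalize(c) for c in changed} if p in locked)


def removed_locks(old_locks: set[str], new_locks: set[str]) -> list[str]:
    """The lock list is append-only: entries present before must still be present."""
    before = {normalize(p) for p in old_locks}
    after = {normalize(p) for p in new_locks}
    return sorted(before - after)


def check(changed: list[str], old_locks: set[str], new_locks: set[str]) -> list[str]:
    """Collect all violations; an empty list means the change is allowed."""
    errors = [f"locked file modified: {p}" for p in locked_files_touched(changed, old_locks)]
    errors += [f"lock removed: {p}" for p in removed_locks(old_locks, new_locks)]
    return errors


# Adding a lock is fine; touching an already locked file is not.
print(check(["docs/./intro.md", "src/app.py"], {"docs/intro.md"}, {"docs/intro.md", "conf.py"}))
```

Checking modifications against the old lock list, and the new lock list only for removals, matches the rule that the agent may add entries but never remove them.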

Commit guidelines. The agent follows established guidelines for committing, ensuring consistent commit messages and preventing accidental commits of work-in-progress files.

Semantic Roles

The custom Sphinx theme includes 47+ semantic roles: inline markers for UI elements, navigation targets, and status indicators. Text marked with a semantic role renders with consistent styling, generates accessible markup, and can be searched and validated. For example, a script can verify that every reference points to a real element in the config. Plain text can’t do that.

The roles fall into three broad categories: UI elements (tables, columns, tabs, actions), navigation (pages, sections, categories), and status indicators (color-coded state markers).
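A validation script of the kind mentioned above might look like this sketch. It assumes references use MyST inline role syntax (a role name in braces followed by a backtick-quoted target) and that the known elements have already been loaded from the config into a dictionary. The role names `ui-table` and `ui-tab` are invented for the example.

```python
import re

# Matches MyST inline roles such as {ui-table}`Orders`.
ROLE_REF = re.compile(r"\{([a-z][a-z0-9_-]*)\}`([^`]+)`")


def find_unknown_references(
    text: str, known: dict[str, set[str]]
) -> list[tuple[int, str, str]]:
    """Return (line number, role, target) for each reference missing from the config.

    Roles that are not keys of `known` (for example built-in Sphinx roles)
    are ignored rather than reported.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for role, target in ROLE_REF.findall(line):
            if role in known and target not in known[role]:
                problems.append((line_no, role, target))
    return problems


# Hypothetical element names, as they might be loaded from the JSON config.
known_elements = {"ui-table": {"Orders"}, "ui-tab": {"Settings"}}
page = (
    "Open the {ui-tab}`Settings` tab.\n"
    "The {ui-table}`Ordres` table lists all orders.\n"
    "See {ref}`intro`."
)
print(find_unknown_references(page, known_elements))
```

Run over every source file in CI, a check like this turns a misspelled element name into a build failure instead of a silent documentation error.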

Custom Directives

Every element definition lives in a JSON config file. The documentation references these configs through custom directives that render structured output (styled HTML tables, lists, and semantic markup) directly from the config data.

When something changes in the application, one JSON entry is updated. Every page that references it updates automatically on the next build — no grep-and-replace across dozens of files.
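The data flow from config to rendered output can be illustrated with a plain function. This is a simplified sketch under stated assumptions: the config schema (`label`, `columns`, `name`, `description`) and the `element-table` CSS class are hypothetical, and a real Sphinx directive would return docutils nodes from its `run()` method rather than an HTML string.

```python
import json
from html import escape


def render_columns_table(config_text: str, key: str) -> str:
    """Render one config entry as an HTML table, escaping all config values."""
    element = json.loads(config_text)[key]
    rows = "".join(
        f"<tr><td>{escape(col['name'])}</td><td>{escape(col['description'])}</td></tr>"
        for col in element["columns"]
    )
    return (
        f'<table class="element-table"><caption>{escape(element["label"])}</caption>'
        "<thead><tr><th>Column</th><th>Description</th></tr></thead>"
        f"<tbody>{rows}</tbody></table>"
    )


# Hypothetical config content for a single element.
config = json.dumps(
    {
        "orders": {
            "label": "Orders",
            "columns": [{"name": "Status", "description": "Open & closed orders"}],
        }
    }
)
print(render_columns_table(config, "orders"))
```

Because pages hold only the key (`orders` here) and not the column list, editing the JSON entry is enough to change every page that renders it.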

PDF Layout Algorithm

HTML documentation is forgiving — content flows, pages scroll, nothing breaks. PDF is different. A heading at the bottom of a page with its content on the next page looks unprofessional. An image split across pages is worse.

A custom scoring algorithm evaluates each section based on content weight (text length, images, admonitions) and inserts page breaks when a section’s score exceeds a budget threshold. The algorithm includes a soft margin to avoid break-thrashing — where adding a break pushes content to a new page, which changes the score, which removes the break. Combined with widow/orphan penalties and automatic spacing directives before headings, the result is PDF output that looks hand-tuned without anyone hand-tuning it.
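The soft margin can be read as a hysteresis band around the budget. The sketch below shows one way to express that. It is not the project's algorithm: the weights, the budget, the margin, and the `Section` fields are all assumed values chosen for illustration, and widow/orphan penalties are left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    chars: int
    images: int = 0
    admonitions: int = 0


# Hypothetical weights and thresholds; the real values are not in the source.
CHARS_PER_POINT = 100
IMAGE_POINTS = 15.0
ADMONITION_POINTS = 5.0
BUDGET = 40.0
SOFT_MARGIN = 5.0


def score(section: Section) -> float:
    """Content weight from text length, images, and admonitions."""
    return (
        section.chars / CHARS_PER_POINT
        + section.images * IMAGE_POINTS
        + section.admonitions * ADMONITION_POINTS
    )


def wants_break(section: Section, had_break: bool) -> bool:
    """Decide on a page break with a soft margin around the budget.

    Clearly over budget: add a break. Clearly under: remove it.
    Inside the margin: keep the previous decision, so small score
    changes caused by the break itself cannot flip it back and forth.
    """
    s = score(section)
    if s > BUDGET + SOFT_MARGIN:
        return True
    if s < BUDGET - SOFT_MARGIN:
        return False
    return had_break


heavy = Section("Dashboard", chars=1000, images=3)
borderline = Section("Filters", chars=1200, images=2)
print(score(heavy), wants_break(heavy, had_break=False))
print(score(borderline), wants_break(borderline, had_break=True))
```

Without the band, a section scoring just above the budget could gain a break, drop below the budget once the layout shifts, lose the break again, and repeat on every pass.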

Image Optimization

A documentation site with thousands of screenshots can balloon in size. The optimization pipeline converts, resizes, caches (via content hashing), and tracks images across builds. The result: consistent image quality at a fraction of the storage cost, with builds that skip unchanged assets.
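Skipping unchanged assets through content hashing can be sketched as follows. The manifest file, its JSON layout, and the function names are assumptions for illustration, and the conversion and resizing steps themselves are omitted.

```python
import hashlib
import json
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes; identical content gives an identical hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest_path: Path) -> dict[str, str]:
    """Read the path-to-hash mapping from the previous build, if any."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text())
    return {}


def images_to_process(images: list[Path], manifest_path: Path) -> list[Path]:
    """Return only the images whose content changed since they were last recorded."""
    manifest = load_manifest(manifest_path)
    return [img for img in images if manifest.get(str(img)) != content_hash(img)]


def record_processed(images: list[Path], manifest_path: Path) -> None:
    """Store the current hash of each processed image for the next build."""
    manifest = load_manifest(manifest_path)
    for img in images:
        manifest[str(img)] = content_hash(img)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A build would call `images_to_process`, optimize only the returned files, and then call `record_processed` on them. Hashing content rather than comparing timestamps means a fresh checkout or a touched but unmodified file does not trigger reprocessing.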

Deployment

All manuals are built in parallel using matrix-based CI workflows and deployed to managed static hosting. Each manual is defined as a matrix entry, so onboarding a new manual usually requires configuration updates rather than pipeline redesign.

Platform Components

| Layer | Implementation |
|---|---|
| Documentation engine | Sphinx + MyST (Markdown) |
| Theme | Custom theme layer with shared components |
| Browser automation | End-to-end test automation framework |
| PDF generation | Structured print build pipeline |
| CI/CD | Matrix-based workflow automation |
| Hosting | Managed static hosting platform |
| AI assistance | LLM-assisted pipeline orchestration and drafting |
| Image processing | Automated optimization scripts |

Key Takeaways

First manual is the hardest

The first manual required establishing all patterns, templates, configurations, and the pipeline infrastructure. Subsequent manuals took a fraction of the time.

Safeguards are necessary

File locks and hooks prevent unwanted modifications. Without them, the agent occasionally modifies files it shouldn’t or commits work-in-progress changes.

Anti-patterns compound

Each correction improves future runs. The agent maintains an anti-pattern list that grows over time, so the same mistake is rarely repeated.