Codegen-to-Documentation Pipeline

I developed a pipeline that converts raw UI codegen (captured from screen interactions) into working end-to-end test files, then transforms those test files into full user-facing documentation (HTML and PDF) with minimal manual intervention. What follows is a generalized description of the methodology, architecture, and results.

Problem

The core question: how can one person keep up with the testing and documentation needs for an organization’s software, while producing both HTML and PDF output simultaneously, reusing as much as possible, and keeping the whole system reproducible?

Approach: JSON Configs as a Compression Layer

Re-reading full documentation every time a test file is created is wasteful. The solution was JSON config files: a compact list of every function in the repository, each with a one- or two-line description of what it does. This lets the agent convert codegen into working test files reliably, without re-reading extensive documentation each time.

{
  "functions": {
    "clickSubmitButton": "Clicks the primary submit button on the current form",
    "waitForConfirmationModal": "Waits for the confirmation dialog to appear",
    "verifySuccessMessage": "Asserts the success toast notification is visible",
    "navigateToSection": "Navigates to a named section via the sidebar menu",
    "fillFormField": "Enters a value into a form field by label text"
  }
}

The same config-driven approach extends to documentation content (column definitions, status types, form fields). One config update propagates to tests and docs simultaneously, keeping everything in sync and easy for both humans and automated tools to work with.
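To make the idea concrete, here is a minimal sketch of how such a config could be consumed. It is not the pipeline's actual code: the `FunctionConfig` type and the names `toPromptContext` and `findUnknownHelpers` are assumptions introduced for illustration. The first function renders the config as a compact reference for the agent; the second flags helper calls in a generated test that the config does not define.

```typescript
// Assumed shape of the function config shown above.
type FunctionConfig = { functions: { [name: string]: string } };

// Render the config as one compact line per helper, suitable for an agent prompt.
function toPromptContext(cfg: FunctionConfig): string {
  return Object.keys(cfg.functions)
    .map((name) => `${name}(): ${cfg.functions[name]}`)
    .join("\n");
}

// List awaited helper calls in generated test source that the config does not define.
function findUnknownHelpers(testSource: string, cfg: FunctionConfig): string[] {
  const unknown: string[] = [];
  const callPattern = /await\s+([A-Za-z_$][\w$]*)\s*\(/g;
  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(testSource)) !== null) {
    const name = match[1];
    const known = Object.prototype.hasOwnProperty.call(cfg.functions, name);
    if (!known && unknown.indexOf(name) === -1) unknown.push(name);
  }
  return unknown;
}
```

A check like this only sees bare `await helper()` calls. Framework calls such as `await expect(...)` would be reported too, so a real version would likely need an allowlist.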

Pipeline Overview

Each phase saves a checkpoint, so if something fails partway through, earlier phases don’t need to re-run. This makes the pipeline resumable and keeps iteration fast.

┌───────────────────────────┐
│  Capture & Normalize      │
│  Raw codegen → cleaned    │
│  input via rule sets      │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│  Test Generation          │
│  Agent + JSON config →    │
│  structured test files    │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│  Execution & Auto-Fix     │
│  Run tests, fix failures, │
│  capture screenshots      │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│  Documentation Generation │
│  Scaffold from tests →    │
│  multi-pass content fill  │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│  Quality Assurance        │
│  Completeness, style,     │
│  human review, PDF layout │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│  Final Output             │
│  HTML + PDF               │
└───────────────────────────┘
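The checkpointing behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: phases are modelled as string-to-string functions, and an in-memory `CheckpointStore` object stands in for whatever files the pipeline actually writes.

```typescript
// A phase takes the previous phase's output and returns its own.
type Phase = { name: string; run: (input: string) => string };

// In-memory stand-in for checkpoint files on disk (an assumption).
type CheckpointStore = { [phaseName: string]: string };

function runPipeline(phases: Phase[], input: string, store: CheckpointStore): string {
  let current = input;
  for (const phase of phases) {
    if (Object.prototype.hasOwnProperty.call(store, phase.name)) {
      current = store[phase.name]; // resume: reuse the saved output
      continue;
    }
    current = phase.run(current); // may throw; nothing is saved for this phase
    store[phase.name] = current;  // checkpoint only after the phase succeeds
  }
  return current;
}
```

If a phase throws, re-running `runPipeline` with the same store skips every phase that already completed and picks up at the one that failed.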

Pipeline Phases

Capture & Normalize

A programmatic cleanup removes extraneous values (dynamic IDs, debug calls, unstable selectors, duplicates) using rule sets before the agent processes anything. The normalizer also detects repeated interaction patterns and collapses them into helper function calls, e.g. clicking through 15 column headers becomes checkColumnHeaders().
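A rule-based normalizer of this kind might look like the sketch below. The specific rules are hypothetical examples (the real rule sets are not shown in this write-up): two regexes that drop debug calls and recorder pauses, and one pattern that recognizes a column-header click so that a run of them collapses into a single `checkColumnHeaders()` call.

```typescript
// Hypothetical rule set: a line matching any rule is treated as noise and dropped.
const dropRules: RegExp[] = [
  /console\.(log|debug)\(/, // debug calls
  /page\.pause\(\)/,        // recorder pauses
];

// Hypothetical pattern for one "click a column header" interaction.
const columnHeaderClick = /getByRole\('columnheader'/;

function normalize(lines: string[]): string[] {
  const out: string[] = [];
  let inHeaderRun = false;
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "" || dropRules.some((rule) => rule.test(line))) continue;
    if (columnHeaderClick.test(line)) {
      // Collapse a run of header clicks into a single helper call.
      if (!inHeaderRun) out.push("await checkColumnHeaders();");
      inHeaderRun = true;
      continue;
    }
    inHeaderRun = false;
    if (out[out.length - 1] !== line) out.push(line); // drop consecutive duplicates
  }
  return out;
}
```

Keeping this step purely programmatic means the agent only ever sees input that has already been reduced to stable, meaningful interactions.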

Test Generation & Refinement

The agent uses the JSON config to convert cleaned codegen into structured test files, then audits them for readability via style guide enforcement:

// BEFORE: Hard to read, unclear intent
await page.locator('[data-testid="btn-submit"]').click();
await page.waitForSelector('.modal');
await page.locator('.modal >> text=Confirm').click();
await expect(page.locator('.success')).toBeVisible();

// AFTER: Reads like instructions, clear intent
await clickSubmitButton();
await waitForConfirmationModal();
await confirmAction();
await verifySuccessMessage();

Once the first manual’s tests are finished, the agent converts them into reusable templates. Tests for the remaining manuals are generated from these templates and need only the section names as input.
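As a rough illustration of template reuse, the sketch below stamps out one test per section name. The `{{section}}` placeholder syntax, the template text, and `fromTemplate` are all assumptions made for the example; the actual templates are not shown here.

```typescript
// Hypothetical template derived from a finished test; {{section}} is an assumed placeholder.
const testTemplate = [
  "test('{{section}}: shows expected column headers', async () => {",
  "  await navigateToSection('{{section}}');",
  "  await checkColumnHeaders();",
  "});",
].join("\n");

// Produce one test source string per section name.
function fromTemplate(template: string, sections: string[]): string[] {
  return sections.map((section) => template.split("{{section}}").join(section));
}
```

Section names containing a single quote would need escaping before substitution, which this sketch does not handle.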

Execution & Auto-Fix

Tests run against the live application and enter an auto-fix loop on failure: adjust waits for timeouts, try alternative selectors, retry with backoff. The loop runs up to five iterations before escalating to a human. (Claude can now access the running web application directly, identify what went wrong, and attempt to correct the test case itself.)
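The control flow of such a loop can be sketched as below. This is a simplified model under stated assumptions: the test runner and the wait function are injected and synchronous for brevity (a real runner would be asynchronous), the fix strategies are reduced to "double the timeout" and "try the next fallback selector", and the 500 ms base delay is an arbitrary choice. Only the five-iteration cap comes from the description above.

```typescript
type RunResult = { passed: boolean; error?: string };
type TestCase = { selector: string; timeoutMs: number };
type Outcome = { status: "passed" | "auto-fixed" | "escalated"; attempts: number };

const MAX_FIX_ITERATIONS = 5;

// Hypothetical fix strategies, chosen from the failure message.
function applyFix(test: TestCase, error: string, fallbacks: string[]): TestCase {
  if (/timeout/i.test(error)) return { ...test, timeoutMs: test.timeoutMs * 2 };
  const next = fallbacks.shift(); // consume the next alternative selector, if any
  return next ? { ...test, selector: next } : test;
}

function runWithAutoFix(
  initial: TestCase,
  run: (t: TestCase) => RunResult,
  wait: (ms: number) => void,
  fallbackSelectors: string[],
): Outcome {
  let test = initial;
  let result = run(test);
  if (result.passed) return { status: "passed", attempts: 1 };
  const fallbacks = fallbackSelectors.slice(); // do not mutate the caller's list
  for (let i = 1; i <= MAX_FIX_ITERATIONS; i++) {
    test = applyFix(test, result.error || "", fallbacks);
    wait(500 * Math.pow(2, i - 1)); // exponential backoff before retrying
    result = run(test);
    if (result.passed) return { status: "auto-fixed", attempts: i + 1 };
  }
  return { status: "escalated", attempts: MAX_FIX_ITERATIONS + 1 };
}
```

The three outcome labels map directly onto the three result categories reported for the run: passed on first run, auto-fixed, and escalated.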

Real-world results across 376 test files for a multi-role application with four user manuals:

| Metric | Value |
| --- | --- |
| Passed on first run | 288 (76.6%) |
| Auto-fixed by agent | 70 (18.6%) |
| Escalated to human | 18 (4.8%) |
| Total pipeline time | ~15 hours 43 minutes |

In total, 358 of 376 files (95.2%) were resolved without human intervention. The 18 escalated files involved application-specific state the agent couldn’t replicate without additional context.
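For readers who want to check the arithmetic, the percentages follow directly from the three counts. The snippet below is only a verification aid; the variable names are illustrative.

```typescript
// Counts taken from the results reported above.
const results = { passedFirstRun: 288, autoFixed: 70, escalated: 18 };
const totalFiles = results.passedFirstRun + results.autoFixed + results.escalated; // 376

// Format a count as a percentage of all test files, to one decimal place.
function percent(count: number): string {
  return ((count / totalFiles) * 100).toFixed(1) + "%";
}

// Files that needed no human: first-run passes plus agent auto-fixes.
const resolvedWithoutHuman = results.passedFirstRun + results.autoFixed; // 358

console.log(percent(results.passedFirstRun)); // 76.6%
console.log(percent(results.autoFixed));      // 18.6%
console.log(percent(results.escalated));      // 4.8%
console.log(percent(resolvedWithoutHuman));   // 95.2%
```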

Documentation Generation

A scaffolding script creates skeleton .md files from test files and captured screenshots, with placeholders for config-driven content. Screenshots are inserted only at meaningful moments (navigation, form submission, state changes), not every click. (This step has since been augmented with plugins that allow the agent to view the running application directly and capture its own screenshots.)
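A scaffolding step along these lines might be sketched as follows. The `Step` type, the set of "meaningful" step kinds, and the `{{overview}}` placeholder are assumptions for illustration; the point is that a screenshot is emitted only when the step kind is navigation, submission, or a state change, even if a capture exists for other steps.

```typescript
type StepKind = "navigation" | "click" | "fill" | "submit" | "state-change";
type Step = { kind: StepKind; description: string; screenshot?: string };

// Step kinds considered meaningful enough to illustrate (assumed mapping).
const SCREENSHOT_KINDS: StepKind[] = ["navigation", "submit", "state-change"];

// Build a skeleton Markdown document from the steps of one test file.
function scaffold(title: string, steps: Step[]): string {
  const lines: string[] = [`# ${title}`, "", "{{overview}}", ""];
  steps.forEach((step, index) => {
    lines.push(`${index + 1}. ${step.description}`);
    if (step.screenshot && SCREENSHOT_KINDS.indexOf(step.kind) !== -1) {
      lines.push("", `   ![${step.description}](${step.screenshot})`, "");
    }
  });
  return lines.join("\n");
}
```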

The agent populates scaffolds using a multi-pass approach, each pass pulling in different configuration data (column definitions, status types, form fields), layering detail until complete.
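The multi-pass idea can be illustrated with a small placeholder-filling sketch. In the real pipeline the agent writes the content; here each pass is reduced to a plain lookup table so the layering is visible. The `{{name}}` placeholder syntax and the `Pass` type are assumptions.

```typescript
type Pass = { name: string; values: { [placeholder: string]: string } };

// Apply one pass: fill the placeholders it knows, leave the rest untouched.
function fillPass(doc: string, pass: Pass): string {
  return doc.replace(/\{\{(\w+)\}\}/g, (whole: string, key: string) =>
    Object.prototype.hasOwnProperty.call(pass.values, key) ? pass.values[key] : whole);
}

// List placeholders still unfilled, so completeness can be checked after the last pass.
function unresolvedPlaceholders(doc: string): string[] {
  const names: string[] = [];
  const pattern = /\{\{(\w+)\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(doc)) !== null) {
    if (names.indexOf(match[1]) === -1) names.push(match[1]);
  }
  return names;
}

// Run every pass in order, layering detail onto the scaffold.
function fillAll(doc: string, passes: Pass[]): string {
  let current = doc;
  for (const pass of passes) current = fillPass(current, pass);
  return current;
}
```

A document counts as complete only when `unresolvedPlaceholders` returns an empty list after the final pass.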

Quality Assurance

Automated verification: The agent checks generated docs against codegen summaries and test files for completeness and accuracy, then audits against the style guide and an accumulated anti-pattern list.

Human review: I flag anything incorrect. The agent corrects it and adds the pattern to its anti-pattern list. Corrections typically amount to one or two lines in a few docs out of hundreds.
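One way to picture the accumulated anti-pattern list is as a growing set of checks that every later document is audited against. The sketch below assumes anti-patterns are stored as literal phrases with a reason attached; the actual list format is not described here, and `audit` and `recordCorrection` are hypothetical names.

```typescript
type AntiPattern = { pattern: RegExp; reason: string };
type Finding = { line: number; text: string; reason: string };

// Report every line of a document that matches a known anti-pattern.
function audit(doc: string, antiPatterns: AntiPattern[]): Finding[] {
  const findings: Finding[] = [];
  doc.split("\n").forEach((text, index) => {
    for (const ap of antiPatterns) {
      if (ap.pattern.test(text)) findings.push({ line: index + 1, text, reason: ap.reason });
    }
  });
  return findings;
}

// After a human correction, add the flagged phrase so later audits catch it.
function recordCorrection(antiPatterns: AntiPattern[], phrase: string, reason: string): AntiPattern[] {
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // match the phrase literally
  return antiPatterns.concat([{ pattern: new RegExp(escaped, "i"), reason }]);
}
```

Because the list only grows, each human correction reduces the chance of the same mistake reaching review again.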

PDF quality scoring: A custom scoring algorithm evaluates the PDF layout so that the output is correctly formatted on the first pass.

Quality Gates

| Gate | Focus |
| --- | --- |
| First Passthrough | Does the generated doc match what the codegen describes? |
| Multi-Pass Audit | Side-by-side verification, style conformance, known-issue detection |
| Human Review | Final accuracy check, layout and formatting quality |

Next Steps

The next steps are regression testing and statistical modeling to determine the true percentage increase in productivity attributable to the pipeline.