Draft with AI.
Owned by people.
The pipeline narrates itself through a Teams thread. The scientists are always in the loop. Open QC to see the trust architecture.
At ~150 reports a year and ~4 hours each to draft, this tool saves >500 hours/year of scientist time, allowing the team to focus on the science. Keeping the human in the loop drives ownership and maintains ISO compliance.
Scientist kicks off interaction · bot fetches metadata · manual fallback
After wrapping up testing, the scientist starts the interaction, giving the LLM relevant project data. Graceful degradation - if the lab API is down, the bot pivots to manual Q&A. The IntakePackage shape stays the same; the rest of the pipeline never knows the difference.
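A minimal sketch of that fallback, assuming the intake step receives two callables: one that queries the lab API and one that runs the manual Q&A. The IntakePackage fields and both helper names are hypothetical; the source does not show the real shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class IntakePackage:
    # Hypothetical fields; the real IntakePackage shape is not shown in the source.
    project_id: str
    metadata: Dict[str, str]


def build_intake(
    project_id: str,
    fetch_from_api: Callable[[str], Dict[str, str]],
    ask_scientist: Callable[[str], Dict[str, str]],
) -> IntakePackage:
    """Try the lab API first; fall back to manual Q&A if the API is unreachable."""
    try:
        metadata = fetch_from_api(project_id)
    except (ConnectionError, TimeoutError):
        # Same return type either way, so later stages see no difference.
        metadata = ask_scientist(project_id)
    return IntakePackage(project_id=project_id, metadata=metadata)
```

Both paths return the same type, which is what lets the downstream stages stay unaware of which path ran.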
Reads Excel + PDFs · stats · tables · charts · images
Statistical analysis - pure pandas + matplotlib. Every metric carries a provenance string - "Results, col Rating, rows 2–10" - so downstream stages can cite where each number came from.
Photos & micrographs are processed - creating consistent image sizing, naming & captions.
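One way a metric could carry its provenance, sketched with the standard-library statistics module as a stand-in for the pandas code the pipeline actually uses. The Metric class and summarize function are illustrative names, not the project's own.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List


@dataclass(frozen=True)
class Metric:
    name: str
    value: float
    provenance: str  # sheet, column and rows the number was computed from


def summarize(sheet: str, column: str, first_row: int, values: List[float]) -> List[Metric]:
    """Compute mean and sample std for one column and tag each with its source."""
    last_row = first_row + len(values) - 1
    source = f"{sheet}, col {column}, rows {first_row}-{last_row}"
    prefix = column.lower()
    return [
        Metric(f"{prefix}_mean", round(mean(values), 2), source),
        Metric(f"{prefix}_std", round(stdev(values), 2), source),
    ]
```

Because the provenance travels with the value, a drafting or QC stage can quote the source without re-opening the spreadsheet.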
A conditional state - only when warnings exist
If Analysis emitted warnings, the bot pauses and asks the scientist to resolve. Cheapest possible human-in-the-loop - the LLM never encounters bad data.
Scientist reviews data, confirms graphs, tables and charts align with reality and testing narrative.
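The conditional pause can be pictured as a single branch after analysis. This is only a sketch: the state names AWAIT_SCIENTIST and DRAFTING are made up for illustration, and warnings are modelled as plain strings.

```python
from typing import Iterable, Set


def state_after_analysis(warnings: Iterable[str], resolved: Set[str]) -> str:
    """Return the next state name: pause while any warning is still unresolved."""
    unresolved = [w for w in warnings if w not in resolved]
    return "AWAIT_SCIENTIST" if unresolved else "DRAFTING"
```

A run with no warnings skips the pause entirely, so the extra state costs nothing on clean data.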
Claude Sonnet · per-section · temp 0.1
Temperature 0.1 - not 0, not higher. Empirically tuned: at 0, the LLM repeats phrases across sections, which triggers false QC failures; above 0.1, hallucinated numbers slip past. The setting that makes QC work.
Don't trust your own LLM.
Prove the work before a human reads it.
Regex grounding
Every decimal in the draft must appear in the known-values set derived from the AnalysisPackage. If a number was never measured, it was never measured - the draft fails.
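A rough sketch of such a grounding check, assuming the known values arrive as strings and that numerically equal forms such as 8.0 and 8.00 should count as the same value. The regex and the function name are illustrative, not the pipeline's actual rule.

```python
import re
from decimal import Decimal
from typing import Iterable, List

# A decimal like 8.25; dotted runs such as 1.2.3 (section or version numbers) are skipped.
DECIMAL = re.compile(r"(?<![\d.])\d+\.\d+(?!\d|\.\d)")


def ungrounded_numbers(draft: str, known_values: Iterable[str]) -> List[str]:
    """Return every decimal in the draft that is not in the known-values set."""
    known = {Decimal(v) for v in known_values}
    return [m for m in DECIMAL.findall(draft) if Decimal(m) not in known]
```

An empty result means every decimal is grounded; anything else fails the draft and names the offending numbers.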
Claude Haiku review
Reads draft and analysis side-by-side. Checks accuracy, completeness, and that the pass/fail verdict is stated in the conclusions. Emits a structured issue list on failure.
On fail, issues become hints in the next drafting prompt. Max 2 retries before issues are surfaced to the author.
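The retry loop might look roughly like this, with the drafting and QC steps passed in as callables. The function and parameter names are assumptions; only the limit of two retries comes from the text.

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 2  # from the text: two retries, then surface issues to the author


def draft_with_qc(
    write_draft: Callable[[List[str]], str],
    run_qc: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Draft, check, and redraft with QC issues as hints until QC passes or retries run out."""
    hints: List[str] = []
    draft = ""
    for _attempt in range(1 + MAX_RETRIES):
        draft = write_draft(hints)
        issues = run_qc(draft)
        if not issues:
            return draft, []
        hints = issues  # feed the failures into the next drafting prompt
    return draft, hints  # still failing: the caller shows these issues to the author
```

Returning the last draft together with the open issues lets the author see exactly what QC could not resolve.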
Hard gate - nothing publishes without explicit approval
The QC-passed draft renders in Teams. The scientist reads it, edits inline if needed, types APPROVE. In a regulated lab environment, the author's signature is the legal artifact - and this state preserves it.
SHA-256 template verification · content-control fill · upload
The DraftPackage is poured into a Word template's content-control XML. The template's SHA-256 is verified first - if anyone modified the template without updating the YAML config, the stage refuses to run.
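The hash check itself is small. A sketch using hashlib, assuming the expected digest is the hex string stored in the YAML config; check_digest and verify_template are illustrative names.

```python
import hashlib
from pathlib import Path


def check_digest(template_bytes: bytes, expected_sha256: str) -> None:
    """Raise if the template bytes do not match the digest recorded in the config."""
    actual = hashlib.sha256(template_bytes).hexdigest()
    if actual != expected_sha256.strip().lower():
        raise RuntimeError(
            f"template hash mismatch: expected {expected_sha256}, got {actual}"
        )


def verify_template(path: Path, expected_sha256: str) -> None:
    """Read the template file and refuse to continue if it was modified."""
    check_digest(path.read_bytes(), expected_sha256)
```

Calling verify_template before any content control is filled means a silently edited template stops the stage instead of producing a report in the wrong layout.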
Terminal - audit trail closed · second human review
The state machine row stays in Table Storage with the full transition history: every state, every retry, every user message. Replays reproduce the run exactly.
Second human reviews report for accuracy, releases to customer.
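To illustrate why a full transition history makes replay possible, here is an in-memory sketch. It does not touch Table Storage, and the RunRecord class, the initial INTAKE state, and the tuple layout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Transition = Tuple[str, str, str]  # (from_state, to_state, note)


@dataclass
class RunRecord:
    run_id: str
    state: str = "INTAKE"
    history: List[Transition] = field(default_factory=list)

    def transition(self, new_state: str, note: str = "") -> None:
        """Append to the history first, then move; nothing is ever overwritten."""
        self.history.append((self.state, new_state, note))
        self.state = new_state


def replay(run_id: str, history: List[Transition]) -> RunRecord:
    """Rebuild a run from its stored history, checking that each step chains correctly."""
    record = RunRecord(run_id)
    for old, new, note in history:
        if record.state != old:
            raise ValueError(f"history is inconsistent at {old} -> {new}")
        record.transition(new, note)
    return record
```

Since the record is append-only, replaying the stored history ends in the same state with the same history as the original run.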
A new report type is a YAML file and a Word template.
Not a feature branch.
Each report type declares its sections, its analysis recipe, its QC rules, its template. The engine reads them at runtime - the pipeline scales by configuration, not by code.
report_type: wet_patch_chemical
sections:
  - objective            # required
  - materials_methods    # required
  - results_discussion   # required
  - conclusions          # required + pass/fail
  - photos               # optional
analysis_recipe:
  sheet: "Results"
  metrics:
    - name: rating
      stats: [mean, std]
      pass_criteria: ">= 7"
      provenance: "col Rating, rows 2-10"
qc_rules:
  ground_numbers: true
  verdict_required: true
  cite_standards: false  # phase 2
template:
  path: "Templates/wet_patch.docx"
  sha256: "..."
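As an illustration of scaling by configuration, a pass_criteria string such as ">= 7" can be evaluated without any report-specific code. This is a sketch under assumptions: the supported operators and the passes helper are hypothetical, and the real engine may parse criteria differently.

```python
import operator
import re

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# One comparison operator followed by one number, e.g. ">= 7" or "< 0.5".
_CRITERION = re.compile(r"^\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def passes(value: float, criterion: str) -> bool:
    """Evaluate a metric against a criterion string taken from the YAML recipe."""
    match = _CRITERION.match(criterion)
    if match is None:
        raise ValueError(f"unsupported pass criterion: {criterion!r}")
    op, threshold = match.groups()
    return _OPS[op](value, float(threshold))
```

Rejecting anything outside the small grammar keeps a typo in the YAML from silently turning into a pass.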