Why this exists
The project exists to prove that applied AI can be designed as a reliable workflow rather than a set of disconnected demos. It shows how extraction, validation, RAG, an agent, memory, human review, evaluation, observability, and governance work together around a single claim as it moves from intake to outcome.
- Intake creates durable source data.
- Validation creates the workflow boundary.
- RAG supplies current policy evidence.
- Memory supplies safe historical workflow guidance.
- Human review owns approval or rejection.
RAG architecture
Coverage reasoning is grounded in policy evidence. The system loads a synthetic policy corpus, parses it into clause-level chunks, embeds the chunks, and stores them in Postgres with pgvector. For each coverage question it retrieves relevant clauses, verifies citation support, and persists the coverage question.
- Clause-level chunks make policy references auditable.
- Claim-aware query planning expands generic questions with claim context.
- Weak retrieval or unsupported citations force the answer back to review.
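The clause-level chunking and the review fallback described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: `ClauseChunk`, `split_into_clauses`, `route_answer`, the `N.N text` clause layout, and the `0.75` score threshold are all hypothetical, and the citation check here only confirms that each cited clause ID was strongly retrieved, which is a simpler test than verifying that the clause text supports the answer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseChunk:
    clause_id: str  # e.g. "4.2"; keeps every policy reference auditable
    text: str


def split_into_clauses(policy_text: str) -> list[ClauseChunk]:
    """Split a policy into one chunk per numbered clause.

    Assumes each clause sits on one line and starts with a dotted number.
    """
    chunks = []
    for line in policy_text.strip().splitlines():
        match = re.match(r"^(\d+(?:\.\d+)*)\s+(.+)$", line.strip())
        if match:
            chunks.append(ClauseChunk(match.group(1), match.group(2)))
    return chunks


def route_answer(
    scored_hits: list[tuple[ClauseChunk, float]],
    cited_ids: list[str],
    min_score: float = 0.75,  # assumed threshold, not from the project
) -> str:
    """Return "grounded" only when every citation maps to a strong hit."""
    strong = {chunk.clause_id for chunk, score in scored_hits if score >= min_score}
    if not strong:
        return "needs_review"  # weak retrieval
    if not cited_ids or any(cid not in strong for cid in cited_ids):
        return "needs_review"  # missing or unsupported citation
    return "grounded"
```

In this sketch an answer that cites a clause the retriever scored poorly is treated the same as an answer with no citation at all: both fall back to review.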
Agent architecture
The agent is not an open-ended autonomous loop. It is a controlled, single-step tool-calling workflow. Backend state is loaded first and deterministic routing handles obvious states; only then may the model propose exactly one registered tool call.
- The model proposes; guardrails decide; backend tools execute.
- Unsafe actions such as approve, reject, send email, delete, or bypass review are blocked.
- Every proposal, guardrail decision, and execution result is logged.
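The propose, guard, execute sequence above might look like the following minimal sketch. Everything named here is an assumption made for illustration: `run_single_step`, the `propose` callable standing in for the model, the `validation_failed` status, the shape of the proposal dictionary, and the identifier spellings in `BLOCKED_ACTIONS` (the source lists the actions only in prose).

```python
from typing import Callable

# Actions the text lists as unsafe; the identifier spellings are assumptions.
BLOCKED_ACTIONS = {"approve", "reject", "send_email", "delete", "bypass_review"}


def run_single_step(
    claim_state: dict,
    propose: Callable[[dict], dict],
    registry: dict[str, Callable[..., object]],
    audit_log: list[tuple[str, dict]],
) -> dict:
    """Run one controlled agent step: route, propose, guard, execute, log."""
    # Deterministic routing: obvious states never reach the model.
    if claim_state.get("status") == "validation_failed":
        result = {"status": "routed", "route": "human_review"}
        audit_log.append(("execution", result))
        return result

    # The model proposes exactly one tool call: {"tool": ..., "args": {...}}.
    proposal = propose(claim_state)
    audit_log.append(("proposal", proposal))

    # Guardrails decide: blocked or unregistered tools are refused.
    tool = proposal.get("tool")
    if tool in BLOCKED_ACTIONS:
        decision = {"allowed": False, "reason": "blocked_action"}
    elif tool not in registry:
        decision = {"allowed": False, "reason": "unregistered_tool"}
    else:
        decision = {"allowed": True, "reason": "ok"}
    audit_log.append(("guardrail", decision))

    # Backend tools execute only after the guardrail allows the call.
    if decision["allowed"]:
        output = registry[tool](claim_state, **proposal.get("args", {}))
        result = {"status": "executed", "tool": tool, "output": output}
    else:
        result = {"status": "refused", "reason": decision["reason"]}
    audit_log.append(("execution", result))
    return result
```

Because the function returns after one proposal, there is no loop for the model to extend; a follow-up action would require a new step with freshly loaded backend state.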
Memory architecture
Memory stores reusable workflow lessons drawn from trusted outcomes. It can inform caution or routing decisions, but it is not claim evidence and cannot copy facts from old claims into the current claim.
- WorkflowMemory stores reusable memory cards.
- MemoryHit records why memory matched a run.
- MemoryUpdate tracks creation, strengthening, weakening, retirement, and supersession.
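One way to picture the three records is as small data classes. Only the names `WorkflowMemory`, `MemoryHit`, and `MemoryUpdate` and the five update kinds come from the text above; every field, the integer `strength` counter, the `UpdateKind` enum, and the `apply_update` helper are hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpdateKind(Enum):
    CREATED = "created"
    STRENGTHENED = "strengthened"
    WEAKENED = "weakened"
    RETIRED = "retired"
    SUPERSEDED = "superseded"


@dataclass
class WorkflowMemory:
    """A reusable memory card: a workflow lesson, never claim facts."""
    memory_id: str
    lesson: str
    strength: int = 1  # assumed counter; the real scoring may differ
    active: bool = True


@dataclass(frozen=True)
class MemoryHit:
    """Records why a memory card matched a given run."""
    memory_id: str
    run_id: str
    match_reason: str


@dataclass(frozen=True)
class MemoryUpdate:
    """One lifecycle event for a memory card."""
    memory_id: str
    kind: UpdateKind
    superseded_by: Optional[str] = None


def apply_update(card: WorkflowMemory, update: MemoryUpdate) -> WorkflowMemory:
    """Apply a lifecycle event to a card in place and return it."""
    if update.memory_id != card.memory_id:
        raise ValueError("update targets a different memory card")
    if update.kind is UpdateKind.STRENGTHENED:
        card.strength += 1
    elif update.kind is UpdateKind.WEAKENED:
        card.strength = max(0, card.strength - 1)
    elif update.kind in (UpdateKind.RETIRED, UpdateKind.SUPERSEDED):
        card.active = False  # retired or replaced cards stop matching
    return card
```

Keeping `MemoryHit` and `MemoryUpdate` immutable in this sketch mirrors their role as audit records, while the card itself changes over time.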
Evaluation and observability
The AI gateway records model-backed calls with trace IDs, model metadata, prompt/schema versions, latency, token usage, cost, status, and failure classes. Evals test behavior across extraction, validation, review, RAG, agent guardrails, memory, and gateway observability.
- Trace dashboards reconstruct one claim from intake to outcome.
- Eval dashboards test controlled success, failure, refusal, guardrail, and observability scenarios.
- Gateway logs turn model calls into auditable workflow events.
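A gateway wrapper that turns each model call into a log record could be sketched as follows. The record fields follow the list in the paragraph above, but the names `GatewayRecord` and `call_via_gateway`, the response dictionary shape, the flat per-token price, and the two failure classes are assumptions for illustration only.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GatewayRecord:
    trace_id: str
    model: str
    prompt_version: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float
    status: str
    failure_class: Optional[str]


def call_via_gateway(
    model_fn: Callable[[str], dict],  # hypothetical model client
    prompt: str,
    *,
    model: str,
    prompt_version: str,
    usd_per_1k_tokens: float,  # assumed flat price for input and output
    log: list,
    trace_id: Optional[str] = None,
) -> Optional[dict]:
    """Call the model once and append a GatewayRecord, even on failure."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    response, status, failure_class = None, "ok", None
    try:
        # Assumed response shape: {"text", "input_tokens", "output_tokens"}.
        response = model_fn(prompt)
    except TimeoutError:
        status, failure_class = "error", "timeout"
    except Exception:
        status, failure_class = "error", "provider_error"
    latency_ms = (time.perf_counter() - start) * 1000.0

    input_tokens = response["input_tokens"] if response is not None else 0
    output_tokens = response["output_tokens"] if response is not None else 0
    log.append(GatewayRecord(
        trace_id=trace_id,
        model=model,
        prompt_version=prompt_version,
        latency_ms=latency_ms,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        cost_usd=(input_tokens + output_tokens) / 1000 * usd_per_1k_tokens,
        status=status,
        failure_class=failure_class,
    ))
    return response
```

Passing the same `trace_id` to every call made for one claim is what would let a trace view group the records into a single intake-to-outcome timeline.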