Flagship applied AI system

ClaimFlow AI

Governed agentic AI workflow for motor-insurance claims.

ClaimFlow AI turns unstructured claim PDFs and emails into structured, policy-grounded, human-reviewed cases. It combines extraction, deterministic validation, clause-level policy RAG, guarded agent tools, workflow memory, evals, and run-level observability so the AI can assist without owning the final claim decision.

Product problem

Claim processing is risky to automate directly because claims are incomplete, policy-dependent, and require accountability. A basic LLM can extract text or generate answers, but it cannot safely own missing-field validation, policy evidence, workflow routing, or final decisions.

What I built

ClaimFlow AI turns the claim into a stateful workflow. It extracts structured data, validates missing information, retrieves policy clauses, proposes one safe next action through a guarded agent, surfaces safe memory guidance, and keeps the final decision in human review.

Architecture

System path

The system is built as a monorepo with a Next.js product UI and modular packages for extraction, validation, RAG, agent workflow, memory, database, gateway, shared schemas, and evals. Postgres stores claim state, policy chunks, review tasks, memory records, AI call logs, and eval results. pgvector powers policy retrieval.

Claim intake → Extraction run → Deterministic validation → Policy RAG → Workflow memory → Guarded agent action → Human review → Trace and eval
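
The path above can be read as an ordered list of stages where a run only ever moves to the next stage. The sketch below is illustrative only: the stage names come from the diagram, while `STAGES`, `nextStage`, and `advance` are hypothetical names and not the project's actual code.

```typescript
// Hypothetical stage identifiers mirroring the system path diagram.
const STAGES = [
  "claim_intake",
  "extraction_run",
  "deterministic_validation",
  "policy_rag",
  "workflow_memory",
  "guarded_agent_action",
  "human_review",
  "trace_and_eval",
] as const;

type Stage = (typeof STAGES)[number];

// Returns the stage that follows `current`, or null when the path is complete.
function nextStage(current: Stage): Stage | null {
  const next = STAGES[STAGES.indexOf(current) + 1];
  return next ?? null;
}

// Advances a run one stage at a time; skipping ahead is rejected.
function advance(current: Stage, requested: Stage): Stage {
  if (nextStage(current) !== requested) {
    throw new Error(`Illegal transition: ${current} -> ${requested}`);
  }
  return requested;
}
```

Under this assumption, `advance("policy_rag", "workflow_memory")` succeeds, while a jump from `claim_intake` straight to `human_review` throws, which is one simple way to keep every run on the same auditable path.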

Visual proof

Screenshots and architecture from the source repos

These assets are pulled from the project repositories so the case studies show the actual product workflow, architecture diagrams, dashboards, and runtime proof instead of placeholder cards.

Architecture: End-to-end architecture
Extraction, validation, RAG, memory, guarded agent action, human review, traces, and evals as one workflow.

Workflow proof: Claim intake
The reviewer starts from a claim source and creates a durable extraction run.

Workflow proof: Policy RAG evidence
Coverage reasoning is grounded in retrieved policy clauses and citation support.

Workflow proof: Workflow memory
Memory provides safe workflow guidance without copying old claim facts.

Workflow proof: Guarded tool call
The model proposes one bounded action while backend guardrails control execution.

Product UI: Run trace dashboard
The trace reconstructs the claim from intake through review and outcome.

Product workflow

How the product actually runs

01 Reviewer submits a claim PDF or email
02 System creates a durable extraction run
03 AI extracts schema-shaped claim JSON
04 Deterministic validation checks missing fields, evidence, conflicts, warnings, and confidence
05 Policy RAG retrieves claim-aware clause evidence
06 Workflow memory retrieves safe prior guidance
07 The guarded agent proposes exactly one next action
08 Backend guardrails allow or block execution
09 Human reviewer approves, edits, rejects, or requests more information
10 Trace and eval dashboards make the run inspectable
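
Step 04 is the part that involves no model call, so it is easy to illustrate as a pure function. This is a minimal sketch under stated assumptions: the claim fields, the `MIN_CONFIDENCE` threshold, and the names `ExtractedClaim`, `ValidationResult`, and `validateClaim` are hypothetical, since the real schema is not shown in this case study.

```typescript
// Hypothetical claim shape; the real extraction schema is not shown here.
interface ExtractedClaim {
  policyNumber?: string;
  incidentDate?: string; // ISO date, e.g. "2024-03-01"
  claimedAmount?: number;
  confidence: number; // extraction confidence in [0, 1]
}

interface ValidationResult {
  missingFields: string[];
  warnings: string[];
  readyForDownstream: boolean;
}

const REQUIRED_FIELDS = ["policyNumber", "incidentDate", "claimedAmount"] as const;
const MIN_CONFIDENCE = 0.8; // assumed threshold, for illustration only

// Pure function: the same claim always produces the same result.
function validateClaim(claim: ExtractedClaim): ValidationResult {
  const missingFields: string[] = REQUIRED_FIELDS.filter(
    (field) => claim[field] === undefined || claim[field] === "",
  );

  const warnings: string[] = [];
  if (claim.confidence < MIN_CONFIDENCE) {
    warnings.push("low_extraction_confidence");
  }
  if (claim.claimedAmount !== undefined && claim.claimedAmount <= 0) {
    warnings.push("non_positive_amount");
  }

  return {
    missingFields,
    warnings,
    readyForDownstream: missingFields.length === 0 && warnings.length === 0,
  };
}
```

Because the check is deterministic, a claim with only a policy number reports `incidentDate` and `claimedAmount` as missing on every run, regardless of what the model produced around it.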

Why it is not trivial

The hard parts are system boundaries

Connects extraction, validation, RAG, agent tools, memory, review, evals, and observability into one workflow.
Uses policy-grounded retrieval instead of unsupported claim decisions.
Separates AI assistance from final approval through human review.

ClaimFlow AI proves that LLMs can assist high-stakes workflows only when surrounded by deterministic validation, policy evidence, bounded tools, memory safety, human review, evaluations, and observability.

Subsystem deep dive

Why this exists

The project exists to prove that applied AI can be designed as a reliable workflow rather than a set of disconnected demos. It shows how extraction, validation, RAG, an agent, memory, human review, evaluation, observability, and governance work together around one claim.

Intake creates durable source data.
Validation creates the workflow boundary.
RAG supplies current policy evidence.
Memory supplies safe historical workflow guidance.
Human review owns approval or rejection.

RAG architecture

Coverage reasoning is grounded in policy evidence. The system loads a synthetic policy corpus, parses it into clause-level chunks, embeds them, stores them in Postgres with pgvector, retrieves relevant clauses, verifies citation support, and persists the coverage question.

Clause-level chunks make policy references auditable.
Claim-aware query planning expands generic questions with claim context.
Weak retrieval or unsupported citations force the answer back to review.
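
The retrieve-then-refuse behavior described above can be sketched in memory. In the real system pgvector performs the similarity search inside Postgres; the version below is only an illustration, and the `PolicyChunk` fields, the `minScore` threshold of 0.75, and the function names are assumptions rather than the project's code.

```typescript
// Hypothetical shape for a clause-level chunk; embeddings are assumed to have equal length.
interface PolicyChunk {
  clauseId: string;
  text: string;
  embedding: number[];
}

interface RetrievalResult {
  status: "answerable" | "needs_review";
  hits: { clauseId: string; score: number }[];
}

// Cosine similarity between two vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks chunks by similarity and keeps only hits above the threshold.
// If nothing clears the threshold, the question goes back to human review.
function retrieveClauses(
  queryEmbedding: number[],
  chunks: PolicyChunk[],
  topK = 3,
  minScore = 0.75, // assumed cutoff for "weak retrieval"
): RetrievalResult {
  const hits = chunks
    .map((chunk) => ({
      clauseId: chunk.clauseId,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((left, right) => right.score - left.score)
    .slice(0, topK)
    .filter((hit) => hit.score >= minScore);

  return { status: hits.length > 0 ? "answerable" : "needs_review", hits };
}
```

Returning clause IDs with scores, rather than free text, is what keeps each policy reference auditable in this sketch.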

Agent architecture

The agent is not an open-ended autonomous loop. It is a controlled single-step tool-calling workflow. Backend state is loaded first, deterministic routing handles obvious states, and the model can propose exactly one registered tool call.

The model proposes; guardrails decide; backend tools execute.
Unsafe actions such as approve, reject, send email, delete, or bypass review are blocked.
Every proposal, guardrail decision, and execution result is logged.
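
The "model proposes; guardrails decide" split can be illustrated with a small decision function. This is a hedged sketch: the tool names in `REGISTERED_TOOLS`, the entries in `BLOCKED_ACTIONS`, and the `evaluateProposal` function are hypothetical stand-ins for whatever the backend actually registers.

```typescript
interface ToolProposal {
  tool: string;
  args: Record<string, unknown>;
}

type GuardrailDecision =
  | { allowed: true; tool: string }
  | { allowed: false; reason: string };

// Hypothetical registry of backend-owned tools.
const REGISTERED_TOOLS: readonly string[] = [
  "request_missing_information",
  "route_to_human_review",
  "attach_policy_evidence",
];

// Hypothetical names for actions the agent must never take.
const BLOCKED_ACTIONS: readonly string[] = [
  "approve_claim",
  "reject_claim",
  "send_email",
  "delete_claim",
  "bypass_review",
];

// In-memory stand-in for an audit log of proposals and decisions.
const auditLog: { proposals: ToolProposal[]; decision: GuardrailDecision }[] = [];

function decide(proposals: ToolProposal[]): GuardrailDecision {
  const [proposal] = proposals;
  if (proposals.length !== 1 || proposal === undefined) {
    return { allowed: false, reason: "exactly_one_tool_call_required" };
  }
  if (BLOCKED_ACTIONS.indexOf(proposal.tool) !== -1) {
    return { allowed: false, reason: "unsafe_action_blocked" };
  }
  if (REGISTERED_TOOLS.indexOf(proposal.tool) === -1) {
    return { allowed: false, reason: "tool_not_registered" };
  }
  return { allowed: true, tool: proposal.tool };
}

// Every proposal is logged together with the guardrail decision.
function evaluateProposal(proposals: ToolProposal[]): GuardrailDecision {
  const decision = decide(proposals);
  auditLog.push({ proposals, decision });
  return decision;
}
```

Only an allowed decision would be handed to a backend tool for execution; the model output itself never runs anything in this design.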

Memory architecture

Memory stores reusable workflow lessons from trusted outcomes. It can guide caution or routing, but it is not claim evidence and cannot copy old claim facts into the current claim.

WorkflowMemory stores reusable memory cards.
MemoryHit records why memory matched a run.
MemoryUpdate tracks creation, strengthening, weakening, retirement, and supersession.
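
The lifecycle idea can be shown with a small update function. The sketch covers only strengthening, weakening, and retirement (supersession is omitted), and the integer `strength` scale, `MAX_STRENGTH`, and all field names are assumptions made for illustration, not the project's schema.

```typescript
type MemoryStatus = "active" | "retired";
type UpdateKind = "strengthen" | "weaken";

// A card holds a reusable workflow lesson only; it carries no claim facts.
interface WorkflowMemoryCard {
  id: string;
  lesson: string;
  strength: number; // assumed integer scale from 0 to MAX_STRENGTH
  status: MemoryStatus;
}

// Audit record describing one lifecycle change.
interface MemoryUpdateRecord {
  memoryId: string;
  kind: UpdateKind;
  strengthBefore: number;
  strengthAfter: number;
  statusAfter: MemoryStatus;
}

const MAX_STRENGTH = 5;

// Returns a new card plus an audit record; the input card is not mutated.
function applyMemoryUpdate(
  card: WorkflowMemoryCard,
  kind: UpdateKind,
): { card: WorkflowMemoryCard; update: MemoryUpdateRecord } {
  if (card.status !== "active") {
    throw new Error(`Memory ${card.id} is ${card.status} and cannot be updated`);
  }
  const delta = kind === "strengthen" ? 1 : -1;
  const strength = Math.min(MAX_STRENGTH, Math.max(0, card.strength + delta));
  // A card whose strength drops to zero is retired rather than deleted.
  const status: MemoryStatus = strength === 0 ? "retired" : "active";

  return {
    card: { ...card, strength, status },
    update: {
      memoryId: card.id,
      kind,
      strengthBefore: card.strength,
      strengthAfter: strength,
      statusAfter: status,
    },
  };
}
```

Keeping the update record separate from the card means the history of why a lesson gained or lost trust stays inspectable even after the card is retired.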

Evaluation and observability

The AI gateway records model-backed calls with trace IDs, model metadata, prompt/schema versions, latency, token usage, cost, status, and failure classes. Evals test behavior across extraction, validation, review, RAG, agent guardrails, memory, and gateway observability.

Trace dashboards reconstruct one claim from intake to outcome.
Eval dashboards test controlled success, failure, refusal, guardrail, and observability scenarios.
Gateway logs turn model calls into auditable workflow events.
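
A gateway of this kind can be pictured as a wrapper that records one log entry per call, whether it succeeds or fails. The sketch below is synchronous for brevity (a real model call would be asynchronous) and logs only a subset of the fields listed above; `CallMeta`, `AiCallLogEntry`, `callLog`, and `loggedCall` are hypothetical names.

```typescript
interface CallMeta {
  traceId: string;
  model: string;
  promptVersion: string;
}

interface AiCallLogEntry extends CallMeta {
  latencyMs: number;
  status: "ok" | "error";
  failureClass?: string;
}

// In-memory stand-in for a durable call-log table.
const callLog: AiCallLogEntry[] = [];

// Runs `call`, records latency and outcome, and rethrows any failure
// so the caller still sees the original error.
function loggedCall<T>(meta: CallMeta, call: () => T): T {
  const startedAt = Date.now();
  try {
    const result = call();
    callLog.push({ ...meta, latencyMs: Date.now() - startedAt, status: "ok" });
    return result;
  } catch (error) {
    callLog.push({
      ...meta,
      latencyMs: Date.now() - startedAt,
      status: "error",
      failureClass: error instanceof Error ? error.name : "unknown",
    });
    throw error;
  }
}
```

Because the wrapper logs in both branches, a failed call still leaves an entry with its trace ID and failure class, which is what lets a trace be reconstructed later.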

Data model

Durable state behind the UI

Document preserves the original uploaded claim source.
ExtractionRun stores run status, model metadata, raw output, extracted JSON, validation JSON, and related workflow records.
ExtractionEvent records chronological audit history for trace reconstruction.
ReviewTask and ReviewDecision represent human review status and final reviewer decisions.
PolicyDocument and PolicyChunk store the RAG corpus, clause IDs, embeddings, and citation metadata.
AgentActionLog stores proposed, blocked, executed, or failed agent actions.
WorkflowMemory, MemoryHit, and MemoryUpdate store memory cards, retrieval audit, and lifecycle changes.
AiCallLog, EvalRun, and EvalCaseResult store model governance and evaluation evidence.
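
To show how chronological event records support trace reconstruction, here is a minimal sketch. The `ExtractionEvent` fields (`runId`, `type`, `occurredAt`) and the `reconstructTrace` function are assumptions for illustration; the actual columns are not described in this case study.

```typescript
// Hypothetical event shape; occurredAt is assumed to be an ISO 8601 timestamp.
interface ExtractionEvent {
  runId: string;
  type: string;
  occurredAt: string;
}

// Returns the event types for one run in chronological order.
// filter() creates a new array, so sorting does not mutate the input.
function reconstructTrace(events: ExtractionEvent[], runId: string): string[] {
  return events
    .filter((event) => event.runId === runId)
    .sort((a, b) => Date.parse(a.occurredAt) - Date.parse(b.occurredAt))
    .map((event) => event.type);
}
```

With events stored this way, the trace for a run is a query plus a sort rather than something rebuilt from chat output.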

Engineering decisions

Tradeoffs and reliability boundaries

The AI does not own final claim decisions.
Validation is deterministic and happens before downstream trust.
RAG uses clause-level chunks and citation verification.
Agent workflow is one-step and guarded.
Tools are registered and backend-owned.
Memory is workflow context, not claim evidence.
Memory has lifecycle controls.
Gateway logs every model-backed call.
Evals are part of the product proof.

What makes it more than a demo

Durable workflow state instead of temporary chat output.
Policy-grounded RAG with refusal behavior.
Registered backend tools and guardrails.
Human-owned decisions.
Workflow memory with safe-use rules.
Per-run trace dashboard and evaluation dashboard.
Connected safety boundaries across one claim lifecycle.

Next improvements

Add organization-grade authentication and roles.
Add richer document ingestion and policy versioning.
Expand hybrid retrieval, reranking, and corpus coverage.
Add deeper reviewer analytics and production alerting.

Proof links

GitHub repo · Live product · Agentic workflow demo · Memory layer demo · Engineering journal

Proof links connect to the source repos, demos, architecture assets, screenshots, and engineering journals used to build this case study.

Final takeaway