Flagship applied AI system

ClaimFlow AI

Governed agentic AI workflow for motor-insurance claims.

ClaimFlow AI turns unstructured claim PDFs and emails into structured, policy-grounded, human-reviewed cases. It combines extraction, deterministic validation, clause-level policy RAG, guarded agent tools, workflow memory, evals, and run-level observability so the AI can assist without owning the final claim decision.

Product problem

Claim processing is risky to automate directly because claims are often incomplete, depend on policy wording, and require accountability. A basic LLM can extract text or generate answers, but it cannot safely own missing-field validation, policy evidence, workflow routing, or final decisions.

What I built

ClaimFlow AI turns the claim into a stateful workflow. It extracts structured data, validates missing information, retrieves policy clauses, proposes one safe next action through a guarded agent, surfaces safe memory guidance, and keeps the final decision in human review.

Architecture

System path

The system is built as a monorepo with a Next.js product UI and modular packages for extraction, validation, RAG, agent workflow, memory, database, gateway, shared schemas, and evals. Postgres stores claim state, policy chunks, review tasks, memory records, AI call logs, and eval results. pgvector powers policy retrieval.

  1. Claim intake
  2. Extraction run
  3. Deterministic validation
  4. Policy RAG
  5. Workflow memory
  6. Guarded agent action
  7. Human review
  8. Trace and eval

Visual proof

Screenshots and architecture from the source repos

These assets are pulled from the project repositories so the case studies show the actual product workflow, architecture diagrams, dashboards, and runtime proof instead of placeholder cards.

Architecture

End-to-end architecture

Extraction, validation, RAG, memory, guarded agent action, human review, traces, and evals as one workflow.

Product workflow

How the product actually runs

  1. Reviewer submits a claim PDF or email
  2. System creates a durable extraction run
  3. AI extracts schema-shaped claim JSON
  4. Deterministic validation checks missing fields, evidence, conflicts, warnings, and confidence
  5. Policy RAG retrieves claim-aware clause evidence
  6. Workflow memory retrieves safe prior guidance
  7. The guarded agent proposes exactly one next action
  8. Backend guardrails allow or block execution
  9. Human reviewer approves, edits, rejects, or requests more information
  10. Trace and eval dashboards make the run inspectable
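
The deterministic validation step in the list above involves no model call, so it is the easiest one to illustrate. The sketch below is a minimal, hypothetical version: the field names (policyNumber, incidentDate, claimantName), the ExtractedField shape, and the 0.7 confidence threshold are assumptions made for illustration, not ClaimFlow AI's actual schema. It covers missing fields, evidence, and confidence, and leaves out conflict detection.

```typescript
// Hypothetical shapes; the real ClaimFlow AI schema is not shown in this case study.
interface ExtractedField {
  value: string | null;
  confidence: number;      // 0..1, as reported by the extraction step
  evidence: string | null; // source snippet that supports the value
}

type ExtractedClaim = Record<string, ExtractedField | undefined>;

interface ValidationResult {
  missingFields: string[];
  warnings: string[];
  readyForDownstream: boolean;
}

const REQUIRED_FIELDS = ["policyNumber", "incidentDate", "claimantName"]; // assumed
const MIN_CONFIDENCE = 0.7; // assumed threshold

// Pure function: the same extracted JSON always yields the same result.
function validateClaim(claim: ExtractedClaim): ValidationResult {
  const missingFields: string[] = [];
  const warnings: string[] = [];

  for (const name of REQUIRED_FIELDS) {
    const field = claim[name];
    if (!field || field.value === null || field.value.trim() === "") {
      missingFields.push(name);
      continue;
    }
    if (field.evidence === null) {
      warnings.push(`${name}: no supporting evidence`);
    }
    if (field.confidence < MIN_CONFIDENCE) {
      warnings.push(`${name}: low confidence`);
    }
  }

  return {
    missingFields,
    warnings,
    readyForDownstream: missingFields.length === 0 && warnings.length === 0,
  };
}
```

Because the check is a pure function over the extracted JSON, it can be replayed against a stored run without calling a model again.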

Why it is not trivial

The hard parts are system boundaries

  • Connects extraction, validation, RAG, agent tools, memory, review, evals, and observability into one workflow.
  • Uses policy-grounded retrieval instead of unsupported claim decisions.
  • Separates AI assistance from final approval through human review.

Subsystem deep dive

Why this exists

The project exists to prove that applied AI can be designed as a reliable workflow rather than a set of disconnected demos. It shows how extraction, validation, RAG, an agent, memory, human review, evaluation, observability, and governance work together around one claim.

  • Intake creates durable source data.
  • Validation creates the workflow boundary.
  • RAG supplies current policy evidence.
  • Memory supplies safe historical workflow guidance.
  • Human review owns approval or rejection.

RAG architecture

Coverage reasoning is grounded in policy evidence. The system loads a synthetic policy corpus, parses it into clause-level chunks, embeds them, stores them in Postgres with pgvector, retrieves relevant clauses, verifies citation support, and persists the coverage question.

  • Clause-level chunks make policy references auditable.
  • Claim-aware query planning expands generic questions with claim context.
  • Weak retrieval or unsupported citations force the answer back to review.
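
The "back to review" rule in the last bullet can be sketched as a small gate that runs after retrieval and drafting. This is a simplified illustration, not the project's code: the table and column names in the query, the 0.75 similarity threshold, and the type names are all assumptions. It only checks that each cited clause was actually retrieved with a strong score, whereas the real citation verification may also check the clause text.

```typescript
// pgvector's `<=>` operator returns cosine distance, so 1 - distance is a similarity.
// Table and column names here are hypothetical.
const RETRIEVE_SQL = `
  SELECT clause_id, text, 1 - (embedding <=> $1) AS similarity
  FROM policy_chunk
  ORDER BY embedding <=> $1
  LIMIT 5`;

interface RetrievedChunk {
  clauseId: string;
  text: string;
  similarity: number; // higher means closer to the query
}

interface DraftAnswer {
  text: string;
  citedClauseIds: string[];
}

type CoverageOutcome =
  | { status: "answered"; answer: DraftAnswer }
  | { status: "needs_review"; reason: string };

const MIN_SIMILARITY = 0.75; // assumed threshold

function gateCoverageAnswer(chunks: RetrievedChunk[], draft: DraftAnswer): CoverageOutcome {
  // Weak retrieval: nothing close enough to the question was found.
  const strong = chunks.filter((c) => c.similarity >= MIN_SIMILARITY);
  if (strong.length === 0) {
    return { status: "needs_review", reason: "weak retrieval" };
  }
  // An answer without citations cannot be audited.
  if (draft.citedClauseIds.length === 0) {
    return { status: "needs_review", reason: "no citations" };
  }
  // Every cited clause must be one of the strongly retrieved clauses.
  const retrieved = new Set(strong.map((c) => c.clauseId));
  const unsupported = draft.citedClauseIds.filter((id) => !retrieved.has(id));
  if (unsupported.length > 0) {
    return { status: "needs_review", reason: `unsupported citations: ${unsupported.join(", ")}` };
  }
  return { status: "answered", answer: draft };
}
```

Returning a tagged outcome rather than a bare string means the caller has to handle the review path explicitly.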

Agent architecture

The agent is not an open-ended autonomous loop. It is a controlled single-step tool-calling workflow. Backend state is loaded first, deterministic routing handles obvious states, and the model can propose exactly one registered tool call.

  • The model proposes; guardrails decide; backend tools execute.
  • Unsafe actions such as approve, reject, send email, delete, or bypass review are blocked.
  • Every proposal, guardrail decision, and execution result is logged.
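
A minimal sketch of the "model proposes; guardrails decide; backend tools execute" split follows. The tool names, the blocked-action names, and the log shape are hypothetical stand-ins for the registered tools and AgentActionLog records described here; the model call itself is omitted and represented only by the proposal it would return.

```typescript
type ToolFn = (args: Record<string, unknown>) => void;

interface ToolProposal {
  tool: string;
  args: Record<string, unknown>;
}

type GuardrailDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

interface ActionLogEntry {
  proposal: ToolProposal;
  decision: GuardrailDecision;
  result: "executed" | "blocked" | "failed";
}

// Hypothetical registry: only backend-owned tools listed here can ever run.
const REGISTERED_TOOLS = new Map<string, ToolFn>([
  ["request_missing_information", () => {}],
  ["route_to_human_review", () => {}],
]);

// Hypothetical names for the unsafe actions listed above.
const BLOCKED_ACTIONS = new Set([
  "approve_claim", "reject_claim", "send_email", "delete_claim", "bypass_review",
]);

function checkGuardrails(proposal: ToolProposal): GuardrailDecision {
  if (BLOCKED_ACTIONS.has(proposal.tool)) {
    return { allowed: false, reason: `unsafe action: ${proposal.tool}` };
  }
  if (!REGISTERED_TOOLS.has(proposal.tool)) {
    return { allowed: false, reason: `unregistered tool: ${proposal.tool}` };
  }
  return { allowed: true };
}

// One step only: a single proposal is checked, maybe executed, and always logged.
function runAgentStep(proposal: ToolProposal, log: ActionLogEntry[]): ActionLogEntry {
  const decision = checkGuardrails(proposal);
  const tool = REGISTERED_TOOLS.get(proposal.tool);
  let result: ActionLogEntry["result"] = "blocked";
  if (decision.allowed && tool) {
    try {
      tool(proposal.args);
      result = "executed";
    } catch {
      result = "failed";
    }
  }
  const entry: ActionLogEntry = { proposal, decision, result };
  log.push(entry);
  return entry;
}
```

In this sketch the model's output is treated as plain data: nothing it returns can reach a tool without first passing checkGuardrails.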

Memory architecture

Memory stores reusable workflow lessons from trusted outcomes. It can guide caution or routing, but it is not claim evidence and cannot copy old claim facts into the current claim.

  • WorkflowMemory stores reusable memory cards.
  • MemoryHit records why memory matched a run.
  • MemoryUpdate tracks creation, strengthening, weakening, retirement, and supersession.
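
The lifecycle in the last bullet can be pictured as a small state transition over a memory card. The sketch below is an assumption-heavy illustration: the integer strength score from 0 to 5, the rule that a card retires when its strength reaches 0, and the field names are invented here, and creation is left out. Only the record names come from the project.

```typescript
type MemoryStatus = "active" | "retired" | "superseded";

interface WorkflowMemoryCard {
  id: string;
  lesson: string;   // workflow guidance only, never facts copied from an old claim
  strength: number; // assumed integer score, 0..5
  status: MemoryStatus;
  supersededBy?: string;
}

type MemoryUpdate =
  | { kind: "strengthen" }
  | { kind: "weaken" }
  | { kind: "retire" }
  | { kind: "supersede"; byId: string };

const MAX_STRENGTH = 5; // assumed

// Returns a new card; the input card is never mutated.
function applyMemoryUpdate(card: WorkflowMemoryCard, update: MemoryUpdate): WorkflowMemoryCard {
  // Retired and superseded cards are frozen in this sketch.
  if (card.status !== "active") return card;

  switch (update.kind) {
    case "strengthen":
      return { ...card, strength: Math.min(MAX_STRENGTH, card.strength + 1) };
    case "weaken": {
      const strength = Math.max(0, card.strength - 1);
      const status: MemoryStatus = strength === 0 ? "retired" : "active";
      return { ...card, strength, status };
    }
    case "retire":
      return { ...card, status: "retired" };
    case "supersede":
      return { ...card, status: "superseded", supersededBy: update.byId };
  }
}

// Only active cards should be eligible to match a run.
function isRetrievable(card: WorkflowMemoryCard): boolean {
  return card.status === "active";
}
```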

Evaluation and observability

The AI gateway records model-backed calls with trace IDs, model metadata, prompt/schema versions, latency, token usage, cost, status, and failure classes. Evals test behavior across extraction, validation, review, RAG, agent guardrails, memory, and gateway observability.

  • Trace dashboards reconstruct one claim from intake to outcome.
  • Eval dashboards test controlled success, failure, refusal, guardrail, and observability scenarios.
  • Gateway logs turn model calls into auditable workflow events.
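
One way to picture the gateway is as a wrapper that every model-backed call passes through, so a log entry is written on both success and failure. The sketch below assumes a simplified entry shape with only some of the fields listed above (cost and schema version are omitted) and uses an in-memory array where the real system would write to Postgres. All names are hypothetical.

```typescript
interface CallMeta {
  traceId: string;
  model: string;
  promptVersion: string;
}

interface ModelResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface AiCallLogEntry extends CallMeta {
  latencyMs: number;
  status: "ok" | "error";
  inputTokens?: number;
  outputTokens?: number;
  failureClass?: string;
}

// Wraps a model call so that it is logged whether it succeeds or throws.
async function callThroughGateway(
  meta: CallMeta,
  call: () => Promise<ModelResult>,
  sink: AiCallLogEntry[], // stand-in for a database table
): Promise<ModelResult> {
  const started = Date.now();
  try {
    const result = await call();
    sink.push({
      ...meta,
      latencyMs: Date.now() - started,
      status: "ok",
      inputTokens: result.inputTokens,
      outputTokens: result.outputTokens,
    });
    return result;
  } catch (err) {
    sink.push({
      ...meta,
      latencyMs: Date.now() - started,
      status: "error",
      failureClass: err instanceof Error ? err.name : "UnknownError",
    });
    throw err; // the caller still sees the failure; the gateway only records it
  }
}
```

Rethrowing after logging keeps error handling with the caller while ensuring failed calls still leave a record.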

Data model

Durable state behind the UI

  • Document preserves the original uploaded claim source.
  • ExtractionRun stores run status, model metadata, raw output, extracted JSON, validation JSON, and related workflow records.
  • ExtractionEvent records chronological audit history for trace reconstruction.
  • ReviewTask and ReviewDecision represent human review status and final reviewer decisions.
  • PolicyDocument and PolicyChunk store the RAG corpus, clause IDs, embeddings, and citation metadata.
  • AgentActionLog stores proposed, blocked, executed, or failed agent actions.
  • WorkflowMemory, MemoryHit, and MemoryUpdate store memory cards, retrieval audit, and lifecycle changes.
  • AiCallLog, EvalRun, and EvalCaseResult store model governance and evaluation evidence.
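
To show how ExtractionEvent records could support trace reconstruction, here is a minimal sketch that rebuilds the ordered history of one run. The field names (runId, type, occurredAt) and the event type strings are assumptions; only the record name comes from the data model above.

```typescript
interface ExtractionEvent {
  runId: string;
  type: string;       // e.g. "validation_completed"; event names are assumed
  occurredAt: string; // ISO 8601 timestamp
}

// Returns one line per event for the given run, oldest first.
function reconstructTrace(events: ExtractionEvent[], runId: string): string[] {
  return events
    .filter((e) => e.runId === runId) // filter() copies, so sort() below does not mutate the input
    .sort((a, b) => Date.parse(a.occurredAt) - Date.parse(b.occurredAt))
    .map((e) => `${e.occurredAt} ${e.type}`);
}
```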

Engineering decisions

Tradeoffs and reliability boundaries

  • The AI does not own final claim decisions.
  • Validation is deterministic and happens before downstream trust.
  • RAG uses clause-level chunks and citation verification.
  • Agent workflow is one-step and guarded.
  • Tools are registered and backend-owned.
  • Memory is workflow context, not claim evidence.
  • Memory has lifecycle controls.
  • Gateway logs every model-backed call.
  • Evals are part of the product proof.

What makes it more than a demo

  • Durable workflow state instead of temporary chat output.
  • Policy-grounded RAG with refusal behavior.
  • Registered backend tools and guardrails.
  • Human-owned decisions.
  • Workflow memory with safe-use rules.
  • Per-run trace dashboard and evaluation dashboard.
  • Connected safety boundaries across one claim lifecycle.

Next improvements

  • Add organization-grade authentication and roles.
  • Add richer document ingestion and policy versioning.
  • Expand hybrid retrieval, reranking, and corpus coverage.
  • Add deeper reviewer analytics and production alerting.

Proof links

Proof links connect to the source repos, demos, architecture assets, screenshots, and engineering journals used to build this case study.

Final takeaway

ClaimFlow AI proves that LLMs can assist high-stakes workflows only when surrounded by deterministic validation, policy evidence, bounded tools, memory safety, human review, evaluations, and observability.