AI architecture
ADMINISTRATOR

::: danger Restricted
Internal architecture documentation. Do not paste outside the admin section.
:::
RAPAX PMS uses five AI providers in a routed swarm. No single model is the source of truth; every model output is one piece of evidence against the deterministic Master List.
Provider matrix
| Provider | Model | Where it runs | Primary jobs | Budget guard |
|---|---|---|---|---|
| Workers AI (Cloudflare) | @cf/meta/llama-4-scout-17b-16e-instruct | Worker-local | Filename classifier · magic-byte sniff · short prompts · cheap fallback | None — Worker-attached |
| Anthropic Claude | claude-sonnet-4-6 (chat), claude-opus-4-7 (long-context audit) | External API | Component-card AI populate · code-advisor · long-context Knowledge Query audit | withBudgetGuard — daily $ cap, 429 on overrun |
| Google Gemini | gemini-2.5-pro (or fallback) | External API | PDF-native extraction · multimodal source-document parsing | AbortSignal.timeout(120s) for PDFs |
| Perplexity | sonar-pro | External API | Web-grounded model lookup · maker/model verification | Per-call timeout |
| Kimi (Moonshot) | kimi-k2.6 (NOT -thinking) | External API | CL Remap orchestration · Tier 1 / Tier 2 batched JSON extraction | Tier 1: 120 batch / parallel 5; Tier 2: 60 batch / parallel 3 |
Kimi constraints (hard-won)
- The model name is always `kimi-k2.6`; the `-thinking` suffix returns 404
- Temperature must be exactly `0.6` (no thinking) or `1.0` (thinking); anything else returns 400
- Tier 1 `max_tokens` 32768; Tier 2 `max_tokens` 65536
- D1 REST `/query` expects a `{batch: [...]}` wrapper, not a raw array
- Strict JSON schema is advisory: the parser must tolerate missing required arrays and `max_tokens` truncation
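The constraints above can be pinned down in one small request builder so that no caller picks a model name or temperature by hand. This is a minimal sketch, not the orchestrator's code: `buildKimiRequest`, `readArray`, and the `thinking` flag are hypothetical names, and the body assumes an OpenAI-style chat-completions payload.

```javascript
// Hypothetical helpers; none of these names come from the codebase.
const KIMI_MODEL = "kimi-k2.6"; // the "-thinking" suffix returns 404
const TIER_MAX_TOKENS = { 1: 32768, 2: 65536 };

function buildKimiRequest({ tier, thinking = false, messages }) {
  const maxTokens = TIER_MAX_TOKENS[tier];
  if (maxTokens === undefined) {
    throw new RangeError(`unknown tier: ${tier}`);
  }
  return {
    model: KIMI_MODEL,
    // Only two temperatures are accepted; anything else returns 400.
    temperature: thinking ? 1.0 : 0.6,
    max_tokens: maxTokens,
    messages,
  };
}

// Tolerant read: a missing "required" array (schema is advisory) becomes [].
function readArray(parsed, key) {
  return parsed && Array.isArray(parsed[key]) ? parsed[key] : [];
}
```

Reading every array through `readArray` means a response that omits a required array degrades to an empty batch rather than throwing.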
Router
`src/ai-router.js` exposes `routeAi(jobKind, payload)`, which:
- Picks the provider by `jobKind` and current health (`/api/ai-status` exposes per-provider running / idle / failed counts)
- Wraps the call in `Promise.race` against a `setTimeout` reject (the Workers AI binding can't take an `AbortSignal`)
- Records `ai_calls` rows with `provider`, `model`, `latency_ms`, `tokens_in`, `tokens_out`, `cost_usd`, and any `timeout: <Nms>` marker so we can distinguish timeouts from external 5xx
- Emits `ai-status` messenger threads on:
  - heal completion (admin-triggered, no dedupe)
  - wizard supersede event (Sofia 23-08 quiet-hours guard)
  - RAG eval recall@5 ≥5pp regression (sentinel-deduped per UTC day)
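The timeout wrapping described above follows a common pattern when a call cannot accept an `AbortSignal`. The sketch below is an illustration under assumptions: `withTimeout` and `isTimeoutError` are hypothetical names, and the error message simply mirrors the `timeout: <Nms>` marker format.

```javascript
// Hypothetical helper: race the real call against a timer that rejects.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout: ${ms}ms`)), ms);
  });
  // Whichever settles first wins; clear the timer so it cannot fire late.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tell a local timeout apart from any other failure (e.g. an external 5xx).
function isTimeoutError(err) {
  return err instanceof Error && /^timeout: \d+ms$/.test(err.message);
}
```

Note that `Promise.race` only abandons the slow call; it does not cancel it, so the underlying request may still finish in the background.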
Chains
`src/ai-chains.js` defines the named multi-step chains. The most relevant ones for ops:
- Chain B: extractor pipeline. Reads source-doc content → calls Claude / Gemini → emits structured fields → `provenance-tracker.recordProvenance(...)` writes `(vessel_id, field_path, source_document_id, provenance_quality)` rows
- Chain C: deep extract for vessel-document chunks. Backed by Cloudflare Queue `chain-c-steps` (DLQ `chain-c-steps-dlq`) so long-running chunks don't block the request
- Knowledge-query chain: `retrieveHybrid → auditLongContext (Kimi K2.6, free tier) → validateCitations → corrective-retry`. The audit step uses `parseModelJson()` with three strategies (fenced block · pure JSON · balanced-brace scan) so hallucinated `source_document_id`s inside fenced JSON cannot bypass `validateCitations`
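To illustrate the three parsing strategies named for the audit step, here is a minimal sketch. It is not the real `parseModelJson()`; the helper names, the strategy order, and the `null` return on failure are assumptions. The point is that every strategy ends in the same parsed object, so citation validation always sees the model's JSON regardless of how it was wrapped.

```javascript
// Sketch only: the real parseModelJson() may differ in order and edge cases.
const FENCE = "`".repeat(3); // built at runtime to keep this block fence-safe
const FENCED_RE = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Return the first balanced {...} span, ignoring braces inside JSON strings.
function firstBalancedObject(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // unbalanced, e.g. output truncated by max_tokens
}

function parseModelJson(raw) {
  // 1. Fenced block
  const fenced = raw.match(FENCED_RE);
  if (fenced) {
    const r = tryParse(fenced[1].trim());
    if (r.ok) return r.value;
  }
  // 2. Pure JSON
  const pure = tryParse(raw.trim());
  if (pure.ok) return pure.value;
  // 3. Balanced-brace scan over surrounding prose
  const span = firstBalancedObject(raw);
  if (span !== null) {
    const r = tryParse(span);
    if (r.ok) return r.value;
  }
  return null;
}
```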
RAG retrieval
Two indexes, both consulted on every query:
| Index | Backing store | Used for |
|---|---|---|
| Dense | Cloudflare Vectorize (binding VEC, model EMBEDDING_MODEL) | Semantic similarity over rag_chunks |
| Sparse | D1 FTS5 on rag_chunks | Lexical match for codes, names, makers, model numbers |
`retrieveHybrid({ vesselId, query, mandatoryOnly?, classFilter? })`:
- `mandatoryOnly: true` filters `rag_chunks.mandatory_class IS NOT NULL`, i.e. only chunks from one of the 6 mandatory blockers
- `classFilter: ['particulars', 'capacity_plan', ...]` restricts to a named subset
- Dense and sparse hits are merged with reciprocal-rank fusion
- A `winner_path` trace tag is attached to each result: `master_fuzzy | legacy_inferUcs | kb-corrected | rag:mandatory | rag:helpful | keyword-rule | fewshot`
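Reciprocal-rank fusion scores each chunk as the sum of `1 / (k + rank)` over the lists it appears in. The sketch below shows the merge for one dense and one sparse result list. The function name, the chunk ids, and `k = 60` (a conventional default) are assumptions; the source does not state which constant `retrieveHybrid` uses.

```javascript
// Reciprocal-rank fusion: score(id) = sum over lists of 1 / (k + rank),
// with rank starting at 1. k = 60 is an assumed, conventional default.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Illustrative chunk ids only.
const denseHits = ["c7", "c2", "c9"];
const sparseHits = ["c2", "c4", "c7"];
const fusedHits = reciprocalRankFusion([denseHits, sparseHits]);
// c2 and c7 appear in both lists, so they outrank c4 and c9.
```

Because only ranks are used, the dense similarity scores and the FTS5 scores never need to be put on a common scale.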
Provenance authority order
- Active Master List (deterministic, code-perfect)
- Source-document evidence (`vessel_particulars_provenance`, `provenance_quality='extracted'`)
- Manual override (`provenance_quality='manual_override'`)
- RAG retrieval (advisory only, never authoritative)
- KB correction (`cl_knowledge_base`, post-AI rewrite layer)
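One way to read the order above is as a rank table: when several sources offer a value for the same field, the one listed highest wins. This is a rough sketch under that assumption. The `source` labels and `pickHighestAuthority` are hypothetical; only the ordering itself comes from the list.

```javascript
// Lower index = higher authority, in the order listed above.
// The labels are hypothetical; only the ordering comes from the text.
const AUTHORITY_ORDER = [
  "master_list",
  "source_document",
  "manual_override",
  "rag",
  "kb_correction",
];

function pickHighestAuthority(candidates) {
  let best = null;
  for (const candidate of candidates) {
    const rank = AUTHORITY_ORDER.indexOf(candidate.source);
    if (rank === -1) continue; // unknown sources never win
    if (best === null || rank < best.rank) best = { rank, candidate };
  }
  return best ? best.candidate : null;
}
```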
Knowledge Base (cl_knowledge_base)
The KB is corrective only — it adjusts what the LLM will output next time, never what is in the live PMS state. Rules:
- Read-only from the LLM: the LLM never writes to it; supervisors and administrators do
- Quarantine: `quarantined=1` for orphan codes (target not in `ucs_master_list`). The KB matcher filters `WHERE quarantined=0`
- Heal: `kb-orphan-heal.js` re-maps quarantined rows against the active Master at Jaccard ≥ 0.72. Two SQL fixes in v2.31.0.20: `component_name LIKE` (the column was renamed from `name`) and `version_id IN (SELECT id FROM ucs_foundation_versions WHERE is_active=1)`
- Self-learning hook: `src/self-learning.js:93-130` mirrors non-REJECTED corrections into `cl_knowledge_base` (`seeded_by='auto'`, `learning_weight=1.0`) with an admin-priority guard (a `NOT EXISTS` clause prevents overwriting `seeded_by='admin'` rows)
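The Heal rule's Jaccard threshold can be illustrated with a token-set comparison. The sketch assumes word-level tokens over component names; the source does not say whether `kb-orphan-heal.js` compares words, character n-grams, or something else, and the function names here are hypothetical.

```javascript
// Token-set Jaccard: |A ∩ B| / |A ∪ B|. Word-level tokenisation is an assumption.
function tokenize(name) {
  return new Set(name.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

function jaccard(a, b) {
  const A = tokenize(a);
  const B = tokenize(b);
  if (A.size === 0 && B.size === 0) return 0;
  let shared = 0;
  for (const token of A) if (B.has(token)) shared++;
  return shared / (A.size + B.size - shared);
}

const HEAL_THRESHOLD = 0.72; // from the Heal rule above

// Best Master candidate at or above the threshold, else null.
function bestMasterMatch(orphanName, masterNames) {
  let best = null;
  for (const name of masterNames) {
    const score = jaccard(orphanName, name);
    if (score >= HEAL_THRESHOLD && (best === null || score > best.score)) {
      best = { name, score };
    }
  }
  return best; // null means the row stays quarantined
}
```

With word tokens, "Main Engine Fuel Oil Pump" against "Main Engine Fuel Pump" shares 4 of 5 distinct tokens (0.8) and heals, while "Fuel Oil Pump No 1" against "Fuel Oil Pump No 2" shares 4 of 6 (about 0.67) and stays quarantined.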
Budget guard
`src/budget-guard.js` enforces a daily $ cap per provider. On overrun:
- Returns HTTP 429 with `Retry-After: <seconds-until-UTC-midnight>`
- Emits an `ai-status` messenger thread (sentinel-deduped per UTC day so we don't spam at every overrun)
- Per-component `auto-image` endpoint additionally uses a module-level `autoImageInFlight` `Set` to return 429 on overlapping calls
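The `Retry-After` value on overrun is the number of seconds left until the cap resets at UTC midnight. A minimal sketch of that calculation follows. The function names and the JSON body are assumptions, and `Response` is the standard Fetch API class available in Workers and recent Node versions.

```javascript
// Seconds from `now` until the next 00:00:00 UTC, rounded up so the result is never 0.
function secondsUntilUtcMidnight(now = new Date()) {
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1 // day overflow rolls into the next month or year
  );
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}

// Hypothetical response builder; the body shape is an assumption.
function budgetExceededResponse(provider, now = new Date()) {
  return new Response(JSON.stringify({ error: "budget_exceeded", provider }), {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(secondsUntilUtcMidnight(now)),
    },
  });
}
```

Computing the boundary with `Date.UTC` keeps the result independent of the server's local time zone.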
Swarm overlay
`client/src/components/swarm-overlay.tsx` polls `/api/ai-status` and renders a draggable, collapsible overlay with one tile per provider showing running / idle / failed state. State persistence:
- `swarm_overlay_open` in `localStorage` controls open/closed
- `collapsed` is React-only by design (resets on reload): the collapsed state shrinks to a 220px `rounded-full` pill; expanded restores the 300px `rounded-2xl` panel
RAG eval cron
Runs at 02:00 UTC. Compares current recall@5 against the rolling 7-day baseline. On ≥5pp regression:
- Sends to `EMAIL_DISPATCH_QUEUE` (Postmark)
- Posts to the `ai-status` messenger thread (sentinel-deduped per UTC day; falls through to "notify anyway" if the sentinel SELECT errors)
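The regression check and the fail-open dedupe can be sketched as two small functions. Two assumptions are made here: the baseline is taken as the mean of the previous 7 daily recall@5 values (the source only says "rolling 7-day baseline"), and all names are hypothetical.

```javascript
// Assumption: baseline = mean of the previous 7 daily recall@5 values (each in 0..1).
function recallRegression(current, last7Days, thresholdPp = 5) {
  if (last7Days.length === 0) return { regressed: false, dropPp: 0 };
  const baseline = last7Days.reduce((sum, r) => sum + r, 0) / last7Days.length;
  // Work in percentage points and round to avoid float noise around the threshold.
  const dropPp = Math.round((baseline - current) * 100 * 1000) / 1000;
  return { regressed: dropPp >= thresholdPp, dropPp };
}

// Fail-open dedupe: if the sentinel lookup throws, notify anyway.
// `alreadySentToday` is a hypothetical async callback wrapping the sentinel SELECT.
async function shouldNotify(alreadySentToday) {
  try {
    return !(await alreadySentToday());
  } catch {
    return true;
  }
}
```

Failing open trades an occasional duplicate alert for the guarantee that a broken sentinel lookup never hides a real regression.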
Where to look when something is wrong
| Symptom | Where to look first |
|---|---|
| /api/health says version mismatch | client/src/App.tsx:216 sidebar span vs /api/health.version literal in src/index.js |
| KB orphan heal returns errorSamples[] | src/kb-orphan-heal.js: likely a Master schema drift (column rename), see v2.31.0.20 |
| Wizard upload returns 500 | Check [wizard-update] / [wizard-insert] log lines for raw D1 error |
| RAG retrieval returns empty | Check mandatory_class backfill state — POST /api/admin/backfill/rag-chunks-mandatory-class |
| AI swarm shows all-failed | /api/ai-status for per-provider state; check budget guard for cap-hit |
| Kimi remap fails on batch N | audit-notes/kimi-run-<vessel>.json snapshot: the orchestrator parser does not yet tolerate max_tokens truncation as of v2.31.0.35 (known bug) |