ADR-0033: Unified Entity State Model¶

Status: Proposed Date: 2026-05-26 Supersedes: ADR-0017 §"Status vocabulary" (partial), ADR-0025 §"Expanded vocabulary" (full) Extends: ADR-0024, ADR-0028, ADR-0029 Related: ADR-0030, ADR-0034, ADR-0039 Depends on: ADR-0009 (current persistence implementation)

Context¶

The unifying principle these ADRs serve is the evidence chain: every state has reasons, every reason has evidence, every claim has evidence. Auditability is not an export feature — it is the data model. This ADR establishes the state-side of the chain (entity status with structured reasons); ADR-0039 establishes the knowledge-side (learned facts with structured evidence); both share EvidenceRef (defined here).

Status semantics are currently spread across five ADRs, each adding a dimension without unifying the model:

ADR	Contribution	Problem
ADR-0017	Session lifecycle: `running`, `completed`, `failed`, `aborted`	Too coarse — no timeout, no cancel distinction
ADR-0025	Expanded vocabulary: adds `timed_out`, `cancelled`	Solves coarseness, but applies only to sessions
ADR-0024	Health classification: `healthy` → `zombie` enum	Separate axis, computed per-read, not persisted
ADR-0028	Reason codes: structured "why"	Right primitive, but defined only for sessions
ADR-0029	Artifact contract: delivery verification	No integration with status/health model

The consequence is visible in the UI today:

Issue #1176: "Failed + Healthy" badge stack. These are two correct statements from two separate axes displayed as if they were one status. Users read it as contradictory.
Issue #1162: Dashboard "Stale: 0" contradicts "1 stuck >60m" because two different detectors evaluate staleness with different thresholds and different entity scopes.
Issue #1161: Show status reads "Active" from SQLite while the detail page reads "Merged" from _show.md. Two sources, no reconciliation, no single model.

The Attention Queue (ADR-0030) cannot be built correctly without a unified state model. It needs to sort items by severity across entity types — runs, shows, plays, schedules, teams. If each entity type computes severity differently, the queue is incoherent.

Decision¶

Separate entity state into three orthogonal dimensions¶

Every operational entity (run/session, show, play, invocation, schedule, team) carries a normalized state composed of three independent fields plus a reason chain:

lifecycle_status × process_health × delivery_state → severity
                                                    + reason_code[]
                                                    + evidence_ref[]

These dimensions answer different operator questions:

Dimension	Question	Example values
`lifecycle_status`	Did the work finish? What happened?	`running`, `completed`, `failed`, `timed_out`, `cancelled`, `aborted`
`process_health`	Is the runtime/process okay?	`ok`, `running`, `idle`, `stalled`, `process_dead`, `orphaned`
`delivery_state`	Did it produce required outputs?	`passed`, `partial`, `missing`, `invalid`, `not_expected`

severity and tone are derived, never stored:

Field	Purpose	Values
`severity`	Determines placement and attention priority	`critical`, `warning`, `info`, `neutral`
`tone`	Determines badge/indicator color	`danger`, `warning`, `info`, `success`, `neutral`

NormalizedState schema¶

@dataclass
class NormalizedState:
    lifecycle: str

    outcome: str | None = None
    # succeeded | completed | merged | failed | timed_out
    # | cancelled | aborted | skipped | unknown

    health: str | None = None
    # ok | running | idle | degraded | stalled
    # | process_dead | orphaned | disconnected | unknown

    delivery: str | None = None
    # passed | partial | missing | invalid | not_expected | unknown

    severity: str = "neutral"
    # critical | warning | info | neutral

    tone: str = "neutral"
    # danger | warning | info | success | neutral

    reasons: list[StateReason] = field(default_factory=list)

    evaluated_at: float = 0.0
    policy_version: str = "v1"
    source: str = "backend"
    # backend | frontend_compat


@dataclass
class StateReason:
    code: str               # structured: "run.failed.exit_nonzero"
    message: str            # human: "Exit code 124 from python-tests"
    claim_status: str = "observed"
    # observed | inferred | hypothesis | verified | disputed | superseded
    confidence: float = 1.0
    # Confidence in this REASON explaining the state.
    # Distinct from Claim.confidence in ADR-0039, which measures
    # confidence in a learned fact being true.
    entity_type: str | None = None
    entity_id: str | None = None
    evidence: list[EvidenceRef] = field(default_factory=list)


@dataclass
class EvidenceRef:
    kind: str
    # Allowed kinds:
    #   message        — chat message in a session
    #   user_statement — explicit user assertion
    #   tool_result    — output from a tool call
    #   artifact       — produced file/report
    #   url            — web resource (include fetched_at, content_hash if available)
    #   file           — local file (include repo, commit_sha if available)
    #   model_inference — agent reasoning step
    #   human_assertion — human verified externally

    id: str | None = None
    session_id: str | None = None
    message_id: str | None = None
    tool_call_id: str | None = None
    artifact_id: str | None = None
    path: str | None = None
    url: str | None = None
    repo: str | None = None
    commit_sha: str | None = None
    content_hash: str | None = None
    fetched_at: float | None = None
    detail: str | None = None
    # Kind-specific descriptor:
    #   message         → relevant quote
    #   tool_result     → result summary
    #   model_inference → reasoning rationale
    #   user_statement  → paraphrased assertion
    #   artifact        → contract violation detail or note
    #   human_assertion → assertion text

Severity derivation heuristic¶

Severity is computed from the three dimensions using a deterministic priority cascade. The most severe condition wins:

def derive_severity(state: NormalizedState) -> tuple[str, str]:
    """Returns (severity, tone)."""

    # Critical conditions
    if state.outcome in ("failed", "aborted"):
        return ("critical", "danger")
    if state.health in ("process_dead", "orphaned"):
        return ("critical", "danger")
    if state.delivery == "missing":
        return ("critical", "danger")
    if state.health == "stalled":
        return ("critical", "danger")
    if state.health == "misfired":
        return ("critical", "danger")

    # Warning conditions
    if state.outcome == "timed_out":
        return ("warning", "warning")
    if state.health in ("idle", "degraded", "disconnected"):
        return ("warning", "warning")
    if state.delivery in ("partial", "invalid"):
        return ("warning", "warning")

    # Info conditions
    if state.health == "running":
        return ("info", "info")
    if state.health == "due":
        return ("info", "info")
    if state.outcome == "skipped":
        return ("info", "neutral")

    # Success conditions
    if state.outcome in ("succeeded", "completed", "merged"):
        return ("neutral", "success")

    # Cancelled is intentional, not a problem
    if state.outcome == "cancelled":
        return ("neutral", "neutral")

    return ("neutral", "neutral")

The relationship between severity (4 values) and tone (5 values):

severity	possible tones	when
critical	danger	failures, dead processes, missing artifacts, stalls
warning	warning	timeouts, partial delivery, idle/degraded/disconnected
info	info, neutral	running, skipped
neutral	success, neutral	succeeded/completed/merged, cancelled, unknown

Severity drives placement (attention queue inclusion, sort order, row border). Tone drives visual treatment (badge color). Both are pure functions of the three operational dimensions.

Claim severity (knowledge, not operations)¶

Knowledge claims have their own severity derivation, defined in ADR-0039. It is a parallel function, not unified with derive_severity() above, because claims are not operational entities — their severity is about knowledge health (disputed, low-confidence, superseded), not work health.

Both severity functions emit into the same Attention Queue (ADR-0030) using identical severity and tone values.

Per-entity lifecycle vocabularies¶

Each entity type defines its allowed lifecycle states. The outcome field uses the entity-specific terminal status:

Runs / Sessions¶

Lifecycle: pending → running → completed | failed | timed_out | cancelled | aborted
Health:    ok | running | idle | stalled | process_dead | orphaned
Delivery:  passed | partial | missing | not_expected

Extends ADR-0025's six-value vocabulary unchanged. ADR-0024's health classification maps directly to the health field.

Shows¶

Lifecycle: draft → planned → active → completed | merged | failed | archived
Health:    ok | running | blocked
Delivery:  passed | partial | missing | not_expected

Plays¶

Lifecycle: planned → queued → running → run_complete → merged | failed | blocked | timed_out | cancelled | skipped
Health:    ok | running | stalled
Delivery:  passed | partial | missing | not_expected

Invocations¶

Lifecycle: pending → running → succeeded | failed | timed_out | cancelled | skipped
Health:    (not applicable — invocations are short-lived)
Delivery:  (not applicable)

Schedules¶

Lifecycle: enabled → paused → disabled
Health:    ok | due | running | misfired
Delivery:  (last_run_outcome carries delivery state)

Teams¶

Lifecycle: active → idle → blocked → closed
Health:    ok | orphaned
Delivery:  (not applicable)

State transition validity¶

Lifecycle transitions are NOT arbitrary. Each entity type has a state machine; the backend rejects invalid transitions. This addresses issues #1162, #1171, #1172, #1176 — sessions stuck in null/non-terminal states because nothing enforced terminal transitions.

Enforcement points:

Write-side: Any code path writing a lifecycle value MUST go through state.transition(entity_type, entity_id, new_lifecycle, reason). Direct UPDATE on status columns is forbidden (enforced by code review, not SQL).
Terminal enforcement: When a process exits or a watchdog detects death, the corresponding lifecycle MUST land in a terminal value within the entity's vocabulary. Sessions that exit without a terminal status are auto-transitioned to aborted with reason system.health.process_dead_no_terminal.
Phantom reconciliation: A reaper job (per ADR-0024) detects entities with non-terminal lifecycle AND health in (process_dead, orphaned) AND age > threshold. These are auto-transitioned to aborted with appropriate reason.

Per-entity transition graphs: Live in lionagi/state/transitions/ as one module per entity type. Each module exports a VALID_TRANSITIONS: dict[str, set[str]] mapping current → allowed nexts. The protocol does not specify them inline because they evolve with operational learning; the modules are version-controlled and tested.

Reason code namespace¶

Reason codes use a hierarchical dot-separated namespace:

{entity_type}.{dimension}.{cause}

Examples:

Code	Meaning
`run.failed.exit_nonzero`	Process exited with non-zero code
`run.failed.artifact_contract`	Required artifact not produced (ADR-0029)
`run.health.process_dead`	PID check failed, process no longer running
`run.health.stalled`	No message activity beyond threshold
`run.delivery.missing`	Expected output files not found
`show.failed.critical_path_blocked`	A critical-path play failed
`play.blocked.dependency_failed`	Upstream play in `depends_on` failed
`play.blocked.dependency_invalid`	Upstream play name in `depends_on` doesn't exist
`schedule.health.misfired`	Scheduled fire time passed without execution
`team.health.orphaned`	Team's parent show/play is terminal but team is still active
`knowledge.disputed.conflicting_evidence`	Claim has evidence refs supporting contradictory positions
`knowledge.disputed.user_rejection`	Human operator marked claim disputed
`knowledge.superseded.newer_observation`	Newer claim with stronger evidence replaced this one
`knowledge.stale.unverified_too_long`	Claim aged past verification budget without confirmation

Authoritative registry: The complete reason-code namespace lives in lionagi/state/reason_codes.py as a Python module (one constant per code with docstring). The table above is illustrative; the module is canonical. Any code emitted at runtime MUST appear in the registry — this is enforced by a unit test.

New codes can be added without schema changes. The namespace convention is enforced by validation, not by SQL CHECK.

Backend owns state evaluation¶

The backend computes NormalizedState for every entity and includes it in API responses. The frontend renders it. This is the permanent architecture:

Backend evaluates operational truth.
Frontend renders it.

During migration, the frontend MAY contain a compatibility derivation layer (compat_derive_* functions) that fills in NormalizedState when the backend hasn't been updated yet. These functions MUST set source = "frontend_compat" so the transition is traceable.

State display contract¶

The UI renders compound state as a flat chain, not stacked badges:

Failed · Infra OK · Trace present
Running · Stalled · Artifacts missing
Completed · Artifact contract passed
Timed out · No review.md produced

The first segment is always outcome. The second is health (omitted if ok and outcome is terminal). The third is delivery (omitted if not_expected or passed and outcome is terminal success).

Severity determines:

Row left-border color
Attention queue inclusion
Sort priority in tables
Primary icon in badges

Tone determines:

Badge background/text color

Consequences¶

Positive

Single state model across all entity types — tables, dashboard, attention queue, and detail pages all render the same NormalizedState.
"Failed + Healthy" becomes "Failed · Infra OK" — explicit and non-contradictory.
Stale/stuck disagreement resolved: one heuristic, one threshold config, one result.
Reason codes enable failure clustering (ADR-0030), queryable cause analysis, and structured alerting.
Evidence refs connect state to proof — auditable.
Frontend compatibility layer makes migration incremental.

Negative

Every API response grows by ~200 bytes per entity (the state object).
Backend must compute health and delivery on every read until indexed materialized state is implemented.
Existing ADRs (0017, 0024, 0025, 0028) are partially superseded — increases the ADR cross-reference burden.

Migration¶

This ADR partially supersedes ADR-0017 and ADR-0025, and extends ADR-0024, ADR-0028, ADR-0029. Migration order:

Backend computes NormalizedState for all entity reads. The three dimensions (lifecycle, health, delivery) source from existing columns or are computed inline. No DB migration required for read path.
Frontend compat derivation lands as compat_derive_normalized_state(entity) functions, called when backend response lacks NormalizedState. All compat-derived states set source = "frontend_compat".
Backend persistence migration: ALTER TABLE adds health/delivery/reason_codes columns. Backfill computes from existing data. New writes use the new columns.
Frontend switches to backend-only rendering. Compat layer removed entity-by-entity as backend coverage verified.
Legacy single-status field deprecated. Reads still served for one release. Then removed.

The frontend compat layer is the migration bridge. Both reads (backend has it / backend doesn't) are valid during the transition; the source field makes it traceable.

Alternatives Considered¶

Alternative	Why Rejected
Keep dimensions separate (status + health as independent API fields)	Frontend still has to compose them; no single severity sort; "Failed + Healthy" persists
Frontend-only derivation (no backend state computation)	Two browser tabs can derive different states from stale caches; no single source of truth for attention queue
Store full NormalizedState as JSON column	Not queryable by individual fields; can't index on health or delivery
Single flat status enum with 30+ values	Combinatorial explosion; can't independently query "all failed regardless of health"

References¶

ADR-0017 — Session Lifecycle and Status Derivation (partially superseded)
ADR-0024 — Session Health Classification and Admin Surface (extended)
ADR-0025 — Session Status Vocabulary (fully superseded)
ADR-0028 — Status Reason Model (extended)
ADR-0029 — Artifact Contract (extended)
ADR-0030 — Attention Queue (consumes severity)
ADR-0034 — Frontend Data & State Architecture
ADR-0039 — Knowledge Substrate (parallel claim model)
Issue #1161: Show status stuck at "Active"
Issue #1162: Dashboard stale/stuck contradiction
Issue #1171: Terminal status enforcement
Issue #1172: Phantom session reaper
Issue #1176: "Failed + Healthy" contradictory badges

Appendix A: Current SQLite Implementation¶

This appendix documents the current persistence layer (ADR-0009). The DDL below is one storage implementation of NormalizedState; the contract is NormalizedState itself, not these columns. Future stores (Postgres, distributed) will materialize the same model differently. Treat this appendix as evolving with the data layer.

-- Add to sessions table
ALTER TABLE sessions ADD COLUMN health       TEXT;
ALTER TABLE sessions ADD COLUMN delivery     TEXT;
ALTER TABLE sessions ADD COLUMN reason_codes JSON DEFAULT '[]';

-- Add to plays table
ALTER TABLE plays ADD COLUMN health       TEXT;
ALTER TABLE plays ADD COLUMN delivery     TEXT;
ALTER TABLE plays ADD COLUMN reason_codes JSON DEFAULT '[]';

-- Composite index for attention queries
CREATE INDEX IF NOT EXISTS idx_sessions_severity
    ON sessions(status, health) WHERE status != 'completed';

CREATE INDEX IF NOT EXISTS idx_plays_severity
    ON plays(status, health) WHERE status NOT IN ('merged', 'skipped');

The NormalizedState object itself is computed at query time from the stored fields — not stored as a JSON blob. This keeps the columns queryable and indexable.