# Agent Industry Digests: Data Contract and Ingestion Guide

## Goal

This document explains the industry-digest changes and the expected data shape in DB so another microservice can reliably write digest rows.

## Changes Implemented (24 Feb 2026)

1. New table for industry briefings:
- Create migration: `migrations/versions/f0e1d2c3b4a5_add_agent_digests_table.py`
- Rename migration: `migrations/versions/a7b8c9d0e1f3_rename_agent_digests_to_agent_expertise_digest.py`
- Digest type expansion migration: `migrations/versions/c4d5e6f7a8b9_expand_agent_expertise_digest_types.py`
- Digest type constraint removal migration: `migrations/versions/d6e7f8a9b0c1_drop_agent_expertise_digest_type_constraint.py`
- Current revision for this feature: `d6e7f8a9b0c1`

2. Prompt now consumes these rows:
- Reader: `microservices/agent_q_and_a/context.py` (`_persona_blocks`)
- Filter: `AgentDigest.agent_id = <agent>` and `AgentDigest.is_current = true`
- Supported `digest_type` values:
  - `1_month_industry`
  - `6_month_industry`

3. Prompt rendering behavior:
- Adds an `INDUSTRY NEWS:` section with headers:
  - `[Last 30 days | as of <generated_at> | <source_scope>]`
  - `[Last 6 months | as of <generated_at> | <source_scope>]`

## Table Scope

These tables are intentionally different:

- `persona_news_digests`: daily persona-news context (factual/emotional)
- `agent_expertise_digest`: profile-adapted industry briefing history (30d and 180d windows)

Do not copy data 1:1 between them.

## `agent_expertise_digest` Contract

| Column | Required | Expected values / behavior |
| --- | --- | --- |
| `agent_id` | Yes | FK to `agent.id` |
| `digest_type` | Yes | Free-form digest type label (max 32 chars). Prompt currently consumes `1_month_industry` and `6_month_industry`. |
| `time_window_days` | Yes | `30` for `1_month_*`, `180` for `6_month_*` |
| `channel` | Yes | `industry` (default) or `public` |
| `content` | Yes | Prompt-ready digest text |
| `content_tokens_est` | No | Optional token estimate |
| `generated_at` | Yes | Freshness timestamp for digest content |
| `model_name` / `model_version` | No | Optional model metadata |
| `source_scope` | No (recommended) | Provenance, e.g. `group=636;profile_adapted_industry_digest` |
| `confidence_score` | No | Optional quality score |
| `proto_events` | No | JSON array, default `[]` |
| `citations` | No | JSON array, default `[]` |
| `version` | Yes | Monotonic per `(agent_id, digest_type)` |
| `is_current` | Yes | Exactly one current row per `(agent_id, digest_type)` |
| `created_at` | Yes | DB-generated timestamp |

## Constraints and Indexes

- `ck_agent_expertise_digest_channel`: channel must be `public` or `industry`.
- `idx_agent_expertise_digest_current` (partial unique): one current row per `(agent_id, digest_type)` where `is_current = true`.
- `idx_agent_expertise_digest_history`: history lookup by `(agent_id, digest_type, version)`.
- `idx_agent_expertise_digest_staleness`: staleness scan on current rows by `generated_at`.

## Expected Rows Per Agent

Prompt-coverage expectation for each covered agent:

- 1 current row for `1_month_industry` (`time_window_days = 30`)
- 1 current row for `6_month_industry` (`time_window_days = 180`)

Optional additional rows can be stored for broader context exports:

- `1_month_news` (`time_window_days = 30`)
- `6_month_news` (`time_window_days = 180`)

That means 2 required current rows for prompt coverage, or up to 4 current rows per agent when both industry and news channels are loaded, plus any historical rows (`is_current = false`).

Example current rows for one agent (`agent_id = 387165`):

| agent_id | digest_type | time_window_days | version | is_current | source_scope |
| --- | --- | --- | --- | --- | --- |
| 387165 | `1_month_industry` | 30 | 4 | true | `group=636;profile_adapted_industry_digest` |
| 387165 | `6_month_industry` | 180 | 3 | true | `group=636;profile_adapted_industry_digest` |

## Recommended Write Pattern (Idempotent)

Per `(agent_id, digest_type)` in a transaction:

1. Load current row (`is_current = true`).
2. If content is unchanged, skip write.
3. If changed:
- Set old current row(s) to `is_current = false`.
- Insert new row with `is_current = true` and `version = max(version) + 1`.

This preserves history and avoids partial-unique index conflicts.

## Example Microservice Payload

```json
{
  "agent_id": 387165,
  "digests": [
    {
      "digest_type": "1_month_industry",
      "time_window_days": 30,
      "channel": "industry",
      "content": "PROFILE LENS ... WHAT THIS PERSONA SHOULD WATCH ...",
      "generated_at": "2026-02-24T17:15:00Z",
      "model_name": "gpt-5",
      "model_version": "2026-02",
      "source_scope": "group=636;profile_adapted_industry_digest",
      "confidence_score": 0.79,
      "proto_events": [],
      "citations": []
    },
    {
      "digest_type": "6_month_industry",
      "time_window_days": 180,
      "channel": "industry",
      "content": "PROFILE LENS ... MEDIUM-TERM TRANSLATION ...",
      "generated_at": "2026-02-24T17:15:00Z",
      "model_name": "gpt-5",
      "model_version": "2026-02",
      "source_scope": "group=636;profile_adapted_industry_digest",
      "confidence_score": 0.75,
      "proto_events": [],
      "citations": []
    }
  ]
}
```

## Example SQL Transaction

```sql
BEGIN;

UPDATE agent_expertise_digest
SET is_current = false
WHERE agent_id = :agent_id
  AND digest_type = :digest_type
  AND is_current = true;

INSERT INTO agent_expertise_digest (
  agent_id,
  digest_type,
  time_window_days,
  channel,
  content,
  content_tokens_est,
  generated_at,
  model_name,
  model_version,
  source_scope,
  confidence_score,
  proto_events,
  citations,
  version,
  is_current
)
VALUES (
  :agent_id,
  :digest_type,
  :time_window_days,
  'industry',
  :content,
  :content_tokens_est,
  :generated_at,
  :model_name,
  :model_version,
  :source_scope,
  :confidence_score,
  :proto_events::jsonb,
  :citations::jsonb,
  :next_version,
  true
);

COMMIT;
```

## Validation Queries

Current coverage by digest type for research group `636`:

```sql
WITH g AS (
  SELECT agent_id
  FROM research_group_agents
  WHERE group_id = 636
)
SELECT digest_type, COUNT(*) AS current_rows
FROM agent_expertise_digest d
JOIN g ON g.agent_id = d.agent_id
WHERE d.is_current = true
GROUP BY digest_type
ORDER BY digest_type;
```

Agents missing either 30d or 180d current digest:

```sql
WITH g AS (
  SELECT agent_id
  FROM research_group_agents
  WHERE group_id = 636
),
cur AS (
  SELECT agent_id, digest_type
  FROM agent_expertise_digest
  WHERE is_current = true
)
SELECT g.agent_id
FROM g
LEFT JOIN cur c30 ON c30.agent_id = g.agent_id AND c30.digest_type = '1_month_industry'
LEFT JOIN cur c180 ON c180.agent_id = g.agent_id AND c180.digest_type = '6_month_industry'
WHERE c30.agent_id IS NULL OR c180.agent_id IS NULL
ORDER BY g.agent_id;
```

## Implementation Notes

- Keep content profile-adapted to persona and job function, not generic sector text.
- Populate `source_scope` for traceability and replay/debug.
- Prefer write-on-change to avoid unnecessary version churn.
- Never overwrite historical rows in place.
