# Recruitment Filter Expansion Guide

This guide explains the expanded recruitment and agent-search filter surface for
the FishDog API and FishDog MCP server.

The short version: users can now target agents by richer professional metadata
and structured health attributes, discover the currently available values before
they recruit, and use natural occupation terms without knowing the exact
occupation labels stored in the system.

## What Changed

The recruitment/search surface now supports three major improvements.

1. Professional targeting is broader and easier to use.
   - `occupation`
   - `occupation_major_group`
   - `industry`
   - `labour_status`
   - `title_variant`

2. Structured health-profile targeting is available as first-class filters.
   - `health_conditions`
   - `health_drug_classes`
   - `rx_medication_count`
   - `smoking_status`
   - `alcohol_use`
   - `physical_activity_level`
   - `sleep_quality`
   - `diet_quality`
   - `bmi`
   - `bmi_class`
   - `phq9_score`
   - `phq9_severity`

3. Occupation search is now synonym-backed.
   - Callers can pass a natural job title such as `stock analyst`,
     `portfolio analyst`, `CPA`, or another canonical-title synonym.
   - The API expands that input through the shared canonical occupation
     synonym map before matching live occupation labels.
   - Users no longer need to know the exact stored occupation label in advance.

## Supported Surfaces

These filters are supported by the explicit recruitment/search surfaces:

| Surface | Purpose | Notes |
|---|---|---|
| `GET /v1/filters` | Discover valid values and counts | Use this before strict raw filtering. Requires `country`. |
| `GET /v1/agents/search` | Browse matching agents | Accepts either query-param shorthand or a JSON `filters` query parameter. |
| `GET /v1/agents/find` | Find one matching agent | Same filter behavior as search; returns one agent or `404`. |
| `POST /v1/agents/find` | Find one matching agent with JSON body | Useful for nested DSL payloads. |
| `POST /v1/research-groups/recruit` | Recruit a new group from explicit filters | New callers that send raw filters directly should set `X-Filter-Raw-DSL: true`. |
| `POST /v1/research-groups/interview` | Recruit/interview using filters and objective | Same parser/filter support. |
| `POST /v1/research-groups/{group_uuid}/append` | Add more agents to an existing group | Same parser/filter support via `filters`. |
| `v1.filters.get` MCP tool | Discover filter values | MCP wrapper for `GET /v1/filters`. |
| `v1.agents.search` / `v1.agents.find` MCP tools | Search/find agents | Includes typed shortcut arguments for profession and health fields. |
| `v1.research-groups.recruit` MCP tool | Recruit groups | Includes typed shortcut arguments for profession and health fields. |

## Value Discovery

Use `GET /v1/filters` to discover country-scoped values and counts. This is the
most reliable way to build UI controls, preflight a cohort, or understand why a
filter is too narrow.

```bash
curl -X GET "$BASE_URL/v1/filters?country=USA&fields=occupation,industry,health_drug_classes,bmi_class" \
  -H "Authorization: Bearer $API_KEY"
```

Example response shape:

```json
{
  "country": "USA",
  "recruitable_agent_count": 412,
  "filters": {
    "occupation": [
      { "value": "Equity Research Analyst", "count": 7 }
    ],
    "industry": [
      { "value": "Financial Services", "count": 22 }
    ],
    "health_drug_classes": [
      { "value": "glp1_agonist", "count": 12 }
    ],
    "bmi_class": [
      { "value": "obese_2", "count": 6 }
    ]
  }
}
```

`fields` can be a comma-separated string or repeated query params. Aliases are
accepted, so `drug_class` resolves to `health_drug_classes`.
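
As a rough illustration, both `fields` forms can be produced with Python's standard library. `filters_url` is a hypothetical helper name and the base URL is a placeholder; only the path and the `country` and `fields` parameters come from this guide.

```python
from urllib.parse import urlencode

def filters_url(base_url, country, fields, repeat=False):
    """Build a GET /v1/filters URL (hypothetical helper).

    repeat=False sends `fields` as one comma-separated value;
    repeat=True sends one `fields` query param per value.
    """
    if repeat:
        params = [("country", country)] + [("fields", f) for f in fields]
    else:
        params = [("country", country), ("fields", ",".join(fields))]
    # safe="," keeps the separator readable instead of encoding it as %2C.
    return f"{base_url}/v1/filters?{urlencode(params, safe=',')}"

# Placeholder host; aliases such as drug_class are passed through as-is.
print(filters_url("https://api.example.invalid", "USA", ["occupation", "drug_class"]))
print(filters_url("https://api.example.invalid", "USA", ["occupation", "drug_class"], repeat=True))
```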

## Professional Filters

### `occupation`

Use this when you want a specific role or job title.

`occupation` now resolves values in this order:

1. Exact live occupation label match.
2. Canonical job-title synonym match.
3. Canonical title corpus occupation keys.
4. Canonical title and synonym values if they exist in live occupation labels.
5. Close-match fallback against live occupation labels.

This means a user can ask for `stock analyst` without knowing whether the live
label is `Equity Research Analyst`, `Financial Analyst`, or another mapped
occupation.

Recommended forms:

```json
{
  "filters": {
    "country": "USA",
    "occupation": "stock analyst"
  }
}
```

```json
{
  "filters": {
    "country": { "any": ["USA"] },
    "occupation": { "any": ["portfolio analyst", "CPA"] }
  }
}
```

Important behavior:

- Synonym expansion applies to shorthand `occupation` and
  `occupation.any` filters.
- `occupation.all` remains exact. It means one agent must match every supplied
  occupation value, so synonym expansion is intentionally not applied there.
- For broad audiences, prefer `occupation_major_group` instead of a long list of
  exact occupations.
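
The resolution order can be pictured with a simplified sketch. This is not the server implementation: the labels and synonym map below are invented toy data, steps 2 to 4 are collapsed into a single synonym lookup, and the close-match step uses `difflib` with an arbitrary cutoff as a stand-in for whatever the API actually does.

```python
import difflib

# Toy data invented for illustration; real labels come from GET /v1/filters.
LIVE_LABELS = ["Equity Research Analyst", "Financial Analyst", "Accountant"]
SYNONYMS = {
    "stock analyst": ["Equity Research Analyst"],
    "cpa": ["Accountant"],
}

def resolve_occupation(term, live_labels=LIVE_LABELS, synonyms=SYNONYMS):
    """Return the live labels a caller-supplied term resolves to."""
    by_lower = {label.lower(): label for label in live_labels}
    key = term.strip().lower()
    # Step 1: exact (case-insensitive) live label match.
    if key in by_lower:
        return [by_lower[key]]
    # Steps 2-4, collapsed: synonym targets that exist as live labels.
    mapped = [by_lower[t.lower()] for t in synonyms.get(key, []) if t.lower() in by_lower]
    if mapped:
        return mapped
    # Step 5: close-match fallback against live labels (cutoff is an assumption).
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=0.8)
    return [by_lower[c] for c in close]
```

Each step runs only when the previous one finds nothing, which is why an exact label always wins over a synonym or a fuzzy match.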

### `occupation_major_group`

Use this for broad BLS-style occupational families.

Examples:

```json
{
  "filters": {
    "country": "USA",
    "occupation_major_group": "business_financial"
  }
}
```

```json
{
  "filters": {
    "country": "USA",
    "occupation_major_group": { "any": ["management", "business_financial"] }
  }
}
```

Aliases:

- `occupation_major_groups`
- `major_group`

### `industry`

Use this for the agent's industry sector. Industry labels can be discovered with
`GET /v1/filters?country=USA&fields=industry`.

```json
{
  "filters": {
    "country": "USA",
    "industry": { "any": ["Financial Services", "Healthcare"] }
  }
}
```

For URL query shorthand, repeated params are safest when passing multiple
industry values:

```bash
curl -X GET "$BASE_URL/v1/agents/search?country=USA&industry=Financial%20Services&industry=Healthcare" \
  -H "Authorization: Bearer $API_KEY"
```
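
If the query string is built in code rather than by hand, repeated params fall out of `urlencode` with `doseq=True`. The sketch below assumes a placeholder base URL, and `search_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

def search_url(base_url, params):
    """Build a GET /v1/agents/search URL from a dict of shorthand params."""
    # doseq=True expands list values into repeated params;
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(params, doseq=True, quote_via=quote)
    return f"{base_url}/v1/agents/search?{query}"

print(search_url(
    "https://api.example.invalid",
    {"country": "USA", "industry": ["Financial Services", "Healthcare"]},
))
```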

### `labour_status`

Use this for employment status.

Aliases:

- `employment`
- `employment_status`
- `labor_status`

Example:

```json
{
  "filters": {
    "country": "USA",
    "labour_status": "employed"
  }
}
```

### `title_variant`

Use this when you need a more specific normalized title variant available in the
occupation assignment layer. Discover values with:

```bash
curl -X GET "$BASE_URL/v1/filters?country=USA&fields=title_variant" \
  -H "Authorization: Bearer $API_KEY"
```

Alias:

- `title_variants`

## Health Filters

Health filters are structured filters backed by `agent_health_assignment`.
Agents without assigned health data do not match health-specific filters.

Health values are cohort attributes that carry provenance metadata. They are not
clinically verified and should not be described as confirmed medical records.

### Health List Filters

These use exact structured values:

| Filter | What It Targets | Example |
|---|---|---|
| `health_conditions` | Structured condition slugs | `diabetes_t2`, `hypertension` |
| `health_drug_classes` | Structured medication-class slugs | `glp1_agonist`, `statin` |
| `smoking_status` | Smoking cohort | `former`, `current`, `never` |
| `alcohol_use` | Alcohol-use cohort | `moderate` |
| `physical_activity_level` | Activity cohort | `low`, `moderate`, `high` |
| `sleep_quality` | Sleep-quality cohort | `poor`, `fair`, `good` |
| `diet_quality` | Diet-quality cohort | `mixed`, `healthy` |
| `bmi_class` | BMI category | `normal`, `overweight`, `obese_2` |
| `phq9_severity` | PHQ-9 severity band | `minimal`, `mild`, `moderate` |

Aliases:

- `condition`, `conditions`, `health_condition` -> `health_conditions`
- `drug_class`, `drug_classes`, `medication_class`,
  `medication_classes` -> `health_drug_classes`
- `health_bmi_class` -> `bmi_class`
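
A client that wants to log or deduplicate filters under their canonical names could mirror the alias list locally. The sketch below is an assumption about client-side handling, not the API's parser; `canonicalize_filters` is a hypothetical helper and the table contains only the health aliases listed above.

```python
# Health field aliases copied from the list above.
FIELD_ALIASES = {
    "condition": "health_conditions",
    "conditions": "health_conditions",
    "health_condition": "health_conditions",
    "drug_class": "health_drug_classes",
    "drug_classes": "health_drug_classes",
    "medication_class": "health_drug_classes",
    "medication_classes": "health_drug_classes",
    "health_bmi_class": "bmi_class",
}

def canonicalize_filters(filters):
    """Rename aliased keys to canonical names, rejecting collisions."""
    out = {}
    for name, value in filters.items():
        key = FIELD_ALIASES.get(name, name)
        if key in out:
            # Two keys resolved to the same field, e.g. drug_class + drug_classes.
            raise ValueError(f"duplicate filter after alias resolution: {key}")
        out[key] = value
    return out

print(canonicalize_filters({"country": "USA", "drug_class": "statin"}))
```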

### Health Range Filters

These accept `min`, `max`, or both:

| Filter | What It Targets |
|---|---|
| `rx_medication_count` | Number of prescription medication classes assigned |
| `bmi` | Numeric BMI |
| `phq9_score` | Numeric PHQ-9 score |

Example:

```json
{
  "filters": {
    "country": "USA",
    "rx_medication_count": { "min": 2 },
    "bmi": { "min": 30 },
    "phq9_score": { "max": 10 }
  }
}
```
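
When payloads are assembled in code, a small guard can catch empty or inverted ranges before the request is sent. `range_filter` is a hypothetical client-side helper; the guide does not state how the API itself treats such input.

```python
def range_filter(min_value=None, max_value=None):
    """Build a {"min": ..., "max": ...} spec with basic sanity checks."""
    if min_value is None and max_value is None:
        raise ValueError("supply min, max, or both")
    if min_value is not None and max_value is not None and min_value > max_value:
        raise ValueError("min must not exceed max")
    spec = {}
    if min_value is not None:
        spec["min"] = min_value
    if max_value is not None:
        spec["max"] = max_value
    return spec

# Reproduces the example payload above.
payload = {
    "filters": {
        "country": "USA",
        "rx_medication_count": range_filter(min_value=2),
        "bmi": range_filter(min_value=30),
        "phq9_score": range_filter(max_value=10),
    }
}
print(payload)
```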

### GLP-1 Search

For GLP-1 users, use `health_drug_classes` or its `drug_class` alias. Do not
search profile text for medication names.

Canonical value:

- `glp1_agonist`

Accepted GLP-1 aliases include:

- `GLP-1`
- `GLP1`
- `Ozempic`
- `Wegovy`
- `Mounjaro`
- `Zepbound`
- `semaglutide`
- `tirzepatide`
- `Rybelsus`
- `Saxenda`
- `Victoza`

Example agent search:

```bash
curl -X GET "$BASE_URL/v1/agents/search?country=USA&drug_class=Ozempic&age_min=35&age_max=65" \
  -H "Authorization: Bearer $API_KEY"
```

Equivalent nested filter:

```json
{
  "match": "all",
  "filters": {
    "country": "USA",
    "drug_class": "Ozempic",
    "age": { "min": 35, "max": 65 }
  }
}
```

The parser normalizes `Ozempic` to `glp1_agonist` before searching.
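
Conceptually, that normalization is a lookup from alias to canonical slug. The sketch below uses only the aliases listed in this section and assumes the lookup ignores case and surrounding whitespace; the real parser may accept more aliases or behave differently.

```python
GLP1_CANONICAL = "glp1_agonist"

# Aliases copied from the list above, lowercased for comparison.
GLP1_ALIASES = {
    "glp-1", "glp1", "ozempic", "wegovy", "mounjaro", "zepbound",
    "semaglutide", "tirzepatide", "rybelsus", "saxenda", "victoza",
}

def normalize_drug_class(value):
    """Map a GLP-1 alias to its canonical slug; leave other values unchanged."""
    key = value.strip().lower()
    if key == GLP1_CANONICAL or key in GLP1_ALIASES:
        return GLP1_CANONICAL
    return value

print(normalize_drug_class("Ozempic"))
```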

## Response Fields Unlocked

Agent search, find, recruitment, append, and group response payloads can now
include richer agent records.

Professional fields:

```json
{
  "occupation": "Equity Research Analyst",
  "occupation_major_group": "business_financial",
  "industry": "Financial Services",
  "labour_status": "employed",
  "title_variant": "portfolio analyst"
}
```

Health profile fields:

```json
{
  "health_profile": {
    "conditions": ["diabetes_t2"],
    "drug_classes": ["glp1_agonist"],
    "rx_medication_count": 2,
    "smoking_status": "never",
    "alcohol_use": "moderate",
    "physical_activity_level": "low",
    "sleep_quality": "fair",
    "diet_quality": "mixed",
    "bmi": 31,
    "bmi_class": "obese_1",
    "phq9_score": 6,
    "phq9_severity": "mild",
    "source": "synthetic_assignment",
    "method_version": "health_v1",
    "calibration_version": "calibration_2026_05",
    "reference_version": "nhanes_reference_2026_05"
  }
}
```

The provenance fields explain how the assignment was produced. They should be
treated as assignment metadata, not as user-facing medical evidence.
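
One way to keep that separation in client code is to split the profile before rendering it. The sketch assumes the four version and source keys shown in the example are the provenance fields; `split_health_profile` is a hypothetical helper.

```python
# Assumed provenance keys, taken from the example payload above.
PROVENANCE_KEYS = {"source", "method_version", "calibration_version", "reference_version"}

def split_health_profile(profile):
    """Return (cohort_attributes, provenance) from a health_profile dict."""
    attributes = {k: v for k, v in profile.items() if k not in PROVENANCE_KEYS}
    provenance = {k: v for k, v in profile.items() if k in PROVENANCE_KEYS}
    return attributes, provenance

attributes, provenance = split_health_profile({
    "drug_classes": ["glp1_agonist"],
    "bmi": 31,
    "source": "synthetic_assignment",
    "method_version": "health_v1",
})
print(attributes)
print(provenance)
```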

## Matching Rules

Top-level `match` controls how different fields combine:

- `match: "all"` means every supplied field must match.
- `match: "any"` means at least one supplied field must match.
- `and` is accepted as an alias for `all`.
- `or` is accepted as an alias for `any`.

Within list fields:

- `{ "any": [...] }` means any value in that field can match.
- `{ "all": [...] }` means one agent must match all values in that field.
- `or` is accepted as an alias for field-level `any`.
- `and` is accepted as an alias for field-level `all`.

Categorical matching is case-insensitive but still compares whole values. Apart
from `occupation` synonym expansion and the documented aliases, do not rely on
partial text matching.
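
These rules can be modeled with a small evaluator. It is an illustrative sketch of the semantics described here, not the server's code: it covers categorical fields only (range specs are rejected), and the agent record shape is an assumption.

```python
def _norm(value):
    return str(value).strip().lower()

def field_matches(agent_value, spec):
    """Match one field: scalar shorthand, {"any": [...]} or {"all": [...]}."""
    if agent_value is None:
        have = set()
    elif isinstance(agent_value, list):
        have = {_norm(v) for v in agent_value}
    else:
        have = {_norm(agent_value)}
    if isinstance(spec, dict):
        # "or" / "and" are aliases for field-level "any" / "all".
        for key, combine in (("any", any), ("or", any), ("all", all), ("and", all)):
            if key in spec:
                return combine(_norm(v) in have for v in spec[key])
        raise ValueError("unsupported field spec in this sketch")
    # Whole-value, case-insensitive comparison; no partial text matching.
    return _norm(spec) in have

def agent_matches(agent, filters, match="all"):
    """Combine per-field results using the top-level match mode."""
    if match in ("all", "and"):
        combine = all
    elif match in ("any", "or"):
        combine = any
    else:
        raise ValueError(f"unknown match mode: {match}")
    return combine(field_matches(agent.get(name), spec) for name, spec in filters.items())
```

Keeping the field-level and top-level combinators separate mirrors the two places where `any` and `all` can appear in a payload.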

## Effective Usage Pattern

Use this workflow when building a precise recruitment cohort.

1. Discover values.

```bash
curl -X GET "$BASE_URL/v1/filters?country=USA&fields=occupation,industry,health_drug_classes,health_conditions" \
  -H "Authorization: Bearer $API_KEY"
```

2. Probe the audience with `GET /v1/agents/search`.

```bash
curl -X GET "$BASE_URL/v1/agents/search?country=USA&occupation=stock%20analyst&drug_class=GLP-1" \
  -H "Authorization: Bearer $API_KEY"
```

3. Recruit only after the search returns enough matching agents for the intended `group_size`.

```bash
curl -X POST "$BASE_URL/v1/research-groups/recruit" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Filter-Raw-DSL: true" \
  -d '{
    "name": "US financial analysts using GLP-1s",
    "group_size": 8,
    "filters": {
      "match": "all",
      "country": "USA",
      "occupation": "stock analyst",
      "drug_class": "Ozempic"
    }
  }'
```

4. Inspect returned agents.

Use `occupation`, `industry`, and `health_profile` in the response to confirm
the cohort is coherent before starting a study or direct-question workflow.
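
Step 3 could be scripted along the following lines. This sketch only builds the request object, so nothing is sent until the result is passed to `urllib.request.urlopen`. `recruit_request` is a hypothetical helper, the base URL and key are placeholders, and the body mirrors the curl example above.

```python
import json
from urllib.request import Request

def recruit_request(base_url, api_key, name, group_size, filters):
    """Build (but do not send) a POST /v1/research-groups/recruit request."""
    body = json.dumps({"name": name, "group_size": group_size, "filters": filters})
    return Request(
        f"{base_url}/v1/research-groups/recruit",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Raw-filter header from the Supported Surfaces table.
            "X-Filter-Raw-DSL": "true",
        },
    )

req = recruit_request(
    "https://api.example.invalid",  # placeholder host
    "YOUR_API_KEY",                 # placeholder credential
    name="US financial analysts using GLP-1s",
    group_size=8,
    filters={"match": "all", "country": "USA",
             "occupation": "stock analyst", "drug_class": "Ozempic"},
)
print(req.get_method(), req.full_url)
```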

## Example: Healthcare Consumer Cohort

Target adults in the US with type 2 diabetes, obesity, and GLP-1 exposure:

```json
{
  "match": "all",
  "filters": {
    "country": "USA",
    "health_conditions": { "any": ["diabetes"] },
    "drug_class": { "any": ["Mounjaro", "Wegovy"] },
    "bmi_class": "obese",
    "age": { "min": 30, "max": 70 }
  }
}
```

What the API normalizes:

- `diabetes` -> `diabetes_t2`
- `Mounjaro` / `Wegovy` -> `glp1_agonist`
- `obese` -> `obese`, `obese_1`, `obese_2`, `obese_3`
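
The mappings above include a one-to-many case (`obese`) alongside the one-to-one cases. A toy model of that expansion, limited to the values in this example and assuming case-insensitive lookup, might look like this; `normalize_values` is a hypothetical helper, not an API function.

```python
# Only the mappings listed above; the real tables are larger.
VALUE_NORMALIZATION = {
    "health_conditions": {"diabetes": ["diabetes_t2"]},
    "health_drug_classes": {"mounjaro": ["glp1_agonist"], "wegovy": ["glp1_agonist"]},
    "bmi_class": {"obese": ["obese", "obese_1", "obese_2", "obese_3"]},
}

def normalize_values(field, values):
    """Expand caller values into canonical values, dropping duplicates."""
    table = VALUE_NORMALIZATION.get(field, {})
    out = []
    for value in values:
        # Unknown values pass through unchanged.
        for target in table.get(value.lower(), [value]):
            if target not in out:
                out.append(target)
    return out

print(normalize_values("health_drug_classes", ["Mounjaro", "Wegovy"]))
print(normalize_values("bmi_class", ["obese"]))
```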

## Example: Occupation Without Exact Label Knowledge

Target finance personas even if the caller does not know the exact stored
occupation:

```json
{
  "match": "all",
  "filters": {
    "country": "USA",
    "occupation": { "any": ["portfolio analyst", "stock analyst"] },
    "industry": { "any": ["Financial Services"] }
  }
}
```

The occupation filter can resolve through canonical-title synonyms and corpus
occupation keys before matching the live occupation labels.

## Example: Broad Role Family Plus Industry

Target supply-chain and logistics roles without overfitting to exact titles:

```json
{
  "match": "all",
  "filters": {
    "country": "USA",
    "occupation_major_group": {
      "any": ["transportation_and_material_moving", "management"]
    },
    "industry": {
      "any": ["Transportation", "Logistics", "Retail"]
    }
  }
}
```

Use `occupation_major_group` when exact job titles are too narrow and
`occupation` when the study needs a specific profession.

## MCP Usage

The FishDog MCP server exposes the same functionality with typed shortcut
arguments.

Value discovery:

```json
{
  "tool": "v1.filters.get",
  "arguments": {
    "country": "USA",
    "fields": ["health_drug_classes", "health_conditions", "occupation"]
  }
}
```

Search:

```json
{
  "tool": "v1.agents.search",
  "arguments": {
    "country": "USA",
    "occupation": "stock analyst",
    "drug_class": "Ozempic",
    "age_min": 35,
    "age_max": 65
  }
}
```

Recruit:

```json
{
  "tool": "v1.research-groups.recruit",
  "arguments": {
    "name": "US GLP-1 finance analysts",
    "group_size": 8,
    "country": "USA",
    "occupation": "stock analyst",
    "drug_class": "GLP-1"
  }
}
```

MCP guardrails now explicitly tell agents to use structured health filters
instead of description text, and to pass natural job titles through
`occupation` so the API can apply synonym expansion.

## What This Unlocks

For end users:

- They can ask for occupations in their own language without learning the
  internal taxonomy first.
- They can recruit specific health cohorts such as GLP-1 users, people with
  type 2 diabetes, or agents with a BMI/PHQ-9 range.
- They can preview available values and counts before spending effort on a
  recruitment run.
- They get richer recruited-agent payloads that make cohort validation easier.

For AI agents and MCP workflows:

- Tool callers can use typed shortcut arguments rather than writing raw SQL-like
  filter logic.
- The MCP server can discover values, search, find, and recruit using one
  consistent filter vocabulary.
- Natural-language recruitment plans can be translated into executable filters
  with fewer brittle exact-label assumptions.

For API clients:

- Existing filter payloads continue to work.
- New fields are additive.
- Strict invalid-value mode can surface `422` errors with field-level details
  when enabled server-side.
- `GET /v1/filters` provides a stable discovery mechanism for UI dropdowns,
  client-side validation, and preflight checks.

## Common Mistakes

- Do not search `description` for GLP-1 medication names. Use
  `drug_class` / `health_drug_classes`.
- Do not assume every agent has health data. Health-specific filters only match
  agents with assigned health profiles.
- Do not use `occupation.all` for synonym-backed occupation matching. Use
  shorthand `occupation` or `occupation.any`.
- Do not hardcode occupation or industry values in clients when
  `GET /v1/filters` can discover current values.
- Do not present `health_profile` as verified medical records. Present it as
  structured cohort assignment metadata.

