# Filters Guide

Related guides:
[All Guides](./index.md) |
[Recruitment Filter Expansion](./recruitment_filter_expansion.md) |
[Quickstart](./quickstart.md) |
[MCP Adapter Quickstart](./mcp_quickstart.md)

Filters are used for recruitment and agent lookup endpoints, including:

- `POST /v1/research-groups/recruit`
- `POST /v1/research-groups/interview`
- `GET /v1/agents/search`
- `GET/POST /v1/agents/find`

`POST /v1/research-group-requests` is an exception: its natural-language parser can reject some keys (for example `city`) and may request clarifications.

`POST /v1/research-groups/recruit` is the explicit raw-filter / raw-DSL
escape hatch. New direct callers should send `X-Filter-Raw-DSL: true` to make
that intent explicit. During the migration window, callers that omit the
header still work as a grandfathered compatibility path, and the response
includes `raw_dsl_mode` so you can see which contract was used.
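As a rough illustration of the explicit header, the sketch below builds (but does not send) a recruit request with `X-Filter-Raw-DSL: true`. The base URL, the helper name `build_recruit_request`, and the payload are placeholders introduced for this example; the request body reuses the nested DSL shape and omits any other fields the endpoint may require.

```python
import json
import urllib.request

# Placeholder base URL (assumption); substitute the real one for your deployment.
BASE_URL = "https://api.example.com/v1"


def build_recruit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build, without sending, a recruit request that declares raw-DSL intent."""
    return urllib.request.Request(
        url=f"{BASE_URL}/research-groups/recruit",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Explicit opt-in instead of relying on the grandfathered path.
            "X-Filter-Raw-DSL": "true",
        },
    )


request = build_recruit_request(
    {"match": "all", "filters": {"country": {"any": ["USA"]}}},
    api_key="YOUR_API_KEY",
)
```

After sending the request with your HTTP client of choice, inspect `raw_dsl_mode` in the response body to confirm which contract was applied.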

For a user-facing walkthrough of the new health-profile, occupation synonym,
industry, and MCP shortcut behavior, see
[`recruitment_filter_expansion.md`](./recruitment_filter_expansion.md).

## Discover Values Programmatically

Use `GET /v1/filters` before building filter payloads when you need current, country-scoped value strings.

`country` is required in v1 so returned values match a single country's taxonomy.
Use optional `fields` to focus discovery on specific value sets. It accepts a
comma-separated list or repeated query params, and aliases are accepted.

```bash
curl -X GET "$BASE_URL/filters?country=US&fields=health_drug_classes,health_conditions,bmi_class" \
  -H "Authorization: Bearer $API_KEY"
```

Response shape:

```json
{
  "country": "US",
  "recruitable_agent_count": 412,
  "filters": {
    "health_drug_classes": [{"value": "glp1_agonist", "count": 22}],
    "health_conditions": [{"value": "diabetes_t2", "count": 38}],
    "bmi_class": [{"value": "obese_2", "count": 17}]
  }
}
```
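A minimal client-side sketch of the discovery step, assuming the response shape shown above. The helper names `build_filters_url` and `allowed_values` are hypothetical, and the base URL is a placeholder.

```python
from urllib.parse import urlencode


def build_filters_url(base_url: str, country: str, fields: list[str]) -> str:
    """Build the discovery URL with a comma-separated `fields` parameter."""
    # safe="," keeps the commas literal, matching the curl example.
    query = urlencode({"country": country, "fields": ",".join(fields)}, safe=",")
    return f"{base_url}/filters?{query}"


def allowed_values(response: dict, field: str) -> set[str]:
    """Collect the value strings returned for one field (empty set if absent)."""
    return {entry["value"] for entry in response.get("filters", {}).get(field, [])}


# Sample body copied from the response shape above.
sample = {
    "country": "US",
    "recruitable_agent_count": 412,
    "filters": {
        "health_drug_classes": [{"value": "glp1_agonist", "count": 22}],
        "health_conditions": [{"value": "diabetes_t2", "count": 38}],
        "bmi_class": [{"value": "obese_2", "count": 17}],
    },
}

url = build_filters_url("https://api.example.com/v1", "US", ["health_conditions", "bmi_class"])
bmi_values = allowed_values(sample, "bmi_class")
```

Checking a candidate value against `allowed_values(...)` before sending a query is one way to avoid silent zero-result responses.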

## Two Supported Input Formats

### 1) Nested DSL (recommended for complex targeting)

```json
{
  "match": "all",
  "filters": {
    "country": { "any": ["USA"] },
    "state": { "any": ["CA"] },
    "age": { "min": 25, "max": 45 },
    "income_annual_usd": { "min": 50000 },
    "health_conditions": { "any": ["hypertension"] }
  }
}
```

Use nested DSL when you need explicit per-field operators (`any`, `all`, `min`, `max`, `contains`).

### 2) Flat shorthand (convenient for simple calls)

```json
{
  "match": "all",
  "country": "USA",
  "state": "CA",
  "age_min": 25,
  "age_max": 45,
  "income_annual_usd_min": 50000,
  "bmi_class": "obese"
}
```

Use shorthand for quick one-off queries.
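To make the correspondence between the two formats concrete, here is a hypothetical client-side helper that rewrites a flat payload into the nested form. It assumes that `_min`/`_max` suffixes map to `min`/`max` bounds and that scalar values map to a single-element `any` list; the API's own coercion rules may differ, and text keys such as `description_contains` are out of scope.

```python
RANGE_SUFFIXES = ("_min", "_max")


def flat_to_nested(flat: dict) -> dict:
    """Rewrite flat shorthand into nested DSL (illustrative, not the server logic)."""
    nested: dict = {}
    if "match" in flat:
        nested["match"] = flat["match"]
    filters = nested.setdefault("filters", {})
    for key, value in flat.items():
        if key == "match":
            continue
        if key.endswith(RANGE_SUFFIXES):
            # "income_annual_usd_min" -> field "income_annual_usd", bound "min"
            field, bound = key.rsplit("_", 1)
            filters.setdefault(field, {})[bound] = value
        else:
            values = value if isinstance(value, list) else [value]
            filters[key] = {"any": values}
    return nested


flat = {
    "match": "all",
    "country": "USA",
    "state": "CA",
    "age_min": 25,
    "age_max": 45,
    "income_annual_usd_min": 50000,
    "bmi_class": "obese",
}
nested = flat_to_nested(flat)
```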

## Matching Rules (Important)

- Matching is case-insensitive for list/string filters.
- Categorical list values must still match the exact full string; there is no
  partial or fuzzy matching.
- Documented value aliases are expanded for `country`, GLP-1 drug-class inputs,
  common health-condition inputs, and broad BMI obesity inputs.

Example:

- `ethnicity=hispanic or latino` -> works
- `ethnicity=HISPANIC OR LATINO` -> works
- `ethnicity=hispanic` -> returns 0
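The rule can be modeled locally as a case-insensitive, full-string comparison. This is only a sketch of the documented behavior, not the server implementation, and it deliberately ignores alias expansion; `matches_categorical` is a name invented for this example.

```python
def matches_categorical(query_value: str, stored_value: str) -> bool:
    """Case-insensitive comparison of the full string; no substring matching."""
    return query_value.casefold() == stored_value.casefold()


stored = "Hispanic or Latino"
results = {
    "hispanic or latino": matches_categorical("hispanic or latino", stored),
    "HISPANIC OR LATINO": matches_categorical("HISPANIC OR LATINO", stored),
    "hispanic": matches_categorical("hispanic", stored),
}
```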

## Canonical Value Sets

The values below are documented values confirmed in production testing. For the current, country-scoped list, use `GET /v1/filters`.

### country

- `USA`
- `UK`
- `Germany`
- `Canada`

### state

- Use 2-letter US/Canada state/province codes (for example `CA`, `NY`, `ON`).
- Full names (for example `California`) are not supported for list matching.

### gender

Canonical public filter for current gender identity.

- `male`
- `female`

### sex_at_birth

- `male`
- `female`

### labour_status

- `employed`
- `unemployed`
- `Not in labor force`

### occupation

- Exact live occupation labels are discoverable with
  `GET /v1/filters?country=USA&fields=occupation`.
- Normal `occupation` filters using shorthand or `any` mode also expand through
  the canonical job-title synonym map. For example, a user-facing title such as
  `portfolio analyst` can resolve to an available live label such as `Analyst`
  or `Equity Research Analyst` when the canonical title seed maps that synonym.
- `all` mode remains exact because it means one agent must match every supplied
  occupation value. Use shorthand or `{ "occupation": { "any": [...] } }` for
  synonym-backed matching.

### occupation_major_group

- Broad BLS-style occupational buckets are supported as exact list filters.
- Use exact strings such as:
  - `management`
  - `office_admin_support`
  - `transportation_material_moving`
  - `business_financial`
- Prefer this field when you want a broad occupational family rather than a
  single exact `occupation` label.

### religion

- `Catholic`
- `Protestant`
- `Jewish`
- `Muslim`
- `Buddhist`
- `Hindu`
- `Unaffiliated`

### ethnicity (documented set)

- `White`
- `Hispanic or Latino`
- `Black`
- `Asian`
- `Some other race`
- `American Indian or Alaska Native`
- `Native Hawaiian or Pacific Islander`
- `White (Non-Hispanic)`
- `Black (Non-Hispanic)`
- `Asian (Non-Hispanic)`
- `Hispanic (Any race)`
- `American Indian/Alaska Native (Non-Hispanic)`
- `Native Hawaiian/Other Pacific Islander (Non-Hispanic)`
- `Two or more races (Non-Hispanic)`
- `South Asian`
- `Chinese`
- `Filipino`
- `First Nations`
- `Arab`
- `Southeast Asian`
- `Latin American`
- `Metis` (accented variant may appear in source data)
- `West Asian`
- `Japanese`
- `Korean`
- `Multiple visible minorities`
- `Other visible minority (n.i.e.)`
- `Mixed`
- `Other`

## Compatibility Aliases

- `gender_identity` is accepted as a legacy alias for `gender`.
- `sex` is accepted as a compatibility alias for `gender`.
- `is_female` is deprecated but still accepted for backwards compatibility.
  `true` maps to `gender=female`; `false` maps to `gender=male`.
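When migrating stored payloads off the legacy keys, a small client-side rewrite along these lines may help. The helper `normalize_gender_keys` is hypothetical, and the precedence it applies when several aliases appear together (an explicit `gender` wins, then `gender_identity`, then `sex`, then `is_female`) is an assumption, not documented API behavior.

```python
LEGACY_GENDER_KEYS = ("gender_identity", "sex", "is_female")


def normalize_gender_keys(flat: dict) -> dict:
    """Rewrite legacy gender aliases in a flat payload to the canonical `gender` key."""
    out = {k: v for k, v in flat.items() if k not in LEGACY_GENDER_KEYS}
    for alias in ("gender_identity", "sex"):
        if alias in flat:
            # setdefault keeps an explicit `gender` (or an earlier alias) untouched.
            out.setdefault("gender", flat[alias])
    if "is_female" in flat:
        out.setdefault("gender", "female" if flat["is_female"] else "male")
    return out


migrated = normalize_gender_keys({"country": "USA", "is_female": True})
```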

## Currently Unpopulated Filters

These filter keys are currently documented in internal/full specs but are not populated in production data:

- `political_party`

## Text and Numeric Filters

- Text:
  - Preferred structured form: `description.contains`
  - Structured AST also supports `description.any_of` and
    `description.all_of`
  - Legacy compatibility shorthand: top-level `description_contains`
    is transparently coerced to `description.contains`
  - Stage 2 deprecation: successful API responses that consume legacy
    `description_contains` may include a `compatibility_warnings` array so
    callers can migrate proactively
- Numeric ranges: `age`, `income_annual_usd`, `income_annual_local_currency`,
  `rx_medication_count`, `bmi`, `phq9_score`
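To notice the Stage 2 deprecation early, a client can surface `compatibility_warnings` whenever the array is present. The sketch below treats each entry as opaque, since its exact shape is not specified here; the helper name and the sample warning text are made up for illustration.

```python
import logging

logger = logging.getLogger("filters_client")


def surface_compatibility_warnings(response: dict) -> list:
    """Log and return any compatibility warnings attached to a successful response."""
    notices = response.get("compatibility_warnings") or []
    for notice in notices:
        # Entries are logged as-is; no assumption is made about their structure.
        logger.warning("Filter API compatibility warning: %s", notice)
    return notices


# Hypothetical response body; the warning text is invented for this example.
legacy_response = {
    "total_count": 3,
    "compatibility_warnings": ["description_contains is deprecated"],
}
notices = surface_compatibility_warnings(legacy_response)
```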

## Health Filters

Health filters read from `agent_health_assignment`; agents without assigned
health data do not match health-specific filters.

- List fields: `health_conditions`, `health_drug_classes`, `smoking_status`,
  `alcohol_use`, `physical_activity_level`, `sleep_quality`, `diet_quality`,
  `bmi_class`, `phq9_severity`
- Range fields: `rx_medication_count`, `bmi`, `phq9_score`
- Compatibility aliases: `condition`, `conditions`, `health_condition`,
  `drug_class`, `drug_classes`, `medication_class`, `medication_classes`
- `health_drug_classes` is a structured medication-class filter, not a free-text
  medication-name search. Use `glp1_agonist` for GLP-1 users. Common GLP-1
  brands and ingredients such as `Ozempic`, `Wegovy`, `Mounjaro`, `Zepbound`,
  `semaglutide`, and `tirzepatide` normalize to `glp1_agonist`.
- Current drug-class slugs include `glp1_agonist`, `insulin`, `metformin`,
  `statin`, `antihypertensive`, `antidepressant_ssri`,
  `antidepressant_snri`, `antianxiety_benzo`, `blood_thinner`, `thyroid`,
  `birth_control`, `bronchodilator`, `inhaled_corticosteroid`,
  `proton_pump_inhibitor`, `sleep_aid`, and `opioid_pain`.
- Health profile payloads returned by agent search/find include provenance
  fields (`source`, `method_version`, `calibration_version`,
  `reference_version`) when present. Treat these fields as assignment
  provenance, not clinical verification.

Example GLP-1 search:

```json
{
  "match": "all",
  "filters": {
    "country": { "any": ["USA"] },
    "drug_class": { "any": ["Ozempic"] },
    "age": { "min": 35, "max": 65 }
  }
}
```
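The brand-to-class normalization described above happens server-side. If you also want payloads to carry canonical slugs before they leave the client, a sketch like the following mirrors the documented GLP-1 inputs. The slug set is copied from the list above and may lag the live taxonomy; `normalize_drug_class` and `is_known_slug` are hypothetical helpers.

```python
# GLP-1 brands and ingredients documented as normalizing to `glp1_agonist`.
GLP1_INPUTS = {
    "ozempic", "wegovy", "mounjaro", "zepbound", "semaglutide", "tirzepatide",
}

# Drug-class slugs listed in this guide; use GET /v1/filters for current values.
KNOWN_DRUG_CLASS_SLUGS = {
    "glp1_agonist", "insulin", "metformin", "statin", "antihypertensive",
    "antidepressant_ssri", "antidepressant_snri", "antianxiety_benzo",
    "blood_thinner", "thyroid", "birth_control", "bronchodilator",
    "inhaled_corticosteroid", "proton_pump_inhibitor", "sleep_aid", "opioid_pain",
}


def normalize_drug_class(value: str) -> str:
    """Map a documented GLP-1 brand or ingredient to its slug; pass other input through."""
    return "glp1_agonist" if value.casefold() in GLP1_INPUTS else value


def is_known_slug(value: str) -> bool:
    """Check a value against the slugs listed in this guide (case-insensitive)."""
    return value.casefold() in KNOWN_DRUG_CLASS_SLUGS


slug = normalize_drug_class("Ozempic")
```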

## Troubleshooting 0 Results

If a query returns `total_count = 0`:

1. Verify that each categorical value uses the exact full string (for example `Hispanic or Latino`, not `Hispanic`).
2. Verify that `state` uses 2-letter codes (for example `CA`, not `California`).
3. Prefer `gender` over legacy aliases such as `gender_identity`, `sex`, or `is_female`.
4. Prefer the structured `description` object over legacy `description_contains`
   when authoring or editing reusable filter payloads.
5. Check whether the query uses a currently unpopulated filter field (for example `political_party`).
6. Relax one filter at a time to identify the restrictive field.
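The last step can be automated. The sketch below generates one variant of a nested-DSL payload per filter, each with that filter removed, and reports which removals produce results. `count_fn` stands in for whatever call returns `total_count` for a payload; the `fake_count` function here is a stub for demonstration only, not real API behavior.

```python
from typing import Callable, Iterator


def relaxed_variants(payload: dict) -> Iterator[tuple[str, dict]]:
    """Yield (dropped_field, payload_without_that_field) for each filter."""
    filters = payload.get("filters", {})
    for dropped in filters:
        remaining = {k: v for k, v in filters.items() if k != dropped}
        yield dropped, {**payload, "filters": remaining}


def find_restrictive_fields(payload: dict, count_fn: Callable[[dict], int]) -> list[str]:
    """Return the fields whose removal alone turns a zero-result query into matches."""
    return [field for field, variant in relaxed_variants(payload) if count_fn(variant) > 0]


def fake_count(payload: dict) -> int:
    """Stub counter: pretends the full state name never matches."""
    state = payload["filters"].get("state")
    return 0 if state == {"any": ["California"]} else 12


query = {
    "match": "all",
    "filters": {
        "country": {"any": ["USA"]},
        "state": {"any": ["California"]},
        "age": {"min": 25, "max": 45},
    },
}
culprits = find_restrictive_fields(query, fake_count)
```

If no single removal yields results, more than one filter is likely too restrictive and the same approach can be repeated on the relaxed payloads.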

## Optional Strict Invalid-Value Mode (`422`)

When `API_V1_STRICT_INVALID_FILTERS=1` is enabled server-side, these endpoints can return
`422 Invalid filter value(s)` for unsupported categorical values:

- `POST /v1/research-groups/recruit`
- `POST /v1/research-groups/interview`
- `POST /v1/research-groups/{group_uuid}/append`
- `GET /v1/agents/search`
- `GET/POST /v1/agents/find`

The response includes field-level details and, when available, candidate suggestions.
If strict mode is disabled (default), unsupported values continue to behave as normal
zero-result queries.
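Because the same bad value yields either a `422` or an empty result depending on server configuration, client code may want to handle both. The sketch below classifies a response using only the status code and `total_count`; it does not assume any particular layout for the `422` body, and the returned labels are arbitrary names chosen for this example. A `422` could also stem from other validation problems, so treat the label as a hint.

```python
def classify_result(status_code: int, body: dict) -> str:
    """Classify a filter query response into a coarse outcome label."""
    if status_code == 422:
        # Strict mode: inspect the body for field-level details and suggestions.
        return "invalid_filter_value"
    if 200 <= status_code < 300:
        if body.get("total_count") == 0:
            # Default mode: an unsupported value looks like an ordinary empty result.
            return "no_matches"
        return "ok"
    return "error"


outcomes = [
    classify_result(422, {}),
    classify_result(200, {"total_count": 0}),
    classify_result(200, {"total_count": 7}),
]
```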
