# Stripe Billing Operations Guide

This guide covers runtime operations for the Stripe post-event billing pipeline.

## Prerequisites

Set these environment variables in the runtime where billing tasks run:

- `STRIPE_BILLING_ENABLED=1`
- `STRIPE_SECRET_KEY=...`
- `STRIPE_WEBHOOK_SECRET=...`
- `STRIPE_BILLING_QUEUE` (optional, default `default`)
- `STRIPE_USAGE_FLUSH_LOCK_KEY` (optional)
- `STRIPE_USAGE_FLUSH_LOCK_TTL_SECONDS` (optional, default `3300`)
- `STRIPE_BILLING_VALIDATE_CONTRACT_BEFORE_FLUSH` (optional, default `1`)
- `STRIPE_BILLING_CONTRACT_LOOKBACK_DAYS` (optional, default `30`)
- `STRIPE_BILLING_CONTRACT_REQUIRE_ZEITGEIST` (optional, default `0`)
- `STRIPE_BILLING_CONTRACT_FAIL_CLOSED` (optional, default `0`)
- `SAAS_WORKSPACE_BOOKING_URL` (optional, default `https://cal.com/andreas-duess-fd/fishdog-demo`)
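
As a rough illustration of how these settings and their defaults fit together, the sketch below loads them into a typed object. The `BillingConfig` class and `load_billing_config` function are hypothetical names, not part of the codebase. Treating `"1"` as the only truthy value and treating the two secrets as required are assumptions based on the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    # Assumption: "1" means enabled, any other value means disabled.
    return env.get(name, default) == "1"


@dataclass(frozen=True)
class BillingConfig:
    enabled: bool
    secret_key: str
    webhook_secret: str
    queue: str
    flush_lock_key: Optional[str]
    flush_lock_ttl_seconds: int
    validate_contract_before_flush: bool
    contract_lookback_days: int
    contract_require_zeitgeist: bool
    contract_fail_closed: bool


def load_billing_config(env: Optional[Mapping[str, str]] = None) -> BillingConfig:
    """Hypothetical loader: read billing settings, applying the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: both secrets must be present for billing tasks to run.
    missing = [n for n in ("STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET") if not env.get(n)]
    if missing:
        raise RuntimeError("missing required billing settings: " + ", ".join(missing))
    return BillingConfig(
        enabled=_flag(env, "STRIPE_BILLING_ENABLED", "0"),
        secret_key=env["STRIPE_SECRET_KEY"],
        webhook_secret=env["STRIPE_WEBHOOK_SECRET"],
        queue=env.get("STRIPE_BILLING_QUEUE", "default"),
        flush_lock_key=env.get("STRIPE_USAGE_FLUSH_LOCK_KEY"),
        flush_lock_ttl_seconds=int(env.get("STRIPE_USAGE_FLUSH_LOCK_TTL_SECONDS", "3300")),
        validate_contract_before_flush=_flag(env, "STRIPE_BILLING_VALIDATE_CONTRACT_BEFORE_FLUSH", "1"),
        contract_lookback_days=int(env.get("STRIPE_BILLING_CONTRACT_LOOKBACK_DAYS", "30")),
        contract_require_zeitgeist=_flag(env, "STRIPE_BILLING_CONTRACT_REQUIRE_ZEITGEIST", "0"),
        contract_fail_closed=_flag(env, "STRIPE_BILLING_CONTRACT_FAIL_CLOSED", "0"),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to check a candidate configuration without touching the real environment.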

## SaaS Gate Stopgap

The `saas` entitlement is the workspace-access gate used by the web app. During
Phase 0 rollout, orgs without `saas` are routed to an interim "book time with
Andreas" page. This is not the final Stripe upgrade experience.

Future Stripe work should replace the booking reroute with self-service upgrade
from the gated state. That flow should read the same `saas` entitlement after
checkout completes so the workspace unlock path stays entitlement-driven.

## Billing Task Commands

Run immediately (single pass):

```bash
flask billing flush_now
```

Run with in-process retry/backoff (3 attempts):

```bash
flask billing flush_hourly
```

Enqueue an async RQ job:

```bash
flask billing enqueue_hourly
```

Validate the upstream `billable_events` contract:

```bash
flask billing validate_contract --lookback-days 30 --require-zeitgeist --fail-closed
```

Use this command in preflight/staging checks before enabling production charging.

Run runtime/config preflight checks:

```bash
flask billing preflight --require-enabled --fail-closed
```

This validates billing-critical config and exits non-zero when checks fail.

Run a single-organization smoke check (WS10) in test mode:

```bash
flask billing smoke_org --org-id 123 --require-enabled --validate-contract --fail-closed
```

Optional flags:

- `--skip-contract` to bypass contract checks during diagnosis
- `--no-fail-closed` to collect a report without exiting non-zero
- `--refresh-period` to force Stripe period sync before flush

Script wrapper for repeatable ops usage:

- `scripts/run_billing_smoke_org.sh`

Example:

```bash
scripts/run_billing_smoke_org.sh 123
```

## Hourly Scheduler (Minute 13)

Recommended cron entry:

```cron
13 * * * * /path/to/repo/scripts/run_billing_hourly.sh >> /var/log/billing_hourly.log 2>&1
```

Included helper script:

- `scripts/run_billing_hourly.sh`

## Overlap Guard

Hourly flush jobs acquire a Redis lock before processing. If another flush run already holds the lock, the new run exits safely with a `skipped=true` result and no side effects.

Before each hourly flush, the system can also validate the billable-event contract (see `STRIPE_BILLING_VALIDATE_CONTRACT_BEFORE_FLUSH`, which defaults to `1`). When `STRIPE_BILLING_CONTRACT_FAIL_CLOSED=1`, failed validation blocks the flush.
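
The lock-then-skip behavior described in this section can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: `run_flush_with_lock` is a hypothetical name, the default lock key is a placeholder, and `client` is assumed to be a redis-py style client exposing `set(..., nx=True, ex=...)`, `get`, and `delete`. The default TTL of 3300 seconds (55 minutes) matches `STRIPE_USAGE_FLUSH_LOCK_TTL_SECONDS` and expires before the next hourly run.

```python
import uuid


def run_flush_with_lock(client, flush, lock_key="stripe:usage_flush:lock", ttl_seconds=3300):
    """Run flush() only if the lock can be acquired; otherwise report a skip.

    Hypothetical helper. `lock_key` is a placeholder, not the real key name.
    """
    token = uuid.uuid4().hex
    # SET key token NX EX ttl succeeds only when no other run holds the lock.
    acquired = client.set(lock_key, token, nx=True, ex=ttl_seconds)
    if not acquired:
        # Another flush is in progress: do nothing and report the skip.
        return {"skipped": True}
    try:
        return {"skipped": False, "result": flush()}
    finally:
        # Release only a lock we still own, so a lock that expired and was
        # re-acquired by another run is left alone. This check-then-delete is
        # not atomic; a hardened version would do it in a single Lua script.
        if client.get(lock_key) in (token, token.encode()):
            client.delete(lock_key)
```

The TTL acts as a safety net: if a worker dies mid-flush, the lock expires on its own instead of blocking every later run.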

## Retries

- `flask billing flush_hourly` retries in-process with backoff (3 attempts).
- Webhook processing and async billing jobs also use queued retry policies.
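
For readers unfamiliar with in-process backoff, the pattern looks roughly like the sketch below. `run_with_backoff` is a hypothetical helper, and the exponential schedule with a one-second base delay is an assumption. This guide only states that `flush_hourly` makes 3 attempts.

```python
import time


def run_with_backoff(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() up to `attempts` times, doubling the delay after each failure.

    Hypothetical helper; the delay schedule is an assumption.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the schedule can be exercised in tests without actually waiting.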

## Structured Logs

The billing pipeline emits structured logs for key lifecycle points:

- usage flush:
  - `Stripe usage flush started`
  - `Stripe usage flush found no pending events`
  - `Stripe usage flush completed`
  - `Stripe usage flush batch started`
  - `Stripe usage flush batch completed`
- meter submission:
  - `Stripe meter event submit attempt`
  - `Stripe meter event submit failed`
  - `Stripe meter event submit succeeded`
- webhook ingest and processing:
  - `Stripe webhook accepted and queued`
  - `Stripe webhook duplicate ignored`
  - `Stripe webhook processing started`
  - `Stripe webhook event ignored`
  - `Stripe webhook processing completed`
  - `Stripe webhook processing skipped (already processed)`
- invoice.created adjustments:
  - `Stripe invoice.created billing adjustments applied`

Recommended filters for dashboards/alerts:

- repeated `Stripe meter event submit failed` for the same `organization_id` and `idempotency_key`
- elevated `Stripe webhook duplicate ignored` rates (possible replay storms)
- `Stripe webhook processing started` without a matching `Stripe webhook processing completed` within the expected worker SLA
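
As a starting point for these filters, the sketch below scans parsed log records for two of the patterns above. It assumes each record is a dict with `message` and `timestamp` keys plus an `event_id` on webhook records. Those field names, the function names, the 5-minute SLA, and the failure threshold are all illustrative assumptions. Only `organization_id` and `idempotency_key` are named in this guide. Records are assumed to be in chronological order.

```python
from collections import Counter
from datetime import datetime, timedelta

STARTED = "Stripe webhook processing started"
COMPLETED = "Stripe webhook processing completed"
SUBMIT_FAILED = "Stripe meter event submit failed"


def find_stalled_webhooks(records, now, sla=timedelta(minutes=5)):
    """Return event IDs that started more than `sla` ago and never completed."""
    open_events = {}
    for rec in records:
        if rec["message"] == STARTED:
            # Keep the earliest start time seen for each event.
            open_events.setdefault(rec["event_id"], rec["timestamp"])
        elif rec["message"] == COMPLETED:
            open_events.pop(rec["event_id"], None)
    return sorted(eid for eid, ts in open_events.items() if now - ts > sla)


def repeated_submit_failures(records, threshold=3):
    """Count submit failures per (organization_id, idempotency_key) pair."""
    counts = Counter(
        (rec["organization_id"], rec["idempotency_key"])
        for rec in records
        if rec["message"] == SUBMIT_FAILED
    )
    return {key: n for key, n in counts.items() if n >= threshold}
```

In practice the same logic would usually live in the log platform's query language. The Python version is mainly useful for ad hoc checks against exported logs.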
