# Webhooks (Stream API)

The Stream API is built for large jobs — up to **100 000 values**. Instead of polling, Kwery
**pushes results to your endpoint** as they complete, in signed, batched deliveries.

## 1. Submit a stream job

`POST /stream` takes the same fields as `POST /job`, plus a `callback_url`:

```bash
curl -s https://api.kwery.co/stream \
  -H "Authorization: Bearer $KWERY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "idealo",
    "country": "de",
    "key": "gtin",
    "values": ["4006381333962", "4719512101148"],
    "callback_url": "https://your-app.example.com/kwery/webhook",
    "client_ref": "batch-2026-04-03"
  }'
```

Response:

```json
{
  "error": false,
  "job_id": "67e1234abc...",
  "webhook_secret": "a3f9...(64 hex chars)..."
}
```

> **Store `webhook_secret` immediately.** It is returned **once**, only at submission, and is
> required to verify delivery signatures. It cannot be retrieved later.


### Optional fields

| Field | Default | Description |
|  --- | --- | --- |
| `client_ref` | — | Echoed back on every delivery. Use it to correlate deliveries to your own batch. |
| `delivery_batch_size` | 100 | How many results to group per delivery. |
| `include_meta` | `false` | When `true`, each result includes a `meta` provenance block (see below). |
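
As a rough illustration of how the optional fields fit into the request body, here is a minimal Python sketch that builds (but does not send) the same `POST /stream` request as the curl example. The helper name `build_stream_request` and the `MAX_VALUES` constant are hypothetical, and the `delivery_batch_size` value of 50 is only an example, not a recommendation.

```python
import json
import urllib.request

MAX_VALUES = 100_000  # job size limit stated at the top of this guide


def build_stream_request(api_key, values, callback_url, **optional):
    """Build (but do not send) a POST /stream request. Hypothetical helper."""
    if len(values) > MAX_VALUES:
        raise ValueError(f"at most {MAX_VALUES} values per stream job")
    body = {
        "source": "idealo",
        "country": "de",
        "key": "gtin",
        "values": list(values),
        "callback_url": callback_url,
    }
    # Optional fields from the table: client_ref, delivery_batch_size, include_meta.
    body.update(optional)
    return urllib.request.Request(
        "https://api.kwery.co/stream",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_stream_request(
    "test-key",
    ["4006381333962", "4719512101148"],
    "https://your-app.example.com/kwery/webhook",
    client_ref="batch-2026-04-03",
    delivery_batch_size=50,
    include_meta=True,
)
# To send: urllib.request.urlopen(req), then persist the returned
# webhook_secret before doing anything else with the response.
```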


## 2. Receive deliveries

Kwery sends `POST` requests to your `callback_url` as results accumulate, and once more at the
end (`is_final: true`). The payload embeds the results — you do **not** need to poll:

```json
{
  "job_id": "67e1234abc...",
  "source": "idealo",
  "country": "de",
  "batch_sequence": 3,
  "is_final": false,
  "values_total": 1000,
  "values_done": 75,
  "values_errors": 1,
  "client_ref": "batch-2026-04-03",
  "results": [
    {
      "key": "gtin:4006381333962",
      "index": 0,
      "completed_at": "2026-04-03T10:01:12.000Z",
      "result": {
        "query": { "source": "idealo", "country": "de", "key": "gtin", "value": "4006381333962" },
        "content": { "...": "parsed product + offers" },
        "success": true,
        "reason": null,
        "updated_at": "2026-04-03T10:01:12.000Z"
      }
    }
  ]
}
```

`batch_sequence` increases by one per delivery, so you can detect missing or out-of-order
deliveries. When `is_final` is `true`, the job is complete.
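
One possible way to use `batch_sequence` for gap detection is sketched below. The `SequenceTracker` class is hypothetical. It assumes the first delivery of a job carries `batch_sequence` 1 and that the final delivery has the highest sequence number; neither is stated above, so adjust if your deliveries behave differently.

```python
class SequenceTracker:
    """Track batch_sequence values for one job and report gaps (hypothetical helper)."""

    def __init__(self, first_sequence=1):
        # Assumption: numbering starts at 1.
        self.first_sequence = first_sequence
        self.seen = set()
        self.final_sequence = None

    def record(self, payload):
        """Record one delivery; return False if this sequence was already seen."""
        seq = payload["batch_sequence"]
        duplicate = seq in self.seen
        self.seen.add(seq)
        if payload["is_final"]:
            self.final_sequence = seq
        return not duplicate

    def missing(self):
        """Sequences not yet received, up to the highest one seen so far."""
        if not self.seen:
            return []
        upper = max(self.seen)
        return [s for s in range(self.first_sequence, upper + 1) if s not in self.seen]

    def complete(self):
        """True once the final delivery has arrived and no gaps remain."""
        return self.final_sequence is not None and not self.missing()


tracker = SequenceTracker()
for seq, final in [(1, False), (3, False), (4, True)]:
    tracker.record({"batch_sequence": seq, "is_final": final})
gaps = tracker.missing()  # delivery 2 has not arrived yet
```

A gap that persists after the final delivery is a candidate for the replay or polling endpoints described in section 5.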

### Provenance (`include_meta: true`)

Submit with `include_meta: true` to attach a `meta` block to each result, describing how the
data was gathered:

```json
"meta": {
  "crawls": [
    {
      "crawl_id": "9f1c...",
      "url": "https://www.idealo.de/preisvergleich/OffersOfProduct/...",
      "http_status": 200,
      "duration_ms": 1840,
      "observed_at": "2026-04-03T10:01:11.000Z",
      "geo": "de"
    }
  ],
  "partial": false
}
```

`partial` (Google only) indicates the marketplace reports more offers than were gathered in
this snapshot.

## 3. Verify the signature

Every delivery carries three headers:

| Header | Description |
|  --- | --- |
| `X-Kwery-Signature` | `v1=` + HMAC-SHA256 of `"{timestamp}.{raw_body}"`, keyed with your `webhook_secret`. |
| `X-Kwery-Timestamp` | Unix seconds when the delivery was signed. |
| `X-Kwery-Delivery-Id` | Unique ID for this delivery (useful for idempotency and support). |


To verify:

1. Read the **raw request body** before any JSON parsing.
2. Read `X-Kwery-Timestamp`; reject if it is more than **5 minutes** old (replay protection).
3. Recompute `v1=HMAC_SHA256(webhook_secret, "{timestamp}.{raw_body}")`.
4. Compare against `X-Kwery-Signature` using a constant-time comparison.
5. Reject with `401` on mismatch.
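
The steps above can be sketched in Python using only the standard library. This is a minimal sketch, not the reference implementation: it assumes the signature is hex-encoded, that `webhook_secret` is used as a UTF-8 string rather than hex-decoded to bytes, and that `headers` is a mapping keyed by the exact header names shown in the table. The function names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 5 * 60  # replay window from step 2


def sign(secret: str, timestamp: str, raw_body: bytes) -> str:
    """Compute the expected X-Kwery-Signature value."""
    # Assumption: secret is used as UTF-8 text and the digest is hex-encoded.
    message = timestamp.encode("ascii") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return "v1=" + digest


def verify(secret, headers, raw_body, now=None):
    """Return True when the delivery is fresh and correctly signed."""
    timestamp = headers.get("X-Kwery-Timestamp", "")
    signature = headers.get("X-Kwery-Signature", "")
    if not (timestamp.isascii() and timestamp.isdigit()):
        return False
    now = time.time() if now is None else now
    if now - int(timestamp) > MAX_AGE_SECONDS:
        return False  # too old: possible replay
    expected = sign(secret, timestamp, raw_body)
    # Constant-time comparison (step 4).
    return hmac.compare_digest(expected.encode(), signature.encode())


# Example with made-up values: sign a body, then verify it.
secret = "a3f9"
body = b'{"job_id":"x"}'
ts = "1775210472"
headers = {"X-Kwery-Timestamp": ts, "X-Kwery-Signature": sign(secret, ts, body)}
ok = verify(secret, headers, body, now=int(ts) + 10)
```

When `verify` returns `False`, the receiver would respond with `401`. Note that `raw_body` must be the bytes exactly as received; re-serializing parsed JSON will usually change the bytes and break the signature.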


A complete, runnable receiver is in [Code samples](/guides/code-samples).

## 4. Acknowledge quickly

Return a `2xx` status as soon as you have received and verified the delivery, then process
asynchronously. Kwery treats the response status as follows:

| Your response | Kwery's action |
|  --- | --- |
| `2xx` | Marked delivered. |
| `4xx` (except `429`) | Permanent failure — **not retried**. |
| `3xx`, `5xx`, `429`, network error | Retried with backoff. |
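
A minimal sketch of the "acknowledge first, process later" pattern follows. It is framework-agnostic: `handle_delivery` is a hypothetical function that returns the status code your HTTP layer should send, and `is_valid` stands in for whatever signature check you use. The in-memory queue and ID set are for illustration only; a real receiver would likely use durable storage so that acknowledged deliveries survive a restart.

```python
import json
import queue
import threading

work_queue = queue.Queue()
seen_delivery_ids = set()  # illustration only; use a persistent store in production
seen_lock = threading.Lock()


def handle_delivery(headers, raw_body, is_valid):
    """Return the HTTP status to send back; heavy work is left to a worker."""
    if not is_valid(headers, raw_body):
        return 401  # a 4xx other than 429, so this delivery is not retried
    payload = json.loads(raw_body)
    delivery_id = headers.get("X-Kwery-Delivery-Id")
    with seen_lock:
        if delivery_id in seen_delivery_ids:
            return 200  # already accepted (for example a retry): acknowledge, skip work
        seen_delivery_ids.add(delivery_id)
    work_queue.put(payload)
    return 200


def worker(process):
    """Drain the queue forever; run in a background thread."""
    while True:
        payload = work_queue.get()
        try:
            process(payload)
        finally:
            work_queue.task_done()


# Example: accept one delivery, then see the same delivery again.
hdrs = {"X-Kwery-Delivery-Id": "d-1"}
first = handle_delivery(hdrs, b'{"batch_sequence": 1}', lambda h, b: True)
again = handle_delivery(hdrs, b'{"batch_sequence": 1}', lambda h, b: True)
```

If the receiver cannot accept work at the moment (for example, its own queue is full), responding with `429` or `5xx` keeps the delivery eligible for retry, whereas any other `4xx` drops it.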


### Retry schedule

Each delivery is attempted up to **5 times** in total, counting the initial attempt, with this backoff between attempts (±20% jitter):

| Attempt | Delay |
|  --- | --- |
| 1 | immediate |
| 2 | +30 s |
| 3 | +2 min |
| 4 | +10 min |
| 5 | +30 min |


After the 5th attempt a delivery is marked **dead-lettered**.
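
To estimate how long an endpoint can be down before deliveries are dead-lettered, the schedule can be turned into cumulative time windows. The sketch below assumes each delay in the table is measured from the previous attempt and that jitter applies independently to each delay; the table does not say so explicitly, so treat the numbers as approximate.

```python
NOMINAL_DELAYS_S = [0, 30, 120, 600, 1800]  # attempts 1..5, from the table
JITTER = 0.20


def retry_windows(delays=NOMINAL_DELAYS_S, jitter=JITTER):
    """Yield (attempt, earliest_s, latest_s) measured from the first attempt."""
    earliest = latest = 0.0
    for attempt, delay in enumerate(delays, start=1):
        # Assumption: each delay is relative to the previous attempt.
        earliest += delay * (1 - jitter)
        latest += delay * (1 + jitter)
        yield attempt, earliest, latest


windows = list(retry_windows())
for attempt, lo, hi in windows:
    print(f"attempt {attempt}: {lo / 60:.1f} to {hi / 60:.1f} min after the first")
```

Under that reading, the fifth attempt lands roughly 34 to 51 minutes after the first (42.5 minutes nominal), so an outage longer than that would leave deliveries dead-lettered and in need of replay.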

## 5. Replay and polling fallback

If your endpoint was down, you can replay deliveries:

- `POST /stream/{id}/replay-all` — re-enqueue all dead-lettered deliveries for a job.
- `POST /stream/deliveries/{delivery_id}/replay` — replay a single delivery.


You can also poll instead of (or in addition to) webhooks:

- `GET /stream/{id}` — job status and counters.
- `GET /stream/{id}/results?offset=0&limit=100` — results sorted by `index` (max `limit` 1000).
- `GET /stream/{id}/deliveries` — delivery history and statuses.
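
Paging through `GET /stream/{id}/results` could look like the sketch below. Because the response envelope is not described on this page, the HTTP call is left to a caller-supplied `fetch_page(offset, limit)` function, which is hypothetical and is expected to return the list of results for one page. `iter_results` is likewise a hypothetical helper name.

```python
from typing import Callable, Iterator, List

MAX_LIMIT = 1000  # maximum `limit` stated above


def iter_results(
    fetch_page: Callable[[int, int], List[dict]], limit: int = 100
) -> Iterator[dict]:
    """Yield every result by walking offset/limit pages until a short page."""
    limit = min(limit, MAX_LIMIT)
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            return  # short (or empty) page: nothing more available right now
        offset += limit


# Example with an in-memory stand-in for the API.
fake = [{"index": i} for i in range(250)]
results = list(iter_results(lambda offset, limit: fake[offset:offset + limit]))
```

A short page only means no further results are available at that moment. For a job that is still running, check the job status from `GET /stream/{id}` before treating the result set as complete.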


## Next steps

- [Code samples](/guides/code-samples) — a full signature-verifying receiver in Node.js and Python.
- [Errors & limits](/guides/errors-and-limits) — limits, retention, and the response envelope.