Scraping
How Bytekit's fast-path and slow-path scrape pipelines work, and when each is used.
POST /v1/scrape routes each request through one of two execution paths depending on the
options you supply. Understanding which path a request takes helps you predict latency and
tune your integration.
Fast path (~500ms)
Simple requests go through the fast path: Bytekit fetches the page directly over a proxy and returns the content, compressed. This is the default for any request that doesn't require a browser.
Fast-path responses include X-Scrape-* headers that describe how the request was handled.
A fast-path request looks like this:
curl -X POST https://api.bytekit.com/v1/scrape \
-H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Slow path (~3–15s)
Any of the following options forces the request onto the slow path, which renders the page in a full browser:
| Option | Trigger condition |
|---|---|
| wait_for_selector | Any non-empty value |
| wait_until | "networkidle" |
| cookies | Non-empty array |
| delay_ms | Greater than 0 |
| headers | Non-empty object |
# This request uses the slow path (wait_for_selector is set)
curl -X POST https://api.bytekit.com/v1/scrape \
-H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"wait_for_selector": "#main-content"
}'
country does not trigger the slow path
The country option routes the request through a geo-located proxy but does not force a
browser render. It rides the fast path. Only when country is combined with one of the
slow-path triggers listed above does the request render in a browser.
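The routing rules above can be expressed as a small client-side predicate, which is handy for estimating latency before sending a request. This is only a sketch of the documented trigger table, not Bytekit's actual routing code; the function name `uses_slow_path` and the sample `country` value are hypothetical.

```python
def uses_slow_path(options: dict) -> bool:
    """Return True if the request options contain a documented slow-path trigger.

    Mirrors the trigger table in this page; `country` is deliberately ignored
    because it does not force a browser render on its own.
    """
    return bool(
        options.get("wait_for_selector")                 # any non-empty value
        or options.get("wait_until") == "networkidle"    # only this value triggers
        or options.get("cookies")                        # non-empty array
        or (options.get("delay_ms") or 0) > 0            # greater than 0
        or options.get("headers")                        # non-empty object
    )


if __name__ == "__main__":
    # "de" is an illustrative country value, not taken from the docs.
    print(uses_slow_path({"url": "https://example.com", "country": "de"}))
    print(uses_slow_path({"url": "https://example.com", "wait_for_selector": "#main-content"}))
```

Because the check is local, it can drift from the server's behavior if the trigger list changes; treat it as an estimate rather than a guarantee.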
Automatic browser retry
When a direct fetch is blocked by bot protection — HTTP 403 or 429 where the site is actively refusing scrapers — Bytekit automatically retries the request in a full browser that can clear common challenges. This retry is transparent: the API response shape is identical, and you don't write any retry logic.
Other failure modes (network errors, timeouts, non-403/429 errors) return 502 internal_error
directly without attempting the browser retry.
Response fields
| Field | Description |
|---|---|
| data.content | Page content in the requested format (markdown by default) |
| data.content_length | Compressed wire bytes, used for bandwidth billing |
| data.metadata.statusCode | HTTP status code of the origin page |
| data.metadata.title | Page <title> element |
The X-Scrape-* response headers describe how each request was handled (which path served it,
cache status, timing, and the final URL) — useful for debugging.
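As a rough illustration of reading these fields, the sketch below pulls the documented values out of a parsed response body and collects whatever `X-Scrape-*` headers are present. The helper name `summarize_scrape` is hypothetical, and because this page does not list the individual header names, the code filters by prefix only.

```python
def summarize_scrape(body: dict, headers: dict) -> dict:
    """Collect the documented response fields plus any X-Scrape-* headers."""
    data = body.get("data", {})
    metadata = data.get("metadata", {})
    # Header names are case-insensitive, so compare in lowercase.
    scrape_headers = {
        name: value
        for name, value in headers.items()
        if name.lower().startswith("x-scrape-")
    }
    return {
        "status_code": metadata.get("statusCode"),
        "title": metadata.get("title"),
        "content_length": data.get("content_length"),
        "content_preview": (data.get("content") or "")[:80],
        "scrape_headers": scrape_headers,
    }
```

Pass it the decoded JSON body and the response headers from whichever HTTP client you use; logging the result alongside the request options is usually enough for debugging.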
Async scrapes
For long-running pages, pass "async": true to get a 202 Accepted response with a scrape
id prefixed sc_. Poll for the result with GET /v1/scrape/{id}.
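The polling step could be sketched in Python as follows. This page does not describe the shape of the polled response, so the completion check is supplied by the caller as `is_finished`. The helper names, the polling interval, and the attempt limit are all assumptions rather than documented values.

```python
import json
import time
import urllib.request


def fetch_scrape(scrape_id: str, api_key: str) -> dict:
    """GET /v1/scrape/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"https://api.bytekit.com/v1/scrape/{scrape_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_scrape(fetch, is_finished, scrape_id,
                interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(scrape_id) until is_finished(payload) is true.

    `fetch` and `is_finished` are caller-supplied callables; `interval_s`
    and `max_attempts` are illustrative defaults, not documented limits.
    """
    for attempt in range(max_attempts):
        payload = fetch(scrape_id)
        if is_finished(payload):
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"scrape {scrape_id} did not finish after {max_attempts} polls")
```

Injecting `fetch` and `sleep` keeps the loop testable without network access. In production you might prefer the scrape.completed webhook over polling.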
# Start async scrape
curl -X POST https://api.bytekit.com/v1/scrape \
-H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "async": true}'
# Poll for result
curl https://api.bytekit.com/v1/scrape/sc_... \
-H "Authorization: Bearer sk_live_YOUR_KEY_HERE"
Webhook event header
Every outbound webhook POST includes an X-RapidCrawl-Event header with the event type:
| Event type | Trigger |
|---|---|
| bulk.completed | A bulk job finishes (all items processed) |
| scrape.completed | An async scrape succeeds |
| scrape.failed | An async scrape fails terminally |
| monitor.change_detected | A monitor fires and detects a visual change |
| monitor.captured | A monitor fires with no visual change (notify_on: every) |
| sitemap.completed | A sitemap job finishes |
| sitemap.failed | A sitemap job fails terminally |
Use this header to dispatch events without inspecting the body shape:
from flask import Flask, request

app = Flask(__name__)

# handle_bulk and handle_scrape are your own handler functions (not shown here).
@app.route("/webhooks/bytekit", methods=["POST"])
def handle():
    # Dispatch on the event type carried in the header, not the body shape.
    event = request.headers.get("X-RapidCrawl-Event")
    if event == "bulk.completed":
        handle_bulk(request.json)
    elif event == "scrape.completed":
        handle_scrape(request.json)
    return "", 200

X-RapidCrawl-Event is a reserved header: you cannot supply it via webhook_headers in
monitor or bulk configuration. Attempts to do so are rejected with a 422 at request
time.
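To catch this before the API does, a client could check its own webhook_headers for the reserved name. The sketch below is a hypothetical pre-flight helper, not part of any Bytekit SDK. It assumes the comparison is case-insensitive, which matches how HTTP treats header names but is not confirmed by this page.

```python
# Only the header named on this page is listed; the set is not exhaustive.
RESERVED_WEBHOOK_HEADERS = {"x-rapidcrawl-event"}


def find_reserved_headers(webhook_headers: dict) -> list:
    """Return the names in webhook_headers that collide with reserved headers."""
    return sorted(
        name
        for name in webhook_headers
        if name.strip().lower() in RESERVED_WEBHOOK_HEADERS
    )
```

An empty list means the configuration passes this local check. The API's own validation remains authoritative.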
Next steps
- Rate Limits — quota model and concurrency slots
- Errors — how to interpret and retry error responses
- API Reference: Scrape — full request/response schema