Scraping

How Bytekit's fast-path and slow-path scrape pipelines work, and when each is used.

POST /v1/scrape routes each request through one of two execution paths depending on the options you supply. Understanding which path a request takes helps you predict latency and tune your integration.

Fast path (~500ms)

Simple requests go through the fast path: Bytekit fetches the page directly over a proxy and returns the content, compressed. This is the default for any request that doesn't require a browser.

Fast-path responses include X-Scrape-* headers that describe how the request was handled.

A fast-path request looks like this:

curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Slow path (~3–15s)

Any of the following options forces the request onto the slow path, which renders the page in a full browser:

Option	Trigger condition
`wait_for_selector`	Any non-empty value
`wait_until`	`"networkidle"`
`cookies`	Non-empty array
`delay_ms`	Greater than `0`
`headers`	Non-empty object

# This request uses the slow path (wait_for_selector is set)
curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "wait_for_selector": "#main-content"
  }'

`country` does not trigger the slow path

The country option routes the request through a geo-located proxy but does not force a browser render, so the request stays on the fast path. A request that sets country renders in a browser only when it also includes one of the slow-path triggers listed above.
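The trigger rules above can be sketched as a small client-side predicate, which may help when estimating latency before sending a request. This is an illustration of the table only, not Bytekit's actual routing code: the function name `uses_slow_path` is hypothetical, the `"de"` country value is an assumed format, and the check only predicts the initial path (a fast-path request can still end up in a browser through the automatic retry).

```python
from typing import Any, Mapping


def uses_slow_path(options: Mapping[str, Any]) -> bool:
    """Return True if the options match any slow-path trigger from the table."""
    return bool(
        options.get("wait_for_selector")               # any non-empty value
        or options.get("wait_until") == "networkidle"  # only this value triggers
        or options.get("cookies")                      # non-empty array
        or (options.get("delay_ms") or 0) > 0          # greater than 0
        or options.get("headers")                      # non-empty object
    )


# "country" on its own is not a trigger, so this stays on the fast path.
print(uses_slow_path({"url": "https://example.com", "country": "de"}))
# A positive delay_ms is a trigger.
print(uses_slow_path({"url": "https://example.com", "delay_ms": 250}))
```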

Automatic browser retry

When a direct fetch is blocked by bot protection — HTTP 403 or 429 where the site is actively refusing scrapers — Bytekit automatically retries the request in a full browser that can clear common challenges. This retry is transparent: the API response shape is identical, and you don't write any retry logic.

Other failure modes (network errors, timeouts, non-403/429 errors) return 502 internal_error directly without attempting the browser retry.

Response fields

Field	Description
`data.content`	Page content in the requested format (markdown by default)
`data.content_length`	Compressed wire bytes — used for bandwidth billing
`data.metadata.statusCode`	HTTP status code of the origin page
`data.metadata.title`	Page `<title>` element

The X-Scrape-* response headers describe how each request was handled (which path served it, cache status, timing, and the final URL) — useful for debugging.
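As a rough sketch of how a client might pull these fields out of a response for logging, the helper below reads the documented `data` fields and collects every header whose name starts with `X-Scrape-`. The function name `summarize_scrape`, the sample values, and the header name `X-Scrape-Example` are all hypothetical; this page does not list the individual header names, so the code filters by prefix instead of naming them.

```python
from typing import Any, Mapping


def summarize_scrape(body: Mapping[str, Any], headers: Mapping[str, str]) -> dict:
    """Collect the documented response fields and any X-Scrape-* headers."""
    data = body.get("data", {})
    metadata = data.get("metadata", {})
    # HTTP header names are case-insensitive, so normalise before matching.
    scrape_headers = {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-scrape-")
    }
    return {
        "title": metadata.get("title"),
        "origin_status": metadata.get("statusCode"),
        "billed_bytes": data.get("content_length"),  # compressed wire bytes
        "content_chars": len(data.get("content") or ""),
        "scrape_headers": scrape_headers,
    }


# Made-up sample response, for illustration only.
sample_body = {
    "data": {
        "content": "# Example Domain",
        "content_length": 412,
        "metadata": {"statusCode": 200, "title": "Example Domain"},
    }
}
sample_headers = {"Content-Type": "application/json", "X-Scrape-Example": "value"}
print(summarize_scrape(sample_body, sample_headers))
```

Note that `billed_bytes` and `content_chars` will generally differ, since `data.content_length` counts compressed wire bytes rather than the length of the decoded content.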

Async scrapes

For long-running pages, pass "async": true to get a 202 Accepted response containing a scrape ID prefixed with sc_. Poll for the result with GET /v1/scrape/{id}.

# Start async scrape
curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "async": true}'

# Poll for result
curl https://api.bytekit.com/v1/scrape/sc_... \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE"
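A polling loop around the GET call might look like the sketch below. It is deliberately generic: this page does not describe the polling response schema, so the caller supplies both the `fetch` function (which would wrap the GET request shown above) and the `is_done` check. The `status` field and its values in the demo are assumptions for illustration, as are the names `poll_scrape` and `sc_example`.

```python
import time
from typing import Any, Callable, Mapping


def poll_scrape(
    fetch: Callable[[str], Mapping[str, Any]],
    scrape_id: str,
    is_done: Callable[[Mapping[str, Any]], bool],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call fetch(scrape_id) until is_done(result) is true or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch(scrape_id)
        if is_done(result):
            return result
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"scrape {scrape_id} not finished after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake fetcher so the sketch runs without network access.
# The "status" field and its values are assumed, not taken from the API schema.
_responses = iter(
    [{"status": "processing"}, {"status": "processing"}, {"status": "completed"}]
)
result = poll_scrape(
    fetch=lambda _id: next(_responses),
    scrape_id="sc_example",
    is_done=lambda r: r.get("status") in {"completed", "failed"},
    sleep=lambda _s: None,  # skip real waiting in the demo
)
print(result)
```

Injecting `fetch` and `sleep` keeps the loop testable and leaves the choice of HTTP client open. If you have configured webhooks, the scrape.completed and scrape.failed events described in the next section are an alternative to polling.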

Webhook event header

Every outbound webhook POST includes an X-RapidCrawl-Event header with the event type:

Event type	Trigger
`bulk.completed`	A bulk job finishes (all items processed)
`scrape.completed`	An async scrape succeeds
`scrape.failed`	An async scrape fails terminally
`monitor.change_detected`	A monitor fires and detects a visual change
`monitor.captured`	A monitor fires with no visual change (notify_on: every)
`sitemap.completed`	A sitemap job finishes
`sitemap.failed`	A sitemap job fails terminally

Use this header to dispatch events without inspecting the body shape:

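from flask import Flask, request

app = Flask(__name__)

def handle_bulk(payload):
    """Placeholder: replace with your own bulk-job handling."""

def handle_scrape(payload):
    """Placeholder: replace with your own async-scrape handling."""

@app.route("/webhooks/bytekit", methods=["POST"])
def handle():
    # Dispatch on the event header rather than on the body shape.
    event = request.headers.get("X-RapidCrawl-Event")
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    if event == "bulk.completed":
        handle_bulk(payload)
    elif event == "scrape.completed":
        handle_scrape(payload)
    # Other event types are acknowledged without further handling.
    return "", 200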

X-RapidCrawl-Event is a reserved header — you cannot supply it via webhook_headers in monitor or bulk configuration. Attempts to do so will be rejected with a 422 at request time.
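To catch this before the API returns a 422, a client could check its own webhook_headers first. The sketch below is a hypothetical pre-flight check: the names `RESERVED_WEBHOOK_HEADERS` and `find_reserved_headers` are made up, only the one reserved header named on this page is included, and matching case-insensitively is an assumption about the server's behavior (chosen because HTTP header names are case-insensitive).

```python
from typing import List, Mapping

# Only the reserved header named on this page; the server may reserve others.
RESERVED_WEBHOOK_HEADERS = frozenset({"x-rapidcrawl-event"})


def find_reserved_headers(webhook_headers: Mapping[str, str]) -> List[str]:
    """Return the supplied header names that collide with reserved ones."""
    return sorted(
        name for name in webhook_headers if name.lower() in RESERVED_WEBHOOK_HEADERS
    )


# Hypothetical webhook_headers value for a monitor or bulk configuration.
candidate = {"X-Team": "growth", "X-RapidCrawl-Event": "custom"}
conflicts = find_reserved_headers(candidate)
if conflicts:
    print("Remove reserved headers before sending:", conflicts)
```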

Next steps

Rate Limits — quota model and concurrency slots
Errors — how to interpret and retry error responses
API Reference: Scrape — full request/response schema
