Scraping

How Bytekit's fast-path and slow-path scrape pipelines work, and when each is used.

POST /v1/scrape routes each request through one of two execution paths depending on the options you supply. Understanding which path a request takes helps you predict latency and tune your integration.

Fast path (~500ms)

Simple requests go through the fast path: Bytekit fetches the page directly over a proxy and returns the content, compressed. This is the default for any request that doesn't require a browser.

Fast-path responses include X-Scrape-* headers that describe how the request was handled.

A fast-path request looks like this:

curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
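
For integrations written in Python, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the curl call above; the helper name build_scrape_request is hypothetical (not part of any Bytekit SDK), and the request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.bytekit.com/v1/scrape"


def build_scrape_request(url, api_key, **options):
    """Build (but do not send) a POST /v1/scrape request.

    Hypothetical helper: extra keyword arguments are passed through
    as scrape options in the JSON body.
    """
    payload = {"url": url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# With no extra options, the body matches the fast-path example above.
req = build_scrape_request("https://example.com", "sk_live_YOUR_KEY_HERE")
```

Sending it is a matter of calling urllib.request.urlopen(req) once a real key is in place.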

Slow path (~3–15s)

Any of the following options forces the request onto the slow path, which renders the page in a full browser:

Option             Trigger condition
wait_for_selector  Any non-empty value
wait_until         "networkidle"
cookies            Non-empty array
delay_ms           Greater than 0
headers            Non-empty object

# This request uses the slow path (wait_for_selector is set)
curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "wait_for_selector": "#main-content"
  }'

country does not trigger the slow path

The country option routes the request through a geo-located proxy but does not force a browser render. It rides the fast path. Only when country is combined with one of the slow-path triggers listed above does the request render in a browser.
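
The routing rules above can be expressed as a small client-side check, which is handy for estimating latency before a request is sent. This is a sketch of the documented trigger conditions only, not Bytekit's implementation; predict_path is a hypothetical name, and it does not account for the automatic browser retry described below.

```python
def predict_path(options):
    """Return "slow" if any documented slow-path trigger is set, else "fast".

    `options` is the JSON body you intend to send to POST /v1/scrape.
    """
    triggers = (
        bool(options.get("wait_for_selector")),      # any non-empty value
        options.get("wait_until") == "networkidle",  # only this value triggers
        bool(options.get("cookies")),                # non-empty array
        (options.get("delay_ms") or 0) > 0,          # greater than 0
        bool(options.get("headers")),                # non-empty object
    )
    return "slow" if any(triggers) else "fast"


# country alone stays on the fast path; adding a trigger moves it to the slow path.
fast_example = predict_path({"url": "https://example.com", "country": "de"})
slow_example = predict_path(
    {"url": "https://example.com", "country": "de", "delay_ms": 500}
)
```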

Automatic browser retry

When a direct fetch is blocked by bot protection — HTTP 403 or 429 where the site is actively refusing scrapers — Bytekit automatically retries the request in a full browser that can clear common challenges. This retry is transparent: the API response shape is identical, and you don't write any retry logic.

Other failure modes (network errors, timeouts, non-403/429 errors) return 502 internal_error directly without attempting the browser retry.

Response fields

Field                     Description
data.content              Page content in the requested format (markdown by default)
data.content_length       Compressed wire bytes (used for bandwidth billing)
data.metadata.statusCode  HTTP status code of the origin page
data.metadata.title       Page <title> element

The X-Scrape-* response headers describe how each request was handled (which path served it, cache status, timing, and the final URL) — useful for debugging.
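
As an illustration of reading these fields, the sketch below pulls the documented values out of a parsed response body and collects every X-Scrape-* header by prefix. The function name summarize_scrape, the sample values, and the header name X-Scrape-Example are placeholders invented for the example; the actual header names are not listed on this page.

```python
def summarize_scrape(body, headers):
    """Collect the documented response fields and all X-Scrape-* headers."""
    data = body["data"]
    scrape_headers = {
        name: value
        for name, value in headers.items()
        if name.lower().startswith("x-scrape-")  # header names are case-insensitive
    }
    return {
        "status_code": data["metadata"]["statusCode"],
        "title": data["metadata"]["title"],
        "billed_bytes": data["content_length"],
        "content_chars": len(data["content"]),
        "scrape_headers": scrape_headers,
    }


# Placeholder data shaped like the fields in the table above.
sample_body = {
    "data": {
        "content": "# Example Domain",
        "content_length": 512,
        "metadata": {"statusCode": 200, "title": "Example Domain"},
    }
}
sample_headers = {
    "Content-Type": "application/json",
    "X-Scrape-Example": "placeholder",  # not a real header name
}
summary = summarize_scrape(sample_body, sample_headers)
```

Note that billed_bytes comes from data.content_length (compressed wire bytes), which generally differs from the length of the decoded content.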

Async scrapes

For long-running pages, pass "async": true to get a 202 Accepted response with a scrape id prefixed sc_. Poll for the result with GET /v1/scrape/{id}.

# Start async scrape
curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "async": true}'

# Poll for result
curl https://api.bytekit.com/v1/scrape/sc_... \
  -H "Authorization: Bearer sk_live_YOUR_KEY_HERE"
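
A polling loop around the GET endpoint might look like the sketch below. This page does not describe the shape of the poll response, so the loop takes two caller-supplied pieces as assumptions: fetch, which performs the authenticated GET and returns the parsed body, and is_done, which decides whether that body is a final result. The name poll_scrape and the interval and attempt defaults are hypothetical.

```python
import time

API_BASE = "https://api.bytekit.com"


def poll_scrape(scrape_id, fetch, is_done,
                interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Poll GET /v1/scrape/{id} until is_done(result) is true.

    fetch(url) and is_done(result) are supplied by the caller because
    the poll response shape is not documented here.
    """
    url = f"{API_BASE}/v1/scrape/{scrape_id}"
    for attempt in range(max_attempts):
        result = fetch(url)
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(
        f"scrape {scrape_id} not finished after {max_attempts} polls"
    )
```

Injecting fetch and sleep keeps the loop testable without network access; for jobs that may run long, the webhook events described below avoid polling altogether.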

Webhook event header

Every outbound webhook POST includes an X-RapidCrawl-Event header with the event type:

Event type               Trigger
bulk.completed           A bulk job finishes (all items processed)
scrape.completed         An async scrape succeeds
scrape.failed            An async scrape fails terminally
monitor.change_detected  A monitor fires and detects a visual change
monitor.captured         A monitor fires with no visual change (notify_on: every)
sitemap.completed        A sitemap job finishes
sitemap.failed           A sitemap job fails terminally

Use this header to dispatch events without inspecting the body shape:

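from flask import Flask, request

app = Flask(__name__)

def handle_bulk(payload):
    """Placeholder: process a finished bulk job."""

def handle_scrape(payload):
    """Placeholder: process a finished async scrape."""

@app.route("/webhooks/bytekit", methods=["POST"])
def handle():
    # Dispatch on the event header instead of inspecting the body shape.
    event = request.headers.get("X-RapidCrawl-Event")
    if event == "bulk.completed":
        handle_bulk(request.json)
    elif event == "scrape.completed":
        handle_scrape(request.json)
    # Acknowledge every delivery, including event types not handled here.
    return "", 200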

X-RapidCrawl-Event is a reserved header — you cannot supply it via webhook_headers in monitor or bulk configuration. Attempts to do so will be rejected with a 422 at request time.
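
To catch this before the API returns a 422, a configuration can be checked client-side. The sketch below is a hypothetical helper, not part of the API; it compares names case-insensitively because HTTP header names are case-insensitive, although whether Bytekit's own validation does the same is an assumption.

```python
RESERVED_WEBHOOK_HEADERS = {"x-rapidcrawl-event"}


def find_reserved_headers(webhook_headers):
    """Return the names in webhook_headers that collide with reserved headers."""
    return sorted(
        name for name in webhook_headers
        if name.lower() in RESERVED_WEBHOOK_HEADERS
    )


# An empty list means the configuration is safe to submit on this count.
ok = find_reserved_headers({"X-Trace-Id": "abc123"})
clash = find_reserved_headers(
    {"x-rapidcrawl-event": "custom", "X-Trace-Id": "abc123"}
)
```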
