Scrape a URL
Fetches and processes a URL, returning content in one or more formats wrapped in a ScrapeEnvelope. Simple requests use the HTTP fast-path (~500 ms). Complex requests (delay_ms > 0, cookies, custom headers, wait conditions) are routed to headless Chromium, take 3-15 s, and return HTTP 202. When `webhook_url` is provided and the scrape is async, webhook deliveries include `X-RapidCrawl-Event: scrape.completed` or `X-RapidCrawl-Event: scrape.failed`, so receivers can dispatch on the header without parsing the body shape.
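As an illustration of dispatching on the event header, the following is a minimal sketch of a receiver-side router. Only the header name and its two values come from the description above; the function names and the handler return values are hypothetical, and a real receiver would plug this into whatever web framework it uses.

```python
from typing import Callable, Dict, Mapping

EVENT_HEADER = "X-RapidCrawl-Event"


def on_completed(body: bytes) -> str:
    # Hypothetical handler: a real one would parse and store the result.
    return "completed:%d bytes" % len(body)


def on_failed(body: bytes) -> str:
    # Hypothetical handler: a real one might log or schedule a retry.
    return "failed:%d bytes" % len(body)


HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "scrape.completed": on_completed,
    "scrape.failed": on_failed,
}


def dispatch_webhook(headers: Mapping[str, str], body: bytes) -> str:
    """Route a delivery by its event header; the body is passed through unparsed."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    event = lowered.get(EVENT_HEADER.lower(), "").strip()
    handler = HANDLERS.get(event)
    if handler is None:
        raise ValueError(f"unexpected {EVENT_HEADER} value: {event!r}")
    return handler(body)
```

Keeping the lookup in a table means an unknown or missing event value fails loudly instead of being silently treated as a success.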
Authorization
Authorization (required): Bearer <token>
API key. Obtain one from POST /v1/account/api-keys.
In: header
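A minimal sketch of attaching the Bearer token to a request, using only the Python standard library. The endpoint URL below is a placeholder, because this section does not state the scrape endpoint's path or host; `build_scrape_request` is a hypothetical helper name. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder URL; the real scrape endpoint is not given in this section.
SCRAPE_ENDPOINT = "https://api.example.com/v1/scrape"


def build_scrape_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be non-empty")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; any other HTTP client works the same way as long as it sends the same two headers.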
Request Body
Content type: application/json (required)

`url` (string, required)
Format: "uri"

`formats` (array<string>)
Default: ["rawHtml"]

`country` (string)
Default: "US". Pattern: "^[A-Z]{2}$"

`cookies` (array<object>)
Default: []

`headers` (object)
Default: {}

`delay_ms` (integer)
Default: 0. Minimum: 0. Maximum: 10000

`timeout_ms` (integer)
Default: 30000. Minimum: 1000. Maximum: 60000

`async` (boolean)
Default: false

`webhook_url` (string)
Webhook endpoint URL (HTTPS only; http:// rejected with HTTP 422).
Pattern: "^https://". Format: "uri"

`events` (array<string>)
Default: ["completed","failed"]

`markdownMode` (string)
Markdown processing mode. article=article extraction (default), raw=minimal cleanup, llm=compact LLM-optimised output.
Values: "article" | "raw" | "llm"

`markdownQuery` (string)
BM25 query string for relevance-ranked content filtering. Omit or leave empty to disable.
Maximum length: 200

`markdownLinks` (string)
Link rendering style in the markdown output.
Values: "inline" | "references" | "none"

`markdownCompact` (boolean)
Collapse excessive whitespace for a more compact output.

`markdownFilterImages` (boolean)
Filter low-signal images from the markdown output.

`markdownIncludeMedia` (boolean)
When true, formats.links and formats.images return ScrapeScoredLink[] / ScrapeScoredImage[] (rich objects) instead of string[], and a top-level tables array is included. Only effective when markdown is in formats.

`markdownIncludeWarnings` (boolean)
When true, the response includes a top-level warnings array of ScrapeWarning objects. Only effective when markdown is in formats.

`markdownIncludeStats` (boolean)
When true, the response includes a top-level stats object with ScrapeStats (chars, tokens, blocks). Only effective when markdown is in formats.

`removeBase64Images` (boolean)
When true (default), base64-encoded data: image sources are stripped from the HTML before markdown conversion and from the html format output. Set to false to preserve base64 images in the pipeline output. Does not affect rawHtml, which always returns the original HTML unchanged.
Default: true

`cache_ttl` (string | integer)
How long a freshly fetched URL may be served from cache. 0 (string or integer) skips cache entirely; Nh/Nd set a TTL (capped at 168 h / 7 d). Honoured on the synchronous path only; the async branch accepts the field for forward compatibility but does not currently act on it.
Default: "48h"

`custom` (object)
User-supplied JSON payload, echoed back on the success envelope (sync, async, webhook). Capped at 4096 UTF-8 bytes (Buffer.byteLength). Distinct from the system-owned metadata column. Does NOT affect cache-key inputs: two requests differing only in custom resolve to the same cache entry.
Default: {}

Scrape completed synchronously.
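The request-body constraints listed above can be mirrored in a client-side pre-flight check, so obviously invalid requests never leave the client. The sketch below covers only limits stated in this section (the country pattern, the integer ranges, the HTTPS-only webhook, the enum values, and the 4096-byte cap on custom). `validate_scrape_body` is a hypothetical helper, not part of any SDK. How the server serialises custom before measuring it is not specified, so compact JSON is an assumption. cache_ttl is left unchecked because the handling of over-cap or non-zero integer values is not described.

```python
import json
import re

MAX_CUSTOM_BYTES = 4096
RANGES = {"delay_ms": (0, 10000), "timeout_ms": (1000, 60000)}
ENUMS = {
    "markdownMode": ("article", "raw", "llm"),
    "markdownLinks": ("inline", "references", "none"),
}


def validate_scrape_body(body: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    errors = []

    url = body.get("url")
    if not isinstance(url, str) or not url:
        errors.append("url: required string")

    country = body.get("country", "US")
    if not (isinstance(country, str) and re.fullmatch(r"[A-Z]{2}", country)):
        errors.append("country: must match ^[A-Z]{2}$")

    for name, (low, high) in RANGES.items():
        if name in body:
            value = body[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                errors.append(f"{name}: integer in [{low}, {high}]")

    webhook = body.get("webhook_url")
    if webhook is not None:
        if not (isinstance(webhook, str) and webhook.startswith("https://")):
            errors.append("webhook_url: must start with https://")

    for name, allowed in ENUMS.items():
        if name in body and body[name] not in allowed:
            errors.append(f"{name}: one of {', '.join(allowed)}")

    if "custom" in body:
        # Assumption: the cap applies to compact JSON encoded as UTF-8.
        encoded = json.dumps(
            body["custom"], separators=(",", ":"), ensure_ascii=False
        ).encode("utf-8")
        if len(encoded) > MAX_CUSTOM_BYTES:
            errors.append(f"custom: exceeds {MAX_CUSTOM_BYTES} UTF-8 bytes")

    return errors
```

Measuring the encoded bytes rather than the character count matters for non-ASCII payloads, where one character can occupy several bytes.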