Scrape
Fetch a URL as raw HTML, clean markdown, or structured content. Accessed via client.scrape.
create
client.scrape.create(opts: ScrapeOpts): Promise<unknown>
POST /v1/scrape: create a scrape job
get
client.scrape.get(id: ScrapeId): Promise<unknown>
GET /v1/scrape/{id}: poll a scrape by ID
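Because both methods resolve to unknown, callers have to narrow the response themselves before polling. The following is a minimal sketch of a create-then-poll loop, not the SDK's own implementation. It relies on several assumptions that this page does not confirm: that ScrapeId is a string, that responses carry id and status fields, and that status uses the same values as the events option ('queued', 'completed', 'failed'). ScrapeApi, JobLike, isJobLike and scrapeAndWait are hypothetical names introduced for illustration.

```typescript
// Hypothetical structural type covering only the two documented methods.
// ASSUMPTION: ScrapeId is a string.
interface ScrapeApi {
  create(opts: { url: string; async?: boolean }): Promise<unknown>;
  get(id: string): Promise<unknown>;
}

// ASSUMPTION: responses include `id` and `status`; the page does not document the shape.
type JobLike = { id: string; status: string };

// ASSUMPTION: terminal states mirror the documented `events` values.
const TERMINAL_STATES = new Set(["completed", "failed"]);

// Type guard that narrows an unknown response to the assumed shape.
function isJobLike(value: unknown): value is JobLike {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.id === "string" && typeof record.status === "string";
}

// Creates an async scrape, then polls until a terminal state or maxPolls is reached.
// Returns the last job seen; the caller still has to check `status`.
async function scrapeAndWait(
  api: ScrapeApi,
  url: string,
  intervalMs = 1000,
  maxPolls = 30,
): Promise<JobLike> {
  const created = await api.create({ url, async: true });
  if (!isJobLike(created)) throw new Error("unexpected create response shape");

  let job: JobLike = created;
  for (let i = 0; i < maxPolls && !TERMINAL_STATES.has(job.status); i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    const next = await api.get(job.id);
    if (!isJobLike(next)) throw new Error("unexpected get response shape");
    job = next;
  }
  return job;
}
```

If the real client's methods are compatible with this shape, client.scrape could be passed as the api argument. The bounded loop is a deliberate choice: a job that never leaves the queued state returns after maxPolls attempts instead of polling forever.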
bulk
create
client.scrape.bulk.create(opts: Record<string, unknown>): Promise<unknown>
POST /v1/scrape/bulk: create a bulk scrape job
get
client.scrape.bulk.get(id: BulkId): Promise<unknown>
GET /v1/scrape/bulk/{id}: get a bulk scrape job
Options (ScrapeOpts)
url (required): string
formats: Array<'rawHtml' | 'html' | 'markdown' | 'links' | 'images'>
country: string
cookies: Array<Record<string, unknown>>
headers: Record<string, string>
delay_ms: number
timeout_ms: number
async: boolean
webhook_url: string
events: Array<'queued' | 'completed' | 'failed'>
markdownMode: MarkdownMode. Markdown processing mode: article (article extraction, the default), raw (minimal cleanup), or llm (compact LLM-optimised output).
markdownQuery: string. BM25 query string for relevance-ranked content filtering. Omit or leave empty to disable.
markdownLinks: MarkdownLinks. Link rendering style in the markdown output.
markdownCompact: boolean. Collapse excessive whitespace for a more compact output.
markdownFilterImages: boolean. Filter low-signal images from the markdown output.
markdownIncludeMedia: boolean. When true, formats.links and formats.images return ScrapeScoredLink[] / ScrapeScoredImage[] (rich objects) instead of string[], and a top-level tables array is included. Only effective when markdown is in formats.
markdownIncludeWarnings: boolean. When true, the response includes a top-level warnings array of ScrapeWarning objects. Only effective when markdown is in formats.
markdownIncludeStats: boolean. When true, the response includes a top-level stats object with ScrapeStats (chars, tokens, blocks). Only effective when markdown is in formats.
cache_ttl: string | 0. How long a freshly fetched URL may be served from cache. '0' / 0 disables the cache; 'Nh' / 'Nd' set a TTL (capped at 168h / 7d). Default '48h'. Honoured on the synchronous path only; the async path accepts the value but does not currently act on it.
custom: Record<string, unknown>. User-supplied JSON payload, echoed back on the success envelope so callers can correlate the response to caller-side state (job IDs, batch metadata). Capped at 4096 UTF-8 bytes after JSON serialization. Does NOT affect cache-key inputs: two requests differing only in custom share the same cache slot.
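The cache_ttl and custom limits can be checked on the caller's side before a request is sent. The sketch below is illustrative only: ttlHours, customByteLength and customFits are hypothetical helpers, not part of the SDK. Clamping the TTL locally is an assumption about what "capped" means; the service may instead clamp or reject out-of-range values on its side.

```typescript
// Limits taken from the option descriptions above.
const MAX_TTL_HOURS = 168; // documented cap: 168h / 7d
const MAX_CUSTOM_BYTES = 4096; // documented cap on the serialized `custom` payload

// Hypothetical helper: converts a cache_ttl value to hours.
// Returns 0 when caching is disabled and null when the value is not 'Nh' or 'Nd'.
// ASSUMPTION: values above the cap are clamped rather than rejected.
function ttlHours(ttl: string | 0): number | null {
  if (ttl === 0 || ttl === "0") return 0;
  const match = /^(\d+)([hd])$/.exec(ttl);
  if (match === null) return null;
  const hours = Number(match[1]) * (match[2] === "d" ? 24 : 1);
  return Math.min(hours, MAX_TTL_HOURS);
}

// Size of `custom` in UTF-8 bytes after JSON serialization.
// This counts bytes, not characters, so non-ASCII text weighs more.
function customByteLength(custom: Record<string, unknown>): number {
  return new TextEncoder().encode(JSON.stringify(custom)).length;
}

// True when the payload is within the documented 4096-byte cap.
function customFits(custom: Record<string, unknown>): boolean {
  return customByteLength(custom) <= MAX_CUSTOM_BYTES;
}

// Example options object using only documented fields.
const exampleOpts = {
  url: "https://example.com/article",
  formats: ["markdown"],
  cache_ttl: "10d", // above the cap; ttlHours reports 168
  custom: { jobId: "job-42" },
};

console.log(ttlHours(exampleOpts.cache_ttl)); // 168
console.log(customFits(exampleOpts.custom)); // true
```

Measuring bytes instead of string length matters for non-ASCII payloads: a 2,048-character string of "é" serializes to more than 4,096 bytes even though its character count is well under the cap. Since cache_ttl is honoured on the synchronous path only, the TTL check is relevant only to requests that do not set async.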