# API Reference

## Scrape

`POST /api/v1/scrape` — single-page capture

Single-page capture with automatic engine selection. Converts any web page into clean, LLM-ready Markdown.

### Endpoint

`POST /api/v1/scrape`

### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | — | **Required.** URL to scrape |
| `formats` | string[] | `["markdown"]` | Output formats: `"markdown"`, `"html"`, `"rawHtml"`, `"links"`, `"images"`, `"screenshot"` |
| `engine` | string | `"auto"` | `"auto"`, `"http"`, or `"browser"` |
| `timeout` | integer | `30000` | Timeout in milliseconds |
| `headers` | object | `{}` | Custom HTTP headers |
| `onlyMainContent` | boolean | `true` | Extract only main content (strips nav, footer, etc.) |
| `waitFor` | integer | `0` | Wait time before scraping (ms, browser only) |
| `waitForSelector` | string | — | CSS selector to wait for (browser only) |
| `removeBase64Images` | boolean | `true` | Strip base64-encoded images from output |
| `skipTlsVerification` | boolean | `false` | Skip TLS certificate verification |
| `screenshot` | boolean | `false` | Capture page screenshot (browser only) |
| `screenshotFormat` | string | `"png"` | `"png"` or `"jpeg"` |
| `actions` | array | — | Browser actions to perform before scraping |
| `includeTags` | string[] | — | CSS selectors to include |
| `excludeTags` | string[] | — | CSS selectors to exclude |
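Several of these parameters are commonly combined, e.g. forcing the browser engine while filtering content. A minimal sketch of composing such a request body in Python — `build_scrape_payload` is a hypothetical helper, not part of the API; only the parameter names come from the table above:

```python
import json

# Hypothetical helper (an assumption, not part of this API's client):
# merge a target URL with optional scrape parameters from the table above.
def build_scrape_payload(url: str, **options) -> dict:
    payload = {"url": url}
    payload.update(options)
    return payload

payload = build_scrape_payload(
    "https://example.com/blog",
    formats=["markdown", "links"],
    engine="browser",          # force browser rendering
    onlyMainContent=True,      # strip nav, footer, etc.
    excludeTags=[".sidebar", "#comments"],
    timeout=15000,
)
print(json.dumps(payload, indent=2))
```

The resulting dict can be passed directly as the `json=` argument to `requests.post`, as in the Python example below.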
### Examples

#### cURL

```bash
curl -X POST http://localhost:8080/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html"]
  }'
```

#### Python
```python
import requests

response = requests.post("http://localhost:8080/api/v1/scrape", json={
    "url": "https://example.com",
    "formats": ["markdown", "html"],
})
data = response.json()
print(data["data"]["markdown"])
```

#### JavaScript

```javascript
const response = await fetch("http://localhost:8080/api/v1/scrape", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://example.com",
    formats: ["markdown", "html"],
  }),
});
const data = await response.json();
console.log(data.data.markdown);
```

### Response
Metadata is always included in the response regardless of the `formats` parameter. Other fields (`markdown`, `html`, `rawHtml`, `links`, `images`, `screenshot`) are only present when requested via `formats`.
```json
{
  "success": true,
  "data": {
    "title": "Example Domain",
    "description": "",
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "html": "<html>...</html>",
    "metadata": {
      "title": "Example Domain",
      "description": "",
      "language": "en",
      "keywords": null,
      "robots": null,
      "ogTitle": null,
      "ogDescription": null,
      "ogUrl": null,
      "ogImage": null,
      "url": "https://example.com",
      "sourceUrl": "https://example.com",
      "statusCode": 200,
      "contentType": "text/html",
      "canonicalUrl": null,
      "wordCount": 58,
      "readingTime": 1,
      "excerpt": null,
      "detectedFrameworks": null,
      "detectionReason": null,
      "contentScriptRatio": null
    }
  }
}
```

Fields like `links`, `images`, `rawHtml`, and `screenshot` appear only when their corresponding format is requested. The `title`, `description`, and `url` fields are duplicated at the top level of the `data` object for convenience.
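Because those fields are only present when requested, clients should read them defensively rather than assume every key exists. A minimal sketch — `extract_optional_formats` is a hypothetical helper, not part of the API, and the sample dict below is a trimmed version of the response shown above:

```python
# Format fields that may or may not appear in "data", per the docs above.
OPTIONAL_FORMATS = ("markdown", "html", "rawHtml", "links", "images", "screenshot")

# Hypothetical helper (an assumption, not part of this API's client):
# return only the format fields the server actually included.
def extract_optional_formats(response_json: dict) -> dict:
    data = response_json.get("data", {})
    return {key: data[key] for key in OPTIONAL_FORMATS if key in data}

# Trimmed sample mirroring the response above, where only
# ["markdown", "html"] was requested via formats.
sample = {
    "success": True,
    "data": {
        "title": "Example Domain",
        "url": "https://example.com",
        "markdown": "# Example Domain",
        "html": "<html>...</html>",
        "metadata": {"statusCode": 200},
    },
}
present = extract_optional_formats(sample)
print(sorted(present))
```

This keeps downstream code from raising `KeyError` when a format was not requested.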
### Browser Actions

When using `engine: "browser"`, you can perform actions before scraping:

```bash
curl -X POST http://localhost:8080/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "engine": "browser",
    "actions": [
      {"type": "waitForSelector", "selector": "#username"},
      {"type": "type", "selector": "#username", "text": "user@example.com"},
      {"type": "type", "selector": "#password", "text": "password"},
      {"type": "click", "selector": "#submit"},
      {"type": "wait", "milliseconds": 2000}
    ]
  }'
```

Supported action types: `click`, `type`, `scroll`, `wait`, `waitForSelector`. All action type names use camelCase.
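Since an unsupported or wrongly-cased type name will not behave as intended, it can help to check the list client-side before sending. A minimal sketch — `validate_actions` is a hypothetical client-side check, not part of the API; only the five type names come from the list above:

```python
# The supported camelCase action types listed above.
SUPPORTED_ACTION_TYPES = {"click", "type", "scroll", "wait", "waitForSelector"}

# Hypothetical client-side check (an assumption, not part of this API):
# reject any action whose "type" is not one of the supported names.
def validate_actions(actions: list) -> list:
    for index, action in enumerate(actions):
        action_type = action.get("type")
        if action_type not in SUPPORTED_ACTION_TYPES:
            raise ValueError(f"action {index}: unsupported type {action_type!r}")
    return actions

# The login sequence from the cURL example above passes validation.
login_actions = validate_actions([
    {"type": "waitForSelector", "selector": "#username"},
    {"type": "type", "selector": "#username", "text": "user@example.com"},
    {"type": "click", "selector": "#submit"},
    {"type": "wait", "milliseconds": 2000},
])
```

A typo such as `"waitforselector"` (lowercase) would raise `ValueError` here instead of failing later at the server.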