# API Reference

## Scrape

`POST /api/v1/scrape` — single-page capture

Single-page capture with automatic engine selection. Converts any web page into clean, LLM-ready Markdown.

### Endpoint

`POST /api/v1/scrape`

### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | — | **Required.** URL to scrape |
| `formats` | string[] | `["markdown"]` | Output formats: `"markdown"`, `"html"`, `"rawHtml"`, `"links"`, `"images"`, `"screenshot"` |
| `engine` | string | `"auto"` | `"auto"`, `"http"`, or `"browser"` |
| `timeout` | integer | `30000` | Timeout in milliseconds |
| `headers` | object | `{}` | Custom HTTP headers |
| `onlyMainContent` | boolean | `true` | Extract only main content (strips nav, footer, etc.) |
| `waitFor` | integer | `0` | Wait time before scraping (ms, browser only) |
| `waitForSelector` | string | — | CSS selector to wait for (browser only) |
| `removeBase64Images` | boolean | `true` | Strip base64-encoded images from output |
| `skipTlsVerification` | boolean | `false` | Skip TLS certificate verification |
| `screenshot` | boolean | `false` | Capture page screenshot (browser only) |
| `screenshotFormat` | string | `"png"` | `"png"` or `"jpeg"` |
| `actions` | array | — | Browser actions to perform before scraping |
| `includeTags` | string[] | — | CSS selectors to include |
| `excludeTags` | string[] | — | CSS selectors to exclude |
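Several of these parameters are commonly combined, e.g. forcing the browser engine while filtering content. A minimal sketch of composing such a request body in Python — `build_scrape_payload` is a hypothetical helper, not part of the API; only the parameter names come from the table above:

```python
import json

# Hypothetical helper (an assumption, not part of this API's client):
# merge a target URL with optional scrape parameters from the table above.
def build_scrape_payload(url: str, **options) -> dict:
    payload = {"url": url}
    payload.update(options)
    return payload

payload = build_scrape_payload(
    "https://example.com/blog",
    formats=["markdown", "links"],
    engine="browser",          # force browser rendering
    onlyMainContent=True,      # strip nav, footer, etc.
    excludeTags=[".sidebar", "#comments"],
    timeout=15000,
)
print(json.dumps(payload, indent=2))
```

The resulting dict can be passed directly as the `json=` argument to `requests.post`, as in the Python example below.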
### Examples

#### cURL

```bash
curl -X POST http://localhost:8080/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html"]
  }'
```

#### Python
```python
import requests

response = requests.post("http://localhost:8080/api/v1/scrape", json={
    "url": "https://example.com",
    "formats": ["markdown", "html"],
})
data = response.json()
print(data["data"]["markdown"])
```

#### JavaScript

```javascript
const response = await fetch("http://localhost:8080/api/v1/scrape", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://example.com",
    formats: ["markdown", "html"],
  }),
});
const data = await response.json();
console.log(data.data.markdown);
```

### Response
Metadata is always included in the response regardless of the `formats` parameter. Other fields (`markdown`, `html`, `rawHtml`, `links`, `images`, `screenshot`) are only present when requested via `formats`.
```json
{
  "success": true,
  "data": {
    "title": "Example Domain",
    "description": "",
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "html": "<html>...</html>",
    "metadata": {
      "title": "Example Domain",
      "description": "",
      "language": "en",
      "keywords": null,
      "robots": null,
      "ogTitle": null,
      "ogDescription": null,
      "ogUrl": null,
      "ogImage": null,
      "url": "https://example.com",
      "sourceUrl": "https://example.com",
      "statusCode": 200,
      "contentType": "text/html",
      "canonicalUrl": null,
      "wordCount": 58,
      "readingTime": 1,
      "excerpt": null,
      "detectedFrameworks": null,
      "detectionReason": null,
      "contentScriptRatio": null
    }
  }
}
```

Fields like `links`, `images`, `rawHtml`, and `screenshot` appear only when their corresponding format is requested. The `title`, `description`, and `url` fields are duplicated at the top level of the `data` object for convenience.
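Because those fields are only present when requested, clients should read them defensively rather than assume every key exists. A minimal sketch — `extract_optional_formats` is a hypothetical helper, not part of the API, and the sample dict below is a trimmed version of the response shown above:

```python
# Format fields that may or may not appear in "data", per the docs above.
OPTIONAL_FORMATS = ("markdown", "html", "rawHtml", "links", "images", "screenshot")

# Hypothetical helper (an assumption, not part of this API's client):
# return only the format fields the server actually included.
def extract_optional_formats(response_json: dict) -> dict:
    data = response_json.get("data", {})
    return {key: data[key] for key in OPTIONAL_FORMATS if key in data}

# Trimmed sample mirroring the response above, where only
# ["markdown", "html"] was requested via formats.
sample = {
    "success": True,
    "data": {
        "title": "Example Domain",
        "url": "https://example.com",
        "markdown": "# Example Domain",
        "html": "<html>...</html>",
        "metadata": {"statusCode": 200},
    },
}
present = extract_optional_formats(sample)
print(sorted(present))
```

This keeps downstream code from raising `KeyError` when a format was not requested.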
### Browser Actions

When using `engine: "browser"`, you can perform actions before scraping:

```bash
curl -X POST http://localhost:8080/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "engine": "browser",
    "actions": [
      {"type": "waitForSelector", "selector": "#username"},
      {"type": "type", "selector": "#username", "text": "user@example.com"},
      {"type": "type", "selector": "#password", "text": "password"},
      {"type": "click", "selector": "#submit"},
      {"type": "wait", "milliseconds": 2000}
    ]
  }'
```

Supported action types: `click`, `type`, `scroll`, `wait`, `waitForSelector`. All action type names use camelCase.
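Since an unsupported or wrongly-cased type name will not behave as intended, it can help to check the list client-side before sending. A minimal sketch — `validate_actions` is a hypothetical client-side check, not part of the API; only the five type names come from the list above:

```python
# The supported camelCase action types listed above.
SUPPORTED_ACTION_TYPES = {"click", "type", "scroll", "wait", "waitForSelector"}

# Hypothetical client-side check (an assumption, not part of this API):
# reject any action whose "type" is not one of the supported names.
def validate_actions(actions: list) -> list:
    for index, action in enumerate(actions):
        action_type = action.get("type")
        if action_type not in SUPPORTED_ACTION_TYPES:
            raise ValueError(f"action {index}: unsupported type {action_type!r}")
    return actions

# The login sequence from the cURL example above passes validation.
login_actions = validate_actions([
    {"type": "waitForSelector", "selector": "#username"},
    {"type": "type", "selector": "#username", "text": "user@example.com"},
    {"type": "click", "selector": "#submit"},
    {"type": "wait", "milliseconds": 2000},
])
```

A typo such as `"waitforselector"` (lowercase) would raise `ValueError` here instead of failing later at the server.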