Essence
API Reference

Extract

POST /api/v1/extract — structured data extraction

Extract structured data from web pages using CSS selectors, LLM-based extraction, or both. Returns JSON objects conforming to your schema.

Endpoint

POST /api/v1/extract

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| urls | string[] | | Required. URLs to extract from (1-10) |
| schema | object | | JSON Schema defining the desired output structure |
| prompt | string | | Natural language instruction for LLM extraction |
| selectors | object | | CSS selector mappings: {"fieldName": "css.selector"} |
| mode | string | "auto" | "auto", "css", or "llm" |
| engine | string | "auto" | "auto", "http", or "browser" |
| timeout | integer | 30000 | Timeout per URL in milliseconds |
| llmBaseUrl | string | | OpenAI-compatible API base URL (required for "llm" mode) |
| llmModel | string | "gpt-4o-mini" | LLM model name |
| llmApiKey | string | | API key for the LLM service |
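
The constraints listed in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_extract_payload` is a hypothetical helper name, and the checks simply mirror the documented limits (1-10 URLs, the allowed `mode` and `engine` values, and `llmBaseUrl` being required for "llm" mode).

```python
VALID_MODES = {"auto", "css", "llm"}
VALID_ENGINES = {"auto", "http", "browser"}


def build_extract_payload(urls, mode="auto", engine="auto", timeout=30000, **options):
    """Hypothetical client-side helper: validate inputs and return a request body."""
    if not 1 <= len(urls) <= 10:
        raise ValueError("urls must contain between 1 and 10 entries")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    if mode == "llm" and not options.get("llmBaseUrl"):
        raise ValueError('llmBaseUrl is required for "llm" mode')

    # Optional fields (schema, prompt, selectors, llm* settings) pass through unchanged.
    payload = {"urls": list(urls), "mode": mode, "engine": engine, "timeout": timeout}
    payload.update(options)
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Python example further down.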

Extraction Modes

CSS Mode ("css")

Rule-based extraction using CSS selectors. No external dependencies, no API keys.

Each key in selectors becomes a field in the output. The CSS selector is run against the page HTML, and the matched element's text content is extracted. If a schema is provided, values are coerced to the specified types.

LLM Mode ("llm")

AI-powered extraction using any OpenAI-compatible API. Requires llmBaseUrl; llmApiKey is optional. The page content (as Markdown) is sent to the LLM along with your schema and prompt.

Supports both the Chat Completions (/v1/chat/completions) and Responses (/v1/responses) API formats. The format is auto-detected from the URL.
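
This page does not spell out the detection rule. One plausible approach, shown here purely as an illustration, is to inspect the path of the configured URL. The function name `detect_llm_api_format` and the choice to default to Chat Completions are assumptions, not documented behavior.

```python
from urllib.parse import urlparse


def detect_llm_api_format(base_url):
    """Guess the API format from the URL path (assumed rule, for illustration only)."""
    path = urlparse(base_url).path.rstrip("/")
    if path.endswith("/responses"):
        return "responses"
    # Assumption: anything else is treated as Chat Completions.
    return "chat_completions"
```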

Auto Mode ("auto", default)

Tries CSS extraction first (if selectors provided). If the result is less than 50% complete, falls back to LLM extraction (if credentials provided). If neither selectors nor LLM credentials are given, returns an error.
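
The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: "completeness" is assumed to mean the fraction of requested fields that came back non-empty, and `run_css` and `run_llm` are hypothetical callables standing in for the two extraction paths.

```python
def completeness(result, fields):
    """Fraction of requested fields with a non-empty value (assumed definition)."""
    if not fields:
        return 0.0
    filled = sum(1 for name in fields if result.get(name) not in (None, "", []))
    return filled / len(fields)


def auto_extract(selectors, llm_configured, run_css, run_llm):
    """Try CSS first; fall back to the LLM when the CSS result is under 50% complete."""
    if not selectors and not llm_configured:
        raise ValueError("auto mode needs selectors or LLM credentials")

    result = {}
    if selectors:
        result = run_css(selectors)
        if completeness(result, selectors) >= 0.5:
            return result  # good enough, no LLM call needed

    if llm_configured:
        return run_llm()
    return result  # incomplete CSS result, but no LLM available to fall back to
```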

Examples

CSS Extraction

curl -X POST http://localhost:8080/api/v1/extract \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
    "mode": "css",
    "selectors": {
      "title": "h1",
      "price": "p.price_color",
      "availability": "p.availability",
      "description": "#product_description ~ p"
    },
    "schema": {
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "availability": {"type": "string"},
        "description": {"type": "string"}
      }
    }
  }'

LLM Extraction

curl -X POST http://localhost:8080/api/v1/extract \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/about"],
    "mode": "llm",
    "prompt": "Extract company information from this page",
    "schema": {
      "properties": {
        "companyName": {"type": "string"},
        "founded": {"type": "string"},
        "employees": {"type": "number"},
        "description": {"type": "string"}
      }
    },
    "llmBaseUrl": "https://api.openai.com",
    "llmModel": "gpt-4o-mini",
    "llmApiKey": "sk-..."
  }'

Python

import requests

response = requests.post("http://localhost:8080/api/v1/extract", json={
    "urls": ["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
    "mode": "css",
    "selectors": {
        "title": "h1",
        "price": "p.price_color"
    },
    "schema": {
        "properties": {
            "title": {"type": "string"},
            "price": {"type": "number"}
        }
    }
})

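response.raise_for_status()
data = response.json()
for item in data["data"]:
    if "error" in item:
        # A failed URL yields an entry with an error field instead of extracted fields
        print(f"Extraction failed: {item['error']}")
    else:
        print(f"{item['title']}: {item['price']}")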

JavaScript

const response = await fetch("http://localhost:8080/api/v1/extract", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    urls: ["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
    mode: "css",
    selectors: { title: "h1", price: "p.price_color" },
    schema: {
      properties: {
        title: { type: "string" },
        price: { type: "number" },
      },
    },
  }),
});

const { data } = await response.json();
console.log(data[0].title, data[0].price);

Response

{
  "success": true,
  "data": [
    {
      "title": "A Light in the Attic",
      "price": 51.77,
      "availability": "In stock",
      "description": "It's hard to imagine a world without A Light in the Attic..."
    }
  ]
}

Each element in data corresponds to one URL from the request. If a URL fails, its entry will contain an error field instead.
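
Because a response can mix successful and failed entries, it can help to separate them before further processing. The sketch below assumes that entries in data appear in the same order as the request's urls and that the error field holds a printable value. Neither detail is confirmed on this page, and `split_results` is a hypothetical helper.

```python
def split_results(urls, response_body):
    """Pair each requested URL with its entry and separate successes from failures."""
    succeeded, failed = {}, {}
    # Assumption: data[i] corresponds to urls[i].
    for url, entry in zip(urls, response_body.get("data", [])):
        if "error" in entry:
            failed[url] = entry["error"]
        else:
            succeeded[url] = entry
    return succeeded, failed
```

For example, `succeeded, failed = split_results(payload["urls"], response.json())` gives two dictionaries keyed by URL.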

Schema Type Coercion

When a schema is provided with CSS extraction, values are coerced:

| Schema Type | Behavior |
| --- | --- |
| "string" | Raw text content (default) |
| "number" / "integer" | Strips non-numeric characters, parses as float |
| "boolean" | "true", "yes", or "1" becomes true |
| "array" | Collects all matching elements |

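The coercion rules in the table can be approximated in a few lines. This is a rough sketch rather than the service's actual code: `coerce` is a hypothetical function, its input is assumed to be the list of text contents of all matched elements, and keeping the decimal point and minus sign while stripping other characters is an assumption about what "non-numeric" means.

```python
import re


def coerce(values, schema_type="string"):
    """Coerce the text of matched elements according to a schema type (illustrative)."""
    if schema_type == "array":
        return [v.strip() for v in values]  # keep every matching element

    text = values[0].strip() if values else ""
    if schema_type in ("number", "integer"):
        cleaned = re.sub(r"[^0-9.\-]", "", text)  # assumed: keep digits, ".", "-"
        try:
            return float(cleaned)
        except ValueError:
            return None  # nothing numeric left, or malformed such as "1.2.3"
    if schema_type == "boolean":
        return text.lower() in ("true", "yes", "1")
    return text  # "string" and unknown types: raw text content
```

With this sketch, a price string such as "£51.77" is reduced to "51.77" before parsing, which matches the numeric price shown in the response above.
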