The fastest open-source web scraper for LLMs.
Distill the web.
Convert any web page into clean, LLM-ready Markdown. Built in Rust with intelligent HTTP-to-browser fallback. Self-hosted, no API keys, no rate limits.
Everything you need to scrape the web
Built for developers who need fast, reliable web data extraction for LLM pipelines and AI agents.
Two-Tier Rendering
Starts with a fast HTTP fetch. If content density is too low, automatically escalates to full Chromium browser automation. Speed when possible, reliability when needed.
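The escalation decision can be pictured as a small heuristic. The sketch below is illustrative only and is not Essence's actual implementation: the thresholds `MIN_TEXT_RATIO` and `MIN_TEXT_CHARS`, the helper names, and the regex-based text extraction are all assumptions made for this example. It treats a response as "low density" when little visible text survives after scripts, styles, and tags are stripped.

```python
import re

# Hypothetical thresholds; the real cutoffs Essence uses are not documented here.
MIN_TEXT_RATIO = 0.1   # visible text length / raw HTML length
MIN_TEXT_CHARS = 200   # absolute floor on visible text


def visible_text(html: str) -> str:
    """Crudely approximate visible text by dropping scripts, styles, and tags."""
    html = re.sub(r"(?is)<(script|style|noscript)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def needs_browser(html: str) -> bool:
    """Return True when the HTTP response looks too thin to be the real page."""
    text = visible_text(html)
    if len(text) < MIN_TEXT_CHARS:
        return True
    return len(text) / max(len(html), 1) < MIN_TEXT_RATIO


def fetch(url, http_fetch, browser_fetch):
    """Try the cheap fetcher first; escalate only if the result looks empty."""
    html = http_fetch(url)
    if needs_browser(html):
        html = browser_fetch(url)
    return html
```

Passing the two fetchers in as callables keeps the sketch independent of any particular HTTP client or browser driver.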
LLM-Optimized Markdown
Strips navigation, ads, footers, cookie banners, and boilerplate. Preserves code blocks, heading hierarchy, and extracts metadata including OG tags.
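To make the idea concrete, here is a minimal sketch of boilerplate stripping using only Python's standard library. It is not how Essence works internally: the `SKIP_TAGS` list, the `MainTextExtractor` class, and `to_markdown` are hypothetical names for this example, and the sketch handles only headings, plain text, and OG meta tags (it does not preserve code blocks or detect ads and cookie banners).

```python
from html.parser import HTMLParser

# Illustrative set of containers treated as boilerplate (an assumption).
SKIP_TAGS = {"nav", "footer", "aside", "script", "style"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MainTextExtractor(HTMLParser):
    """Collects heading/body text outside boilerplate tags, plus OG metadata."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a skipped container
        self.heading = None      # current heading level, if any
        self.lines = []
        self.og = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attrs.get("content") or ""
        elif tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in HEADINGS:
            self.heading = int(tag[1])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in HEADINGS:
            self.heading = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        prefix = "#" * self.heading + " " if self.heading else ""
        self.lines.append(prefix + text)


def to_markdown(html: str):
    """Return (markdown_text, og_metadata) for a page."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines), parser.og
```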
Six Endpoints, One Server
Scrape, crawl, map, search, extract structured data, or generate llms.txt files. All from a single lightweight server with an OpenAPI spec.
MCP Server for AI Agents
Built-in Model Context Protocol server. Connect Claude, Cursor, or any MCP client to give your agent web access.
Zero Dependencies
A single Rust binary. No Redis, no PostgreSQL, no external services. Run with cargo run or docker-compose up.
Structured Extraction
Extract structured JSON from pages using CSS selectors or LLM-based extraction. Define a schema, get typed data back. No more regex scraping.
URL to Markdown in milliseconds
Three steps from URL to LLM-ready Markdown.
Send a URL
POST to any of the six endpoints: scrape, crawl, map, search, extract, or llmstxt.
Smart rendering
HTTP first. If the page is a JavaScript SPA, automatic fallback to headless Chromium.
Get clean Markdown
Structured output with metadata, ready for your LLM pipeline.
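From the client side, the three steps above might look like the following Python sketch. It assumes a server running locally on port 8080 and the `{"url": ...}` request body shown in the curl example on this page; the helper names `build_scrape_request` and `scrape` are made up for illustration. The response schema is not described here, so the sketch returns the parsed JSON as-is without assuming any field names.

```python
import json
import urllib.request

# Assumption: a local Essence server on the port used in the curl example.
BASE_URL = "http://localhost:8080"


def build_scrape_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/v1/scrape without sending it."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_scrape_request(url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `scrape("https://example.com")` requires a running server; `build_scrape_request` can be inspected on its own to check the URL, method, and payload.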
Better output. Faster.
Benchmarked against leading alternatives on 35 real-world URLs spanning 7 content categories. Quality was evaluated by an LLM judge (Claude).
Quality (LLM Judge)
Per-category win rate across 35 URLs
| Category | Essence | Alternatives | Ties |
|---|---|---|---|
| Structured | 5/5 | 0/5 | 0 |
| News | 4/5 | 0/5 | 1 |
| Reference | 5/5 | 0/5 | 0 |
| Content | 5/5 | 0/5 | 0 |
| Dynamic | 4/5 | 1/5 | 0 |
| Docs | 5/5 | 0/5 | 0 |
| E-Commerce | 4/5 | 1/5 | 0 |
| Total | 32/35 | 2/35 | 1 |
Speed Comparison
Average response time by category
Benchmark conducted April 2026 against Firecrawl and Crawl4AI (both self-hosted via Docker). The LLM judge evaluated content relevance, noise removal, readability, structural coherence, and information completeness.
Why Essence
How Essence compares to other scraping tools. No spin -- just data.
| Feature | Essence | Alternatives |
|---|---|---|
| LLM-ready Markdown | Yes | |
| Structured extraction | CSS + LLM hybrid | LLM only |
| Open source license | MIT | AGPL / Apache |
| Self-hosted | Single binary, zero deps | Redis + services / Docker |
| Browser fallback | Automatic (content-aware) | Manual / always-on |
| MCP server | Built-in | Separate package |
| OpenAPI spec | Yes | |
| Official SDKs | Python, TypeScript | Python, JS, Go, Rust |
| API key required | No | Cloud tiers |
| Rate limits | None | Tiered pricing |
| Quality (LLM judge) | 97% win rate | Best alternative: 26% |
| Median speed | 498ms | 908ms+ |
| Built-in search | DuckDuckGo | Varies |
| Pricing | Free forever | Free tier + paid |
Works with everything
A simple REST API. Use it from any language, any framework, or connect your AI agent via MCP.
curl -X POST http://localhost:8080/api/v1/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
/api/v1/scrape
/api/v1/crawl
/api/v1/map
/api/v1/search
/api/v1/extract
/api/v1/llmstxt
Ready to build?
Start getting web data for free and scale seamlessly. Self-hosted, no credit card needed.
git clone https://github.com/ruchit-p/essence.git
cd essence/backend
cp .env.example .env
cargo run --release