The fastest open-source web scraper for LLMs.
Distill the web.
Convert any web page into clean, LLM-ready Markdown. Built in Rust with intelligent HTTP-to-browser fallback. Self-hosted, no API keys, no rate limits.
Everything you need to scrape the web
Built for developers who need fast, reliable web data extraction for LLM pipelines and AI agents.
Two-Tier Rendering
Starts with a fast HTTP fetch. If content density is too low, automatically escalates to full Chromium browser automation. Speed when possible, reliability when needed.
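To make the escalation rule concrete, here is a minimal sketch of one possible content-density check. It is not Essence's actual implementation: the metric (visible text characters divided by total HTML characters), the 0.1 threshold, and the names `content_density` and `needs_browser` are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <noscript> bodies."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def content_density(html: str) -> float:
    """Return non-whitespace visible characters divided by total HTML characters."""
    if not html:
        return 0.0
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    visible = sum(len("".join(chunk.split())) for chunk in parser.chunks)
    return visible / len(html)


def needs_browser(html: str, min_density: float = 0.1) -> bool:
    """Hypothetical rule: escalate to a browser when the HTTP response is mostly markup."""
    return content_density(html) < min_density
```

Under this heuristic, a JavaScript SPA shell whose body is an empty root element plus script tags scores near zero and would be escalated, while a server-rendered article stays on the fast HTTP path.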
LLM-Optimized Markdown
Strips navigation, ads, footers, cookie banners, and boilerplate. Preserves code blocks, heading hierarchy, and extracts metadata including OG tags.
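As an illustration of the metadata step, the sketch below pulls Open Graph tags out of a page using only the Python standard library. It is an assumption about how such extraction could work, not Essence's code; `OpenGraphParser` and `extract_og_tags` are hypothetical names.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.metadata.setdefault(prop, content)


def extract_og_tags(html: str) -> dict:
    """Return a dict mapping Open Graph property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.metadata
```

Non-Open-Graph `<meta>` tags (for example `name="viewport"`) are ignored, so the result contains only `og:` keys.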
Five Endpoints, One Server
Scrape a page, crawl a site, map URLs via sitemaps, search the web, or generate llms.txt files. All from a single lightweight server.
MCP Server for AI Agents
Built-in Model Context Protocol server. Connect Claude, Cursor, or any MCP client to give your agent web access.
Zero Dependencies
A single Rust binary. No Redis, no PostgreSQL, no external services. Run with cargo run or docker-compose up.
100% Open Source
MIT licensed. Self-hosted, no vendor lock-in, no API keys, no rate limits. Free forever.
URL to Markdown in milliseconds
Three steps from URL to LLM-ready Markdown.
Send a URL
POST to any of the five endpoints -- scrape, crawl, map, search, or llmstxt.
Smart rendering
HTTP first. If the page is a JavaScript SPA, automatic fallback to headless Chromium.
Get clean Markdown
Structured output with metadata, ready for your LLM pipeline.
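The three steps can be sketched from Python with the standard library alone. The base URL, the `/api/v1/scrape` path, and the `{"url": ...}` body come from this page's curl example; the helper names, the timeout, and the assumption that the server replies with a JSON document are illustrative and not confirmed here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the page's curl example


def build_scrape_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for the scrape endpoint without sending it."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply (assumed, not confirmed, to be JSON)."""
    request = build_scrape_request(url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a local server running, `scrape("https://example.com")` would return the decoded response; inspect its keys before relying on any particular field, since the response schema is not described on this page.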
Better output. Faster.
Benchmarked across 35 real-world URLs spanning 7 content categories against leading alternatives. Quality was evaluated by an LLM judge (Claude).
Quality (LLM Judge)
Per-category win rate across 35 URLs
| Category | Essence | Alternatives | Ties |
|---|---|---|---|
| Structured | 5/5 | 0/5 | 0 |
| News | 4/5 | 0/5 | 1 |
| Reference | 5/5 | 0/5 | 0 |
| Content | 5/5 | 0/5 | 0 |
| Dynamic | 4/5 | 1/5 | 0 |
| Docs | 5/5 | 0/5 | 0 |
| E-Commerce | 4/5 | 1/5 | 0 |
| Total | 32/35 | 2/35 | 1 |
Speed Comparison
Average response time by category
Benchmark conducted in April 2026 against Firecrawl and Crawl4AI (both self-hosted via Docker). The LLM judge evaluated content relevance, noise removal, readability, structural coherence, and information completeness.
Why Essence
How Essence compares to other scraping tools. No spin -- just data.
| Feature | Essence | Alternatives |
|---|---|---|
| LLM-ready Markdown | Yes | |
| Open source license | MIT | AGPL / Apache |
| Self-hosted | Single binary, zero deps | Redis + services / Docker |
| Browser fallback | Automatic (content-aware) | Manual / always-on |
| MCP server | Built-in | Separate package |
| API key required | No | Cloud tiers |
| Rate limits | None | Tiered pricing |
| Quality (LLM judge) | 97% win rate | Best alternative: 26% |
| Median speed | 498ms | 908ms+ |
| Built-in search | DuckDuckGo | Varies |
| Language | Rust | TypeScript / Python |
| Pricing | Free forever | Free tier + paid |
Works with everything
A simple REST API. Use it from any language, any framework, or connect your AI agent via MCP.
curl -X POST http://localhost:8080/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
/api/v1/scrape
/api/v1/crawl
/api/v1/map
/api/v1/search
/api/v1/llmstxt
Ready to build?
Start getting web data for free and scale seamlessly. Self-hosted, no credit card needed.
git clone https://github.com/ruchit-p/essence.git
cd essence/backend
cp .env.example .env
cargo run --release