If you’re building RAG pipelines, you’ve hit this wall: you need clean text from web pages, but 30% of them are JS-rendered SPAs that return empty markup. You don’t find out until your retrieval quality craters.
The UnWeb MCP server (@mbsoftsystems/unweb-mcp) gives Claude Code, Cursor, and Windsurf five tools for web-to-markdown conversion — and every response includes a content quality score (0–100) so your agent knows immediately whether the extraction worked.
Setup: 3 Lines of Config
Claude Code
Add this to a project-level .mcp.json (or to ~/.claude.json to make the server available in every project):
{
  "mcpServers": {
    "unweb": {
      "command": "npx",
      "args": ["-y", "@mbsoftsystems/unweb-mcp"],
      "env": { "UNWEB_API_KEY": "unweb_your_key_here" }
    }
  }
}
Cursor
Same config in .cursor/mcp.json:
{
  "mcpServers": {
    "unweb": {
      "command": "npx",
      "args": ["-y", "@mbsoftsystems/unweb-mcp"],
      "env": { "UNWEB_API_KEY": "unweb_your_key_here" }
    }
  }
}
Windsurf uses the same format. Get your API key at app.unweb.info — the free tier gives you 500 credits/month, no credit card required.
What You Get: 5 Tools
Once configured, your AI assistant has access to:
| Tool | What it does | Credits |
|---|---|---|
| convert_url | Convert any webpage URL to markdown | 1 |
| convert_html | Convert raw HTML string to markdown | 1 |
| crawl_start | Start crawling a docs site (path-bounded BFS) | 1/page |
| crawl_status | Check crawl progress | 0 |
| crawl_download | Download all crawled pages as markdown | 0 |
Every conversion response includes a quality score (0–100). The scorer checks content ratio, SPA framework markers (React, Next.js, Nuxt), script density, and semantic tag presence. A score below 40 means the page likely needs a headless browser — you know this before wasting tokens on garbage.
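If you post-process conversions in your own pipeline, the score can gate what reaches the vector store. Here is a minimal sketch of that idea. The `Conversion` class and its field names are hypothetical (the actual response shape may differ); only the 0–100 range and the 40 cutoff come from the description above.

```python
from dataclasses import dataclass

# Cutoff taken from the text: below 40, the page likely needs a headless browser.
HEADLESS_THRESHOLD = 40


@dataclass
class Conversion:
    """Hypothetical shape of a conversion result; field names are assumptions."""
    url: str
    markdown: str
    quality_score: int  # 0-100


def route(result: Conversion) -> str:
    """Return 'index' for usable extractions, 'headless' for likely JS-rendered pages."""
    if not 0 <= result.quality_score <= 100:
        raise ValueError(f"score out of range: {result.quality_score}")
    return "headless" if result.quality_score < HEADLESS_THRESHOLD else "index"


def partition(results: list[Conversion]) -> tuple[list[Conversion], list[Conversion]]:
    """Split results into (ready to index, needs a headless retry)."""
    ready = [r for r in results if route(r) == "index"]
    retry = [r for r in results if route(r) == "headless"]
    return ready, retry
```

Keeping the cutoff in one named constant makes it easy to tune if your corpus needs a stricter or looser bar.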
Use Case 1: Converting Docs for RAG Context
You’re working in Claude Code and need the API reference for a library. Instead of copy-pasting, just ask:
Convert https://docs.stripe.com/api/charges to markdown
Claude Code calls convert_url, gets clean CommonMark markdown with the quality score, and has the full reference in context. No browser switching. No manual formatting.
This is especially useful when you’re deep in a coding session and need to reference documentation for an unfamiliar library. The markdown output is clean — headers, code blocks, links all preserved — not raw text with no structure.
Use Case 2: Crawling an Entire Docs Site for Your Vector Store
This is where UnWeb pulls ahead of single-page converters. Say you’re building a support chatbot and need to ingest an entire documentation site:
Crawl https://docs.example.com starting from /guides/ — get all the pages under that path
Claude Code calls crawl_start with the URL and allowed path. The crawler runs a path-bounded BFS, staying within /guides/ and converting each page to markdown.
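To make "path-bounded BFS" concrete, here is a small illustrative sketch of the technique in general. It is not UnWeb's implementation: `get_links` is a hypothetical caller-supplied function standing in for fetching and parsing a page, and `max_pages` is an assumed safety limit.

```python
from collections import deque
from urllib.parse import urlparse


def path_bounded_bfs(start_url, allowed_path, get_links, max_pages=100):
    """Breadth-first crawl that only follows same-host links under allowed_path.

    get_links is a caller-supplied function (url -> list of absolute URLs).
    Returns the URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            parts = urlparse(link)
            in_bounds = parts.netloc == host and parts.path.startswith(allowed_path)
            if in_bounds and link not in seen:
                seen.add(link)  # mark on enqueue so each page is visited once
                queue.append(link)
    return order
```

With a start URL of https://docs.example.com/guides/ and an allowed path of /guides/, a link to /blog/news or to another host is skipped, while /guides/auth is queued and visited after its parent.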
Your assistant can then check progress with crawl_status and download results with crawl_download — all pages returned as concatenated markdown with page separators:
--- Page: /guides/getting-started.md ---
# Getting Started
Content here...
--- Page: /guides/authentication.md ---
# Authentication
Content here...
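If you want one chunk per page instead of a single blob, the separators can be split back out. The sketch below assumes the separator lines look exactly like the sample above; `split_pages` is a hypothetical helper, not part of any UnWeb SDK.

```python
import re

# Separator format as shown in the sample output above.
PAGE_SEPARATOR = re.compile(r"^--- Page: (?P<path>.+?) ---[ \t]*$", re.MULTILINE)


def split_pages(concatenated: str) -> dict[str, str]:
    """Map each page path to its markdown body."""
    pages = {}
    matches = list(PAGE_SEPARATOR.finditer(concatenated))
    for i, match in enumerate(matches):
        body_start = match.end()
        # A page's body runs until the next separator, or to the end of the text.
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(concatenated)
        pages[match.group("path")] = concatenated[body_start:body_end].strip()
    return pages
```

A plain markdown horizontal rule (`---`) inside a page does not match the pattern, so it stays part of that page's body.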
The crawler natively exports LangChain JSONL and LlamaIndex JSON formats. Set exportFormat to langchain or llamaindex and skip the format wrangling between scraping and your vector store loader.
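On the loading side, a JSONL export can be read with the standard library alone. This is a sketch under a stated assumption: each line is a JSON object with `page_content` and `metadata` keys, mirroring the fields of LangChain's `Document`. The real export schema is not shown here and may differ, so check a sample file first.

```python
import json


def read_jsonl_documents(text: str) -> list[dict]:
    """Parse JSONL into dicts, skipping blank lines.

    Assumes 'page_content' and 'metadata' keys (an assumption modeled on
    LangChain's Document fields, not a confirmed export schema).
    """
    docs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        if "page_content" not in record:
            raise ValueError(f"line {line_number}: missing 'page_content'")
        docs.append({
            "page_content": record["page_content"],
            "metadata": record.get("metadata", {}),
        })
    return docs
```

Failing loudly on a missing key surfaces a schema mismatch at load time instead of as empty chunks in the index.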
Why Not Just Use Firecrawl or Jina?
Both have MCP servers. Here’s where UnWeb differs:
| Feature | UnWeb |
|---|---|
| Content quality score | Every response includes a 0–100 quality score. Neither Firecrawl nor Jina tells you how well the extraction worked. When you’re processing thousands of pages, this saves you from polluting your vector store with skeleton HTML. |
| Pricing | Starts at $12/month (Starter, 2,000 credits) vs. Firecrawl at $16/month. Free tier: 500 credits/month — enough to evaluate properly before committing. |
| Native AI framework exports | The crawler outputs LangChain and LlamaIndex formats directly. No intermediate parsing step. |
| Broader tooling | Python SDK (pip install unweb), Node.js SDK, Go CLI, and browser extensions for Chrome and Firefox. Pick the interface that fits your workflow. |
Getting Started
- Get an API key at app.unweb.info (free, no credit card)
- Add the config to your Claude Code, Cursor, or Windsurf settings (see above)
- Start converting — ask your assistant to convert a URL or crawl a docs site
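The MCP server runs via npx, so there is no global install and no manual version management. npx may reuse a cached copy of the package; to force the newest release, use @mbsoftsystems/unweb-mcp@latest in args.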
Full documentation: docs.unweb.info
npm package: @mbsoftsystems/unweb-mcp
GitHub: github.com/mbsoft-systems/unweb-mcp