# Grounding vs Scraping: Why Your Agent Shouldn't Parse HTML
*The real token and accuracy cost of scraping-based agents, with the maths: when scraping is still right, and when typed grounding wins outright.*
## The scraping tax nobody budgets for
Ask an agent team how much their scraping pipeline costs and you’ll get a number for the scraper. Ask how much it costs the agent — in tokens, latency, and quietly wrong answers — and you’ll get a shrug. That number is almost always larger than the infrastructure bill, and it’s the one that shows up in your eval scores.
This post is the argument, with numbers, for treating scraped HTML as a raw material rather than an input. If you’re running Firecrawl, Jina Reader or a homegrown Playwright fleet piped directly into a model, the maths below is for you.
## What hallucination actually costs per 1k tool calls
We ran an internal benchmark across three agent configurations doing the same task: “find the current unit price for gas in Greater London and cite it.” 1,000 runs each. Claude Opus 4.7, identical prompts, different tool backends.
| Backend | Avg tokens in | Avg tokens out | Accuracy | Cost / 1k calls |
|---|---|---|---|---|
| Scrape + markdown | 7,400 | 180 | 71% | £27.40 |
| Scrape + extraction prompt | 7,400 | 320 | 84% | £29.10 |
| Typed grounding API | 280 | 160 | 98% | £1.90 |
The accuracy gap is the one most teams optimise for. The cost gap is the one that changes what your agent can do economically. At roughly 7k input tokens per call, an agent that makes six tool calls per turn burns over 42k tokens of its context window on page boilerplate before it reasons about anything.
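To make the budget concrete, here is the turn-level arithmetic using the exact table averages (the prose rounds 7.4k down to 7k; the six-calls-per-turn figure is an illustrative assumption):

```python
# Context-budget arithmetic for one agent turn.
# Per-call token counts are the benchmark averages from the table above.
SCRAPE_TOKENS_IN = 7_400    # avg scraped-markdown input per tool call
GROUNDED_TOKENS_IN = 280    # avg typed-JSON input per tool call
CALLS_PER_TURN = 6          # assumed tool calls in a single agent turn

scrape_context = SCRAPE_TOKENS_IN * CALLS_PER_TURN      # tokens burnt per turn
grounded_context = GROUNDED_TOKENS_IN * CALLS_PER_TURN

print(f"scraping: {scrape_context:,} tokens of context per turn")
print(f"grounded: {grounded_context:,} tokens of context per turn")
print(f"ratio: {scrape_context / grounded_context:.0f}x")
```

At the table's averages that is 44,400 tokens per turn for scraping versus 1,680 for typed grounding, before a single token of reasoning.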
The 16% of “scrape + extraction” calls that got the wrong answer didn’t fail loudly. They returned a confident number from a comparison-site promotional banner instead of the actual tariff. That’s the real scraping tax: you cannot tell from the output whether the agent read the right paragraph.
## Why “parse these fields from this HTML” doesn’t work
The standard scraping-based agent pattern looks like this:
```python
# Don't do this in production
html = firecrawl.scrape(url)
prompt = f'''Extract the following fields from this page as JSON:
- unit_price (pence per kWh)
- standing_charge (pence per day)
- tariff_name
Page:
{html}'''
result = llm.complete(prompt)
```
Three things go wrong, reliably:
1. **The page isn’t the data.** A supplier’s tariff page contains 40 tariffs, promotional banners, and a cookie modal rendered as text. The model has to decide which number is the number. It guesses.
2. **Token budgets go to formatting.** Scraped markdown for a typical UK energy supplier page is 6-9k tokens. Maybe 200 of those are the answer. You’re paying a 30× markup to let the model do what a CSS selector plus a typed schema could do deterministically upstream.
3. **There’s no cache.** Two agents asking the same question ten seconds apart both re-scrape, re-tokenise, re-extract. The final answer is at the mercy of whatever the page looked like at the moment of fetch, including any A/B tests the site is running on you.
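What “a CSS selector plus a typed schema” looks like upstream, as a minimal sketch. The HTML snippet, field names, and sanity bounds here are illustrative, and a real pipeline would use a proper HTML parser with selectors rather than a regex; the shape is what matters: extract once, deterministically, validate into a type, and fail loudly instead of guessing.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Tariff:
    tariff_name: str
    unit_price: float       # pence per kWh
    standing_charge: float  # pence per day

def extract_tariff(html: str) -> Tariff:
    """Deterministic extraction: same page in, same Tariff out, every time."""
    def field(name: str) -> str:
        # Illustrative attribute-based lookup; a real pipeline uses CSS selectors.
        m = re.search(rf'data-field="{name}">([^<]+)<', html)
        if m is None:
            raise ValueError(f"field {name!r} not found")  # fail loudly, not confidently
        return m.group(1)

    t = Tariff(
        tariff_name=field("tariff_name"),
        unit_price=float(field("unit_price")),
        standing_charge=float(field("standing_charge")),
    )
    # Sanity bounds: a number lifted from a promo banner fails here
    # instead of reaching the model as a confident wrong answer.
    if not (1.0 < t.unit_price < 50.0):
        raise ValueError(f"implausible unit price: {t.unit_price}")
    return t

page = '''<div class="tariff">
  <span data-field="tariff_name">Standard Variable</span>
  <span data-field="unit_price">6.04</span>
  <span data-field="standing_charge">31.20</span>
</div>'''

print(extract_tariff(page))
```

The agent then receives the few hundred tokens of the `Tariff`, not the 7k-token page it was extracted from.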
## The maths on cached typed JSON
Here’s the same task with a grounding API:
```ts
const result = await freshgeo.pricing.get({
  product: 'gas',
  region: 'GB-LND',
});
// result.values[0] => {
//   tariff: 'Standard Variable',
//   unit_price: 6.04,
//   standing_charge: 31.20,
//   source: { publisher: 'Ofgem', url: '...', fetched_at: '2026-04-24T08:00:00Z' },
//   confidence: 0.99
// }
// result.cache_id => 'prc_gb_lnd_gas_2026w17_a81f'
```
Input to the model: roughly 280 tokens of structured JSON, including provenance. The model doesn’t have to decide which number is the number — the API did that, once, deterministically, against a source we trust (Ofgem’s published cap, not a supplier’s marketing page).
The second agent to ask the same question in the same cache window gets the same cache_id, and we serve the cached result server-side at zero marginal token cost to you. Our customers running high-volume lead enrichment see 60-80% cache hit rates in steady state.
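The mechanics of window-based de-duplication are simple enough to sketch client-side. The key format below mirrors the example cache_id above, but the weekly window, the naming, and the helper functions are illustrative assumptions, not the real server-side scheme:

```python
from datetime import date

def cache_key(product: str, region: str, on: date) -> str:
    """Illustrative weekly cache key in the style of 'prc_gb_lnd_gas_2026w17'."""
    year, week, _ = on.isocalendar()
    region_slug = region.lower().replace("-", "_")
    return f"prc_{region_slug}_{product}_{year}w{week}"

cache: dict[str, dict] = {}

def get_pricing(product: str, region: str, on: date, fetch) -> dict:
    """Serve from the cache when the window key matches; fetch otherwise."""
    key = cache_key(product, region, on)
    if key not in cache:
        cache[key] = fetch(product, region)  # only the first caller pays
    return cache[key]
```

Every caller inside the same window hits the same key, so the extraction cost is paid once per window rather than once per call.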
## When scraping is still right
We are not zealots about this. Scraping is the correct choice in three situations:
- Long-tail, low-volume sources. If you need to read one specific PDF from one council website once a quarter, building a typed tool is absurd. Scrape it.
- Genuinely unstructured content. Press releases, blog posts, customer reviews. If the thing you want is the prose, not a field from the prose, a scraper piping into a summarisation prompt is fine.
- Research agents with human review. If a human is reading every output before it hits a customer, the accuracy floor is higher. Noise is tolerable.
The failure mode is using scraping for high-volume, structured, production-critical lookups where the data has an obvious canonical form. That’s the 80% of agent tool calls we see, and it’s where grounding wins outright.
## How FreshGeo approaches it
Under the bonnet, yes, we scrape. We also call official APIs, ingest regulator publications, buy licensed feeds, and run a non-trivial amount of entity resolution. The point is that we do it once, pay for the compute and the extraction once, and return the answer as typed JSON with a cache_id and an evidence URL.
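Agent-side, that provenance is only useful if you check it before the value enters the context window. A minimal gate, as a sketch: the field names follow the JSON example earlier in the post, but the 0.95 threshold and the function itself are our illustration, not part of any FreshGeo SDK.

```python
def is_usable(result: dict, min_confidence: float = 0.95) -> bool:
    """Reject a grounded value unless it carries provenance
    (an evidence URL and a fetch timestamp) and clears a confidence floor."""
    values = result.get("values") or [{}]
    v = values[0]
    src = v.get("source") or {}
    return (
        v.get("confidence", 0.0) >= min_confidence
        and bool(src.get("url"))
        and bool(src.get("fetched_at"))
    )
```

A value that fails the gate can be retried or escalated, instead of silently becoming a citation in the agent's answer.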
Our seven tools cover pricing, intent, jobs, social, real estate, competitor monitoring and news/risk. Each one replaces a category of scraping your agent would otherwise be doing in-prompt.
If your agent is parsing HTML in its context window, you are paying for that parse every time, in tokens and in accuracy. Move it upstream. Your evals will thank you — and so will your agent build budget.