Hardparse OCR · v4.2
SOTA extraction · v4.2 release

OCR that gets harder
the harder it gets.

Most OCR gives up on smudged handwriting, crumpled receipts, and columns that don't sit still. Hardparse retries the hard parts with more context until the output is clean — and shows you the reasoning it used to get there.

accuracy: 98.4% character accuracy on IAM (1.6% CER) · +4.1 pts vs prior SOTA
latency: 1.2s p50 · 3.8s with hard-mode retries
languages: 132 · scripts incl. Devanagari, Arabic, Hangul
formats: PNG · PDF · TIFF · HEIC · live camera
shipped: Apr 2026
hardparse.ai / playground / handwritten/consult-note-043.heic
hp-vision-4.2 · pass 2 / 3 · req_01HQ9F…
Northgate Family Clinic
412 Linden St · Portland, OR 97204 · (503) 555-0188
Pt: MORENO, E. · DOB: 04/11/1982 · 04/17/26
Pt presents with persistent cough x 3 wks,
worsening at night. No fever. Denies
hemoptysis. Chest clear bilaterally.

Plan: start Azithromycin 250mg PO
qd x 5d. F/U in 10 days if
sx persist. RTC sooner PRN.
E. Asare, MD
DEA · AS8817234
scale 1:1 · DPI 300 · HEIC · 2.1 MB · deskew +1.4°
7 regions detected · 3 low-confidence
01 raw · 02 contextual · 03 verified
+1.8s retry
confidence: >95 · 80–95 · <80
— letterhead
Northgate Family Clinic
412 Linden St · Portland, OR 97204 · (503) 555-0188
Pt: MORENO, E.
DOB 1982-04-11
2026-04-17
— narrative · conf 92.1%
01 Pt presents with persistent cough x 3 wks,
02 worsening at night. No fever. Denies
03 hemoptysis. Chest clear bilaterally.
— plan · structured
drug: Azithromycin
dose: 250 mg
route: PO
sig: once daily × 5 days
followup: 10 days
directive: RTC → "return to clinic"
— signed
Dr. E. Asare, MD · DEA AS8817234 ✓
17 tokens · 15 high · 2 resolved via retry

Most OCR guesses once.
Hardparse argues with itself.

Every token gets a confidence score. Anything under 90% triggers a second pass with a widened context window — the surrounding lines, document type, domain lexicon, and visual similarity to training glyphs. If it still won't resolve, we flag it for human-in-the-loop with the top three candidates and the reasoning behind them. You see every step of the argument, not just the verdict.
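For readers who want the argument in code, here is a minimal sketch of the loop described above. It is an illustration, not Hardparse's implementation: the `Token` class, the `resolve` function, the `demo_reread` stub and its candidate scores are all hypothetical names and values. Only the 90% threshold, the widened context, and the top-three hand-off come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

RETRY_THRESHOLD = 0.90  # from the text: anything under 90% gets a second pass

Candidate = Tuple[str, float]  # (text, confidence)


@dataclass
class Token:
    text: str
    conf: float
    passes: int = 1
    needs_review: bool = False
    candidates: List[Candidate] = field(default_factory=list)


# Hypothetical re-reader: given a token and its neighbours, return candidates.
Reader = Callable[[Token, List[str]], List[Candidate]]


def resolve(tokens: List[Token], reread: Reader, window: int = 3) -> List[Token]:
    """Re-read low-confidence tokens with context; flag what stays unresolved."""
    for i, tok in enumerate(tokens):
        if tok.conf >= RETRY_THRESHOLD:
            continue  # first pass was good enough
        # Widened context: up to `window` neighbouring tokens on each side.
        before = [t.text for t in tokens[max(0, i - window):i]]
        after = [t.text for t in tokens[i + 1:i + 1 + window]]
        ranked = sorted(reread(tok, before + after), key=lambda c: c[1], reverse=True)
        tok.passes += 1
        if ranked and ranked[0][1] >= RETRY_THRESHOLD:
            tok.text, tok.conf = ranked[0]  # resolved on retry
        else:
            tok.needs_review = True         # human-in-the-loop
            tok.candidates = ranked[:3]     # top three candidates
    return tokens


def demo_reread(tok: Token, context: List[str]) -> List[Candidate]:
    """Stand-in for a real second pass; the scores here are made up."""
    table = {"wkening": [("weakening", 0.41), ("worsening", 0.96)]}
    return table.get(tok.text, [(tok.text, tok.conf)])
```

Calling `resolve([Token("cough", 0.98), Token("wkening", 0.70)], demo_reread)` leaves the first token alone and rewrites the second to "worsening" on its second pass. A token the stub cannot improve comes back with `needs_review` set and its candidates attached.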

01 Read
First pass.
Dense layout + token read in one shot. Every token scored.
02 Retry
Widen the window.
Low-confidence regions re-read with ±3 words of context and a domain lexicon.
03 Verify
Prove it.
Structural checks (sums, DEA format, ISO dates) catch the last 0.3%.
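To make the verify step concrete, here is a rough sketch of the three kinds of structural checks named above. Everything in it is an assumption for illustration: the function names are hypothetical, the DEA check only tests the shape of the identifier (two leading characters plus seven digits), and the date parser assumes US-style month/day/year input like the sample note.

```python
import re
from datetime import datetime
from typing import Optional, Sequence

# Assumed shape only: a letter, then a letter or 9, then seven digits.
DEA_SHAPE = re.compile(r"[A-Z][A-Z9]\d{7}")


def dea_shape_ok(value: str) -> bool:
    """True if the string has the expected DEA-number shape."""
    return DEA_SHAPE.fullmatch(value.strip().upper()) is not None


def to_iso_date(raw: str) -> Optional[str]:
    """Normalise MM/DD/YYYY or MM/DD/YY to ISO 8601; None if it will not parse."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


def sums_match(line_items: Sequence[float], total: float, tolerance: float = 0.005) -> bool:
    """True if the line items add up to the stated total, within half a cent."""
    return abs(sum(line_items) - total) <= tolerance
```

With these, `to_iso_date("04/11/1982")` gives `"1982-04-11"` and `to_iso_date("04/17/26")` gives `"2026-04-17"`, matching the dates in the sample output. A misread such as `"AS88I7234"` (letter I in place of a digit) fails `dea_shape_ok`.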
character error rate (lower = better) · 2026 Q1
hardparse v4.2: 1.6%
competitor A: 5.7%
competitor B: 7.1%
open-source X: 9.2%
open-source Y: 12.0%
n=1,539 lines · cross-writer split
structured extraction · receipts · F1
hardparse: 0.967
competitor A: 0.823
competitor B: 0.780
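If you want to sanity-check a character error rate on your own documents, the usual definition is edit distance divided by reference length. The sketch below follows that textbook definition. The exact normalisation behind the figures above (whitespace, casing, punctuation) is not stated on this page, so treat this as an approximation rather than the benchmark script.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, reading "worsening" as "wkening" costs three edits (two missing characters and one substitution), so `cer("worsening", "wkening")` is 3/9, or about 33% for that single word.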

One call.
Everything you wanted.

Stream tokens as they resolve, or get the final structured JSON. Every response includes the agent trace and per-token confidences. Retries are transparent — pay only for tokens emitted, not for retries.

STREAMING
Tokens arrive in ~80ms each.
STRUCTURED
Schemas for 40+ doc types.
TRACE
Every retry is auditable.
HIL
Unresolved → your queue.
POST /v1/parse
# Parse a tough doc, stream tokens as they resolve
curl https://api.hardparse.ai/v1/parse \
  -H "Authorization: Bearer $HARDPARSE_KEY" \
  -F file=@consult-note.heic \
  -F schema="medical.note" \
  -F retry="aggressive" \
  -F stream=true

> event: token
> data: {"t":"worsening","conf":0.96,"pass":2}
> event: resolved
> data: {"id":"t2","from":"wkening","to":"worsening","delta":26}
> event: complete
> data: {"conf":0.978,"retries":3,"ms":1842}
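On the client side, a stream like the one above can be consumed line by line. The following is a small sketch, assuming each event is an `event:` line followed by a single `data:` line whose payload is valid JSON. The `iter_events` helper and the `SAMPLE` text are illustrative and not part of any official SDK.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def iter_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
    """Yield (event_name, payload) pairs from event:/data: line pairs."""
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("> "):  # tolerate the "> " prefix used in the transcript
            line = line[2:]
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            yield event, json.loads(line[len("data:"):].strip())
            event = None  # wait for the next event line


# Sample text modelled on the transcript above.
SAMPLE = """\
event: token
data: {"t":"worsening","conf":0.96,"pass":2}
event: resolved
data: {"id":"t2","from":"wkening","to":"worsening","delta":26}
event: complete
data: {"conf":0.978,"retries":3,"ms":1842}
"""

for name, payload in iter_events(SAMPLE.splitlines()):
    if name == "resolved":
        print(f'{payload["from"]} -> {payload["to"]}')
    elif name == "complete":
        print(f'done in {payload["ms"]} ms after {payload["retries"]} retries')
```

In a real integration the lines would come from a streaming HTTP response rather than a string; the parsing logic stays the same.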
hard cases we specialize in
01 hw
Clinical notes
avg 97.2%
02 thermal
Crumpled receipts
avg 96.1%
03 wb
Whiteboards / photos
avg 94.8%
04 form
Forms w/ checkboxes
avg 99.1%
05 arch
Historical scans
avg 93.4%

Ship it on the hard 3%.

Most docs parse cleanly on the first try. Hardparse exists for the rest — the ones that make your confidence interval wince. 5,000 free pages, no credit card.

SOC 2 · HIPAA · EU-resident