Most OCR tools run one model over the entire page.
We run two. First, a layout model detects what's where.
Then a vision-language model reads each region separately.
Tables stay tables. Handwriting stays handwriting.
01
Upload
POST a PDF, PNG, JPG, TIFF, or HEIC. Up to 20MB per file.
02
Detect regions
Layout model finds text blocks, tables, formulas, figures, handwriting. Each gets a bounding box.
03
Read and return
VLM reads each region with the right context. You get Markdown + JSON with confidence scores.
/ integration
One endpoint.
curl -X POST https://api.hardparse.com/v1/parse \
  -H "Authorization: Bearer hp_your_key" \
  -F "file=@invoice.pdf"
const form = new FormData();
form.append('file', file);
const res = await fetch('https://api.hardparse.com/v1/parse', {
method: 'POST',
headers: { 'Authorization': `Bearer ${apiKey}` },
body: form,
});
const { markdown, regions } = await res.json();
/ capabilities
What it handles.
Scanned documents
Phone photos, faxes, 300dpi scans. The kind of files your users actually upload.
Tables
Rows, columns, cells. Output is a Markdown table you can parse or render.
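For instance, a two-column line-item table might come back as plain Markdown like this (the rows here are illustrative, not real output):

```
| Item     | Amount |
| -------- | ------ |
| Widget   | $40.00 |
| Shipping | $5.00  |
```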
Handwriting
Notes, filled-in forms, margin annotations. No templates.
Math
LaTeX output. Fractions, integrals, matrices.
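A scanned fraction, integral, or matrix, for example, might be returned as LaTeX along these lines (illustrative, not captured output):

```latex
\frac{p}{q} \qquad \int_0^1 x^2 \, dx \qquad
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
```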
Bounding boxes
Every region: type, coordinates, confidence score. Reading order preserved.