One HTTP endpoint turns a PDF, scan, screenshot, or video frame into Markdown plus layout regions, reading order, per-region confidence, and source bounding boxes. Free while in public beta — no API key required.
Install the client and parse a document in three lines. The Python and Node packages are both published as hardparse.
$ pip install hardparse # or: npm install hardparse
from hardparse import Hardparse
doc = Hardparse().parse("q3-report.pdf")
print(doc.markdown) # clean Markdown
print(doc.regions[0].bbox) # source bounding box of region 1
The public beta is open — anonymous requests work with no key, rate-limited per IP (see rate limits). If you have an API key, send it as a bearer token to lift the anonymous limit:
Authorization: Bearer YOUR_API_KEY
Send the file as multipart/form-data in a field named file. Everything else is an optional query parameter.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | required | The document to parse — multipart form field. PDF, image, or video. |
| format | string | optional | Output format: markdown (default), latex, or both. |
| verbose | boolean | optional | When true, each region gains a review field with per-word confidence and alternatives. Adds a few seconds per region. Default false. |
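To make the request shape concrete, here is a minimal sketch that builds (but does not send) the upload using only the Python standard library. The helper name `build_parse_request` is hypothetical and not part of the hardparse package. The endpoint URL is the one used in the curl example in this document, and placing `format` and `verbose` in the query string follows the description above; treat both as assumptions to check against the live API.

```python
import urllib.parse
import urllib.request
import uuid

PARSE_URL = "https://hardparse.com/v1/parse"  # taken from the curl example


def build_parse_request(filename, data, fmt="markdown", verbose=False, api_key=None):
    """Build (not send) a multipart POST for the parse endpoint (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    # A single multipart part; the form field must be named "file".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # Optional parameters in the query string (assumption based on the text above).
    query = urllib.parse.urlencode({"format": fmt, "verbose": str(verbose).lower()})
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        # Only needed to lift the anonymous limit; the beta works without a key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{PARSE_URL}?{query}", data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; the sketch stops short of that so it can be inspected offline.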
200 OK returns a JSON document. markdown is the full-document text; pages[] carries the structured layout, one entry per page.
{
  "markdown": "# ACME Analytics\n## Quarterly Report …",
  "page_count": 1,
  "processing_time_ms": 412,
  "ms_per_page": 412,
  "pages": [
    {
      "page": 1,
      "markdown": "# ACME Analytics …",
      "image_width": 1654,
      "image_height": 2339,
      "processing_ms": 412,
      "regions": [
        {
          "category": "title",
          "score": 0.998,
          "box": [32, 36, 488, 66],
          "reading_order": 1,
          "task": "text",
          "result": { "markdown": "# ACME Analytics" }
        }
      ]
    }
  ]
}
Each region's box is [x1, y1, x2, y2] in pixels of the page image (image_width × image_height). category is the layout class (title, heading, text, table, figure, footer …), score is confidence in 0–1, and reading_order is the inferred sequence.
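As an illustration of working with these fields, the sketch below takes one entry of `pages[]` (already decoded from JSON into a dict), orders its regions by `reading_order`, and rescales each pixel `box` to fractions of the page so it can be drawn on a rendering of any size. The function name `normalized_regions` and the `min_score` filter are hypothetical additions; the field names are the ones shown in the sample response.

```python
def normalized_regions(page, min_score=0.0):
    """Return regions in reading order with boxes scaled to the 0-1 range."""
    width, height = page["image_width"], page["image_height"]
    result = []
    for region in sorted(page["regions"], key=lambda r: r["reading_order"]):
        if region["score"] < min_score:
            continue  # drop low-confidence regions
        x1, y1, x2, y2 = region["box"]
        result.append({
            "order": region["reading_order"],
            "category": region["category"],
            "score": region["score"],
            # Divide x by image_width and y by image_height.
            "box": (x1 / width, y1 / height, x2 / width, y2 / height),
        })
    return result


# The page entry from the sample response above, trimmed to the fields used here.
sample_page = {
    "image_width": 1654,
    "image_height": 2339,
    "regions": [
        {"category": "title", "score": 0.998,
         "box": [32, 36, 488, 66], "reading_order": 1},
    ],
}
regions = normalized_regions(sample_page, min_score=0.5)
```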
curl https://hardparse.com/v1/parse \
  -F file=@q3-report.pdf \
  -F format=markdown \
  -o result.json
from hardparse import Hardparse
hp = Hardparse() # public beta, no key
doc = hp.parse("q3-report.pdf")
for r in doc.regions:
    print(f"{r.order:02d} {r.category:<8} "
          f"conf={r.confidence:.3f} bbox={r.bbox}")
import { Hardparse } from "hardparse";
const hp = new Hardparse(); // beta
const doc = await hp.parse({ file: "./q3-report.pdf" });
console.log(doc.markdown);
console.log(doc.regions.length, "regions");
Anonymous requests are limited to 100 documents per 24 hours per IP address. Every response carries the current budget:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
When the limit is reached the API responds 202 Accepted — your file is saved and queued rather than rejected, and the body includes a job token you can poll. Need a higher limit? Email .
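A client may want to read the budget headers before it sends a batch. The following is a small sketch, assuming the response headers are available as a plain mapping of strings; `read_budget` is a hypothetical helper, not part of the hardparse package. Header names are matched case-insensitively because HTTP header names are not case-sensitive.

```python
def read_budget(headers):
    """Return (limit, remaining) from the rate-limit headers, or None if absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit"])
        remaining = int(lowered["x-ratelimit-remaining"])
    except (KeyError, ValueError):
        return None  # headers missing or not numeric
    return limit, remaining


budget = read_budget({"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "87"})
```

With the header values shown above, `budget` is `(100, 87)`.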
| Status | Meaning |
|---|---|
| 200 | Success — parsed document returned. |
| 202 | Rate limit reached — file accepted and queued; poll the job token. |
| 400 | Unsupported file type, or the upload was malformed. |
| 413 | File too large — the maximum is 200 MB. |
| 503 | The OCR pipeline is still warming up — retry in a few seconds. |
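One way to act on these codes is a small dispatch table with a retry loop for the warm-up case. This is a sketch under stated assumptions: `next_action` and `send_with_retry` are hypothetical names, `send` stands for any callable that performs the upload and returns the HTTP status code, and the two-second delay is an arbitrary reading of "a few seconds". Polling the job token after a 202 is left out because the shape of that response body is not described here.

```python
import time

# Status codes from the table above mapped to a client-side action.
ACTIONS = {200: "done", 202: "poll", 400: "fail", 413: "fail", 503: "retry"}


def next_action(status):
    """Map a status code to 'done', 'poll', 'fail', 'retry', or 'unknown'."""
    return ACTIONS.get(status, "unknown")


def send_with_retry(send, attempts=3, delay_s=2.0, sleep=time.sleep):
    """Call send() until it stops returning 503 or attempts run out.

    Returns the last status code seen.
    """
    status = None
    for attempt in range(attempts):
        status = send()
        if next_action(status) != "retry":
            break
        if attempt < attempts - 1:
            sleep(delay_s)  # give the pipeline time to warm up
    return status
```

The `sleep` parameter exists so the delay can be replaced with a no-op in tests.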
PDF · PNG · JPG · WebP · GIF · BMP · TIFF · WebM / MP4 (the first frame is parsed). Maximum file size 200 MB. Multi-page PDFs return one entry per page in pages[].
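Checking a file locally before upload avoids a round trip that would end in 400 or 413. The sketch below is an assumption-laden illustration: `preflight` is a hypothetical helper, the extension set is derived from the format names listed above (the service may also accept variants such as `.jpeg` or `.tif`), and 200 MB is read as 200,000,000 bytes, which is the more conservative interpretation.

```python
import os

# Extensions inferred from the format list above (assumption).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".webp", ".gif", ".bmp", ".tiff", ".webm", ".mp4",
}
MAX_BYTES = 200 * 1000 * 1000  # "200 MB" read as decimal megabytes (assumption)


def preflight(filename, size_bytes):
    """Return a list of problems that would likely cause a 400 or 413."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

An empty list means the file passes both checks; the server remains the final authority on what it accepts.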