API reference

One HTTP endpoint turns a PDF, scan, screenshot, or video frame into Markdown plus layout regions, reading order, per-region confidence, and source bounding boxes. Free while in public beta — no API key required.

Quickstart

Install the client and parse a document in three lines. The Python and Node packages are both published as hardparse.

$ pip install hardparse        # or:  npm install hardparse

from hardparse import Hardparse

doc = Hardparse().parse("q3-report.pdf")
print(doc.markdown)            # clean Markdown
print(doc.regions[0].bbox)     # source bounding box of region 1

Authentication

The public beta is open — anonymous requests work with no key, rate-limited per IP (see rate limits). If you have an API key, send it as a bearer token to lift the anonymous limit:

Authorization: Bearer YOUR_API_KEY
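
For callers who build requests by hand rather than through the client, the optional bearer token can be handled as below. This is a minimal sketch: the helper name `build_headers` is hypothetical and not part of the hardparse package.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return request headers for an anonymous or a keyed call.

    Hypothetical helper: only the Authorization header format
    comes from the reference above.
    """
    headers = {}
    if api_key:
        # A key lifts the anonymous per-IP limit.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


anonymous = build_headers()            # public beta, no key
keyed = build_headers("YOUR_API_KEY")  # placeholder key
print(anonymous, keyed)
```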

Request

POST https://hardparse.com/v1/parse

Send the file as multipart/form-data in a field named file. Everything else is an optional query parameter.

| Parameter | Type | Description |
| --- | --- | --- |
| file | file, required | The document to parse, sent as a multipart form field. PDF, image, or video. |
| format | string, optional | Output format: markdown (default), latex, or both. |
| verbose | boolean, optional | When true, each region gains a review field with per-word confidence and alternatives. Adds a few seconds per region. Default false. |
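
Since everything except the file travels in the query string, the request URL can be assembled up front. The sketch below uses only the standard library. The function name `build_parse_url` is hypothetical, and serializing `verbose` as the literal `true` is an assumption, since the reference does not say how booleans are encoded.

```python
from urllib.parse import urlencode

PARSE_URL = "https://hardparse.com/v1/parse"
ALLOWED_FORMATS = {"markdown", "latex", "both"}


def build_parse_url(output_format: str = "markdown", verbose: bool = False) -> str:
    """Build the endpoint URL with optional query parameters."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    params = {}
    if output_format != "markdown":   # markdown is the documented default
        params["format"] = output_format
    if verbose:
        params["verbose"] = "true"    # assumption: lowercase boolean literal
    query = urlencode(params)
    return f"{PARSE_URL}?{query}" if query else PARSE_URL


print(build_parse_url())
print(build_parse_url("latex", verbose=True))
```

The file itself still goes in the multipart body under the field name file; only the options are encoded here.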

Response

200 OK returns a JSON document. markdown is the full-document text; pages[] carries the structured layout, one entry per page.

{
  "markdown":            "# ACME Analytics\n## Quarterly Report …",
  "page_count":          1,
  "processing_time_ms":  412,
  "ms_per_page":         412,
  "pages": [
    {
      "page":          1,
      "markdown":      "# ACME Analytics …",
      "image_width":   1654,
      "image_height":  2339,
      "processing_ms": 412,
      "regions": [
        {
          "category":      "title",
          "score":         0.998,
          "box":           [32, 36, 488, 66],
          "reading_order": 1,
          "task":          "text",
          "result":        { "markdown": "# ACME Analytics" }
        }
      ]
    }
  ]
}

Each region's box is [x1, y1, x2, y2] in pixels of the page image (image_width × image_height). category is the layout class (title, heading, text, table, figure, footer …), score is confidence in 0–1, and reading_order is the inferred sequence.
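
To make the coordinate convention concrete, here is a short sketch that walks a response, orders the regions by reading_order, and rescales each box to fractions of the page image. The field names come from the example response above; the function name and the `box_frac` key are illustrative.

```python
def regions_in_reading_order(response: dict) -> list:
    """Flatten pages[] into rows sorted by reading order within each page."""
    rows = []
    for page in response["pages"]:
        width, height = page["image_width"], page["image_height"]
        ordered = sorted(page["regions"], key=lambda r: r["reading_order"])
        for region in ordered:
            x1, y1, x2, y2 = region["box"]   # pixels of the page image
            rows.append({
                "page": page["page"],
                "category": region["category"],
                "score": region["score"],
                # Resolution-independent box: fractions of width and height.
                "box_frac": (x1 / width, y1 / height, x2 / width, y2 / height),
            })
    return rows


# Trimmed copy of the example response above.
sample = {
    "pages": [{
        "page": 1,
        "image_width": 1654,
        "image_height": 2339,
        "regions": [{
            "category": "title",
            "score": 0.998,
            "box": [32, 36, 488, 66],
            "reading_order": 1,
        }],
    }],
}

rows = regions_in_reading_order(sample)
print(rows[0]["category"], rows[0]["box_frac"])
```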

Code examples

Shell (curl)

curl "https://hardparse.com/v1/parse?format=markdown" \
  -F file=@q3-report.pdf \
  -o result.json

Python

from hardparse import Hardparse

hp  = Hardparse()                       # public beta, no key
doc = hp.parse("q3-report.pdf")

for r in doc.regions:
    print(f"{r.order:02d}  {r.category:<8}  "
          f"conf={r.confidence:.3f}  bbox={r.bbox}")

Node

import { Hardparse } from "hardparse";

const hp  = new Hardparse();            // beta
const doc = await hp.parse({ file: "./q3-report.pdf" });

console.log(doc.markdown);
console.log(doc.regions.length, "regions");

Rate limits

Anonymous requests are limited to 100 documents per 24 hours per IP address. Every response carries the current budget:

X-RateLimit-Limit:     100
X-RateLimit-Remaining: 87

When the limit is reached the API responds 202 Accepted — your file is saved and queued rather than rejected, and the body includes a job token you can poll. Need a higher limit? Email .
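
A client can track its budget from the two headers and treat 202 as "queued" rather than as a failure. The sketch below is illustrative only: `rate_limit_state` is a hypothetical helper, and it deliberately does not read the job token, because the reference does not name the field that carries it.

```python
from typing import Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, or None when the header is absent."""
    return int(value) if value is not None else None


def rate_limit_state(status: int, headers: dict) -> dict:
    """Summarize the rate-limit budget carried by a response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "limit": _to_int(lowered.get("x-ratelimit-limit")),
        "remaining": _to_int(lowered.get("x-ratelimit-remaining")),
        "queued": status == 202,   # file saved and queued, not rejected
    }


state = rate_limit_state(200, {"X-RateLimit-Limit": "100",
                               "X-RateLimit-Remaining": "87"})
print(state)
```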

Errors

| Status | Meaning |
| --- | --- |
| 200 | Success: parsed document returned. |
| 202 | Rate limit reached: file accepted and queued; poll the job token. |
| 400 | Unsupported file type, or the upload was malformed. |
| 413 | File too large: the maximum is 200 MB. |
| 503 | The OCR pipeline is still warming up: retry in a few seconds. |
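
One way to act on these statuses is a small dispatch function plus a capped backoff schedule for 503. This is a sketch, not a prescribed policy: the action names and the delay values are assumptions, chosen only to match "retry in a few seconds".

```python
def action_for_status(status: int) -> str:
    """Map a documented status code to a client-side action."""
    if status == 200:
        return "done"    # parsed document is in the body
    if status == 202:
        return "poll"    # queued; poll the job token
    if status == 503:
        return "retry"   # pipeline warming up
    # 400 and 413 will not succeed on retry; undocumented codes are
    # treated the same way here, conservatively.
    return "fail"


def backoff_delays(attempts: int = 4, base: float = 1.0, cap: float = 8.0) -> list:
    """Exponential delays in seconds, capped (illustrative values)."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


for code in (200, 202, 400, 413, 503):
    print(code, action_for_status(code))
print(backoff_delays())
```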

Supported inputs

PDF · PNG · JPG · WebP · GIF · BMP · TIFF · WebM / MP4 (the first frame is parsed). Maximum file size 200 MB. Multi-page PDFs return one entry per page in pages[].
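
A local pre-flight check can catch files that would otherwise come back as 400 or 413. In this sketch the extension list is derived from the formats above, with .jpeg and .tif added as assumed aliases, and "200 MB" is read as 200 × 10^6 bytes, which is also an assumption. The server may decide by content rather than by file name.

```python
from pathlib import PurePath
from typing import Optional

SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".webp",
    ".gif", ".bmp", ".tif", ".tiff", ".webm", ".mp4",
}
MAX_BYTES = 200 * 1000 * 1000   # assumption: decimal megabytes


def preflight(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the upload looks acceptable, else a reason string."""
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported"   # would likely be a 400
    if size_bytes > MAX_BYTES:
        return "too_large"     # would likely be a 413
    return None


print(preflight("q3-report.pdf", 1_500_000))
print(preflight("notes.docx", 10_000))
```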