AI PDF Data Extractor — extract any fields to JSON (API + CLI)
About this tool
Turn any PDF into clean, structured JSON: you define the fields, the AI extracts them. Point it at a PDF and pass a simple schema ("invoice_number", "buyer_name", "total", "due_date", ...). It reads fillable PDF form fields directly and falls back to AI vision for flat/scanned PDFs, then returns a predictable JSON object with every field you asked for. Missing values come back as null, so downstream code can rely on every key being present.

Use it two ways:
- REST API: POST a PDF + schema, get JSON back. Drop it into any app or automation.
- CLI: node extract.js invoice.pdf schema.json, for quick local runs and batch jobs.

WHAT YOU GET
- Full source code (Node.js + Express, self-contained)
- Both the API server and the CLI
- Documented .env, README, and an example schema to copy
- Robust JSON parsing with automatic retry built in

TECH
Node.js + Express, pdf-lib for form-field reading, Anthropic Claude for extraction. No database required.

Schema is flexible: pass a list of field names, objects with types/descriptions, or a name→description map, whichever is convenient.
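To make the flexible schema and the null-for-missing behaviour concrete, here is a minimal sketch of how the three schema shapes could be normalized into one form and how a result could be padded with null. The function names (normalizeSchema, fillMissing), the { name, type, description } object shape, and the default type of "string" are assumptions made for illustration; the shipped source may do this differently.

```javascript
// Illustrative sketch only: helper names and the default "string" type
// are assumptions, not the tool's actual internals.

// Normalize the three accepted schema shapes into [{ name, type, description }].
function normalizeSchema(schema) {
  if (Array.isArray(schema)) {
    return schema.map((entry) =>
      typeof entry === "string"
        ? { name: entry, type: "string", description: "" }
        : {
            name: entry.name,
            type: entry.type ?? "string",
            description: entry.description ?? "",
          }
    );
  }
  // Anything else is treated as a name -> description map.
  return Object.entries(schema).map(([name, description]) => ({
    name,
    type: "string",
    description: String(description),
  }));
}

// Build an object with exactly the requested keys; any field the
// extractor did not return becomes null.
function fillMissing(fields, extracted) {
  const result = {};
  for (const { name } of fields) {
    const value = extracted[name];
    result[name] = value === undefined ? null : value;
  }
  return result;
}

// Mixed list: a bare field name plus an object with a type.
const fields = normalizeSchema([
  "invoice_number",
  { name: "total", type: "number" },
]);

// Pretend the extractor only found the invoice number.
const output = fillMissing(fields, { invoice_number: "INV-001" });
console.log(JSON.stringify(output));
// prints {"invoice_number":"INV-001","total":null}
```

Because every requested key is always present, calling code can read output.total directly and only needs to handle null, not a missing property.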
Good to know
- It's a generic extractor: accuracy depends on the PDF. Clean, text-based or fillable PDFs work best; heavily scanned or handwritten docs are harder.
- Bring your own Anthropic API key (free to create; pay-per-use, cents per document).
- This is a developer tool (API + CLI), not a no-code dashboard: you call it from your own app or terminal.
Charged in EUR