How to extract text from a PDF
Drag a PDF onto the drop zone or click to choose a file. The text is extracted page by page on your device and appears in the box, with a live count of pages, words and characters. Enter a page range to limit which pages are read, toggle whether to keep the document's line breaks, and add page separators if you want each page marked. When you are done, copy the text or download it as a .txt file. Because everything runs in your browser, even a large or confidential PDF is processed without an upload.
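The live count described above can be sketched in a few lines. This is only an illustration: `countStats` and `TextStats` are hypothetical names, and the counting rules (words as runs of non-whitespace, characters counted on the newline-joined text) are assumptions rather than the tool's documented behaviour.

```typescript
// Hypothetical helper: the tool's real counting rules are not documented.
interface TextStats {
  pages: number;
  words: number;
  characters: number;
}

function countStats(pageTexts: string[]): TextStats {
  // Assumption: pages are joined with a single newline before counting.
  const joined = pageTexts.join("\n");
  const trimmed = joined.trim();
  // Words are runs of non-whitespace; an empty document has zero words.
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return { pages: pageTexts.length, words, characters: joined.length };
}
```

Calling `countStats` again each time the extracted text changes would keep the three numbers in step with what is shown in the box.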
What the text layer is
A PDF stores text as a text layer: the actual characters, positioned on the page. This tool reads that layer directly using Mozilla's pdf.js, the same open-source engine that powers PDF viewing in Firefox, so the extracted text comes from the characters stored in the document rather than from image recognition. Most PDFs created from a word processor, browser or design tool carry a full text layer, which is why their text is selectable in a normal PDF viewer. If you can select and copy text inside the original PDF, this tool can extract it.
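As a rough sketch of what reading the text layer looks like in code, the example below walks a document page by page and concatenates the text items that pdf.js returns. The interfaces `TextItemLike`, `PageLike` and `DocLike` are hypothetical stand-ins that mirror only the fields used here (`str`, `hasEOL`, `numPages`, `getPage`, `getTextContent`), and the function names are illustrative; this is not necessarily how the tool itself is written.

```typescript
// Minimal structural types mirroring the parts of pdf.js used here.
// They are assumptions for this sketch, not the library's own type names.
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}
interface PageLike {
  getTextContent(): Promise<{ items: Array<TextItemLike | object> }>;
}
interface DocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Text content can also contain marked-content markers that carry no string.
function isTextItem(item: object): item is TextItemLike {
  return typeof (item as { str?: unknown }).str === "string";
}

// Concatenate one page's items, adding a newline where an item ends a line.
function textFromItems(items: Array<TextItemLike | object>): string {
  let out = "";
  for (const item of items) {
    if (!isTextItem(item)) continue; // skip non-text entries
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out;
}

// Read every page in order; pdf.js page numbers start at 1.
async function extractPageTexts(doc: DocLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(textFromItems(content.items));
  }
  return pages;
}
```

In a browser, the `doc` argument would typically be the object resolved by `pdfjsLib.getDocument({ data }).promise`, where `data` holds the bytes of the chosen file.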
Scanned PDFs and OCR
A PDF made by scanning paper, or by exporting an image, typically has no text layer: each page is a picture of text. There are no characters to read, so extraction returns nothing and the tool tells you the document looks scanned. Turning a picture of text back into characters requires OCR (optical character recognition), which is a different process and is not performed here. If you need OCR, use a dedicated tool; this extractor is for PDFs that already contain real text, and it stays fast and fully local precisely because it does no image recognition.
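One plausible way to decide that a document "looks scanned" is to check whether every page that was read came back empty. The sketch below assumes exactly that rule; `looksScanned` and `blankPages` are hypothetical names, and the tool's actual check may differ.

```typescript
// Hypothetical heuristic: the tool's real detection rule is not documented.

// True when at least one page was read and none of them contained visible text.
function looksScanned(pageTexts: string[]): boolean {
  if (pageTexts.length === 0) return false; // nothing was read, so no verdict
  return pageTexts.every((text) => text.trim().length === 0);
}

// 1-based numbers of the pages that produced no text, useful for mixed documents.
function blankPages(pageTexts: string[]): number[] {
  const blank: number[] = [];
  pageTexts.forEach((text, index) => {
    if (text.trim().length === 0) blank.push(index + 1);
  });
  return blank;
}
```

A document that mixes typed pages with scanned inserts would not be flagged by `looksScanned`, but `blankPages` would still point at the pages that need OCR elsewhere.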
Page ranges and layout
The Pages field accepts single pages, ranges and open-ended ranges, separated by commas. For example, 1-3,5,8- reads pages 1 through 3, page 5, and page 8 to the end. Leave it blank for the whole document. The "Keep layout breaks" option uses each text item's position to reconstruct line and paragraph breaks, which suits prose and reports; turn it off to collapse everything into space-separated words, which is often better when you will re-chunk the text for embeddings. The "Page separators" option inserts a labelled marker between pages so you can tell where each one starts.
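The two ideas in this section can be sketched as small functions: one expands a page spec into page numbers, the other rebuilds line breaks from item positions. Both are illustrations under stated assumptions. The names `parsePageRange`, `layoutText` and `PositionedItem` are hypothetical, invalid specs are assumed to throw, pages past the end are assumed to be ignored, and the line grouping relies on the vertical offset stored at index 5 of a pdf.js text item's `transform` array. The sketch rebuilds line breaks only, not paragraph breaks.

```typescript
// Hypothetical helpers; names and error handling are assumptions for this sketch.

// Expand a spec such as "1-3,5,8-" into sorted, unique 1-based page numbers.
function parsePageRange(spec: string, numPages: number): number[] {
  const selected = new Set<number>();
  const trimmed = spec.trim();
  if (trimmed === "") {
    // Blank means the whole document.
    for (let p = 1; p <= numPages; p++) selected.add(p);
    return Array.from(selected);
  }
  for (const rawPart of trimmed.split(",")) {
    const part = rawPart.trim();
    const match = /^(\d+)(?:\s*(-)\s*(\d*))?$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const isRange = match[2] !== undefined;
    const isOpenEnded = isRange && match[3] === "";
    // "5" ends at 5, "8-" ends at the last page, "1-3" ends at 3.
    const end = !isRange ? start : isOpenEnded ? numPages : Number(match[3]);
    if (start < 1 || (!isOpenEnded && end < start)) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    // Pages past the end of the document are silently ignored.
    for (let p = start; p <= Math.min(end, numPages); p++) selected.add(p);
  }
  return Array.from(selected).sort((a, b) => a - b);
}

// Mirrors two fields of a pdf.js text item; transform[5] is the vertical offset.
interface PositionedItem {
  str: string;
  transform: number[];
}

// Group items (assumed to be in reading order) into lines by vertical position,
// then join the lines with newlines (keepBreaks) or with spaces (flattened).
function layoutText(items: PositionedItem[], keepBreaks: boolean, tolerance = 2): string {
  const lines: string[] = [];
  let current: string[] = [];
  let lastY: number | null = null;
  for (const item of items) {
    const word = item.str.trim();
    if (word === "") continue; // ignore whitespace-only items
    const y = item.transform[5] ?? 0;
    if (lastY !== null && Math.abs(y - lastY) > tolerance) {
      lines.push(current.join(" ")); // vertical jump: start a new line
      current = [];
    }
    current.push(word);
    lastY = y;
  }
  if (current.length > 0) lines.push(current.join(" "));
  return lines.join(keepBreaks ? "\n" : " ");
}
```

With these definitions, `parsePageRange("1-3,5,8-", 10)` yields pages 1, 2, 3, 5, 8, 9 and 10, and passing `false` for `keepBreaks` gives the single space-separated stream that suits re-chunking.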
Why extract PDF text locally
PDFs are full of sensitive material — signed contracts, invoices, medical and legal documents, internal reports. Uploading one to a conversion website hands a third party the whole file. Running pdf.js in your browser keeps the document on your device: it is read with the local FileReader, parsed in memory, and nothing is logged or transmitted, matching the gitime.dev default that your data stays local.
- Local — the PDF is never uploaded.
- Accurate — reads the real text layer via pdf.js.
- Selective — extract any page range.
- Layout-aware — keep or flatten line breaks.
- Exportable — copy or download as .txt.
Frequently asked questions
- Is my PDF uploaded to a server?
- No. It is read locally with FileReader and parsed by pdf.js in your browser, so nothing is uploaded.
- Why does my scanned PDF produce no text?
- Scanned PDFs are images with no text layer; extracting text from them needs OCR, which this tool does not do.
- Can I extract only certain pages?
- Yes — enter a range like 1-3,5,8- in the Pages field. Blank means the whole document.
- Does it work on password-protected PDFs?
- Not directly. Encrypted PDFs must have the password removed first; the tool reports when a file is protected.
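For the curious, here is one way a protected file might be recognised. When a password is required and none is supplied, pdf.js rejects its loading promise with an error named `PasswordException`, and a file that is not a PDF produces `InvalidPDFException`. The helper name `describeLoadError` and the message wording below are hypothetical, not the tool's actual text.

```typescript
// Hypothetical helper: maps a pdf.js loading error to a user-facing message.
// Assumption: the error's `name` identifies the failure, as in pdf.js exceptions.
function describeLoadError(err: unknown): string {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  if (name === "PasswordException") {
    return "This PDF is password-protected. Remove the password and try again.";
  }
  if (name === "InvalidPDFException") {
    return "This file does not look like a valid PDF.";
  }
  return "The PDF could not be read.";
}
```

A caller would wrap the document load in `try`/`catch` and show the returned message instead of attempting extraction.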