How to Translate a Scanned Document (PDF or Image), Step by Step
The popular advice — extract the text, copy it into ChatGPT, paste it back — works but wrecks your layout. Here are all three methods, honestly compared.
A scanned document is just a photo of a page — a contract you scanned, a certificate someone emailed you, a phone snapshot of a form. You can read it, but you can't select the text, and that's exactly why translating it is more awkward than it should be. Search for how to do it and you'll mostly find two kinds of advice: "extract the text and paste it into ChatGPT," or a forum thread that says "open the PDF, open Word, and retype it." Both work in a narrow sense — and both quietly destroy your formatting.
This guide lays out the three real ways to translate a scanned document (whether it's a PDF or an image), what each one costs you in time and layout, and how to pick. "Document" and "PDF" mean the same thing here — most scanned documents are saved as PDFs, and the steps are identical either way.
First, why you can't just translate it directly
A normal document created on a computer has a text layer — real, selectable characters underneath what you see. A scanned document has none. It's an image, so to any translator every word is just pixels. Before anything can translate it, software has to look at the picture and work out what the characters are. That step is OCR (optical character recognition), and it's the part most quick methods either skip or do badly.
Open your document and try to drag-select a sentence. If you can highlight individual words, it already has a text layer and a normal translator will handle it. If your cursor grabs the whole page like a picture, it's a scan — and it needs OCR first.
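If you have a batch of PDFs to sort, the same drag-select test can be scripted. The sketch below is a rough heuristic, not a definitive detector. It assumes the third-party pypdf package for reading pages, and the 20-character threshold and the helper names are hypothetical choices. A page whose extracted text is empty or nearly empty is treated as a scan that needs OCR.

```python
# Rough text-layer check for PDFs.
# Assumes the third-party pypdf package (pip install pypdf) for reading files;
# looks_scanned() itself needs only the standard library.

MIN_CHARS = 20  # hypothetical threshold: fewer visible characters suggests a scan


def looks_scanned(page_text: str, min_chars: int = MIN_CHARS) -> bool:
    """True if a page's extracted text is too sparse to be a real text layer."""
    visible = "".join(page_text.split())  # drop spaces, tabs and newlines
    return len(visible) < min_chars


def scanned_pages(pdf_path: str, min_chars: int = MIN_CHARS) -> list[int]:
    """1-based numbers of pages that appear to have no usable text layer."""
    from pypdf import PdfReader  # lazy import: only needed when reading a file

    reader = PdfReader(pdf_path)
    return [
        number
        for number, page in enumerate(reader.pages, start=1)
        if looks_scanned(page.extract_text() or "", min_chars)
    ]
```

For example, scanned_pages("contract.pdf") would list the pages that need OCR (the filename is a placeholder). One caveat: some scanners already add their own OCR text layer, so a page that passes this check can still contain misread text.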
Method 1 — The ChatGPT / copy-paste route
This is the most-shared advice online: run the scan through an OCR tool to pull the text out, copy that text, paste it into ChatGPT (or DeepL, or Google Translate), and ask for a translation.
1. OCR the scan. Use Acrobat's "Recognize Text", a free OCR site, or ChatGPT's file upload to turn the image into text you can copy.
2. Copy the recognized text. Paste it into your translator of choice and ask for a translation into your target language.
3. Get plain translated text back. You'll have the meaning, but as a flat block of text with none of the original structure.
The catch is in step three. The moment the text leaves the page, the layout is gone: tables collapse into runs of words, columns merge, and stamps, signatures, and field labels lose their position. For a single paragraph where you only need the gist, that's fine. For a contract, certificate, or transcript — anything where which value sits in which box matters — you'll spend 30–60 minutes rebuilding the formatting by hand. (It's also why these how-to pages tend to have very short visit times: people try it, hit the reformatting wall, and leave.)
Method 2 — OCR into a Word file, then translate
A small step up: instead of copying raw text, export the OCR result to a Word document, then use Word's or Google Docs' built-in "Translate document" feature. This keeps simple paragraph formatting and is reasonable for a mostly-text page. But OCR-to-Word reconstruction is rough on anything structured: multi-column layouts, tables, and official forms still come out misaligned, and you're back to manual cleanup. It also depends entirely on the OCR being accurate. A low-resolution or skewed scan produces misread characters, and the translator turns them into fluent-looking nonsense that's easy to miss.
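If you run OCR yourself, you can catch a bad scan before it reaches the translator. This sketch assumes the open-source Tesseract engine, called through the pytesseract and Pillow packages, as a stand-in for whatever OCR tool you actually use. Tesseract reports a 0 to 100 confidence for each recognized word and -1 for non-word layout rows. The 80-point cut-off and the function names are hypothetical.

```python
# OCR confidence check, assuming Tesseract via pytesseract and Pillow
# (pip install pytesseract pillow, plus the Tesseract program itself).

LOW_CONFIDENCE = 80.0  # hypothetical cut-off on Tesseract's 0-100 word scale


def mean_word_confidence(confs: list, texts: list[str]) -> float:
    """Average confidence over recognized words, skipping layout rows and blanks."""
    scores = [
        float(conf)
        for conf, text in zip(confs, texts)
        if text.strip() and float(conf) >= 0  # -1 marks non-word layout rows
    ]
    return sum(scores) / len(scores) if scores else 0.0


def scan_looks_unreliable(image_path: str, lang: str = "eng") -> bool:
    """Run OCR on one page image and flag it if average word confidence is low."""
    import pytesseract  # lazy imports: only needed when actually running OCR
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), lang=lang, output_type=pytesseract.Output.DICT
    )
    return mean_word_confidence(data["conf"], data["text"]) < LOW_CONFIDENCE
```

A low average is a good hint to rescan at a higher resolution or straighten the page before translating. A high one doesn't guarantee that every name and number was read correctly, so official documents still deserve a read-through.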
Method 3 — A one-step scanned-document translator
The reason the first two methods are painful is that they treat OCR, translation, and re-formatting as three separate chores you stitch together. A purpose-built tool does all three in one pass. Reglyph reads the scanned document with OCR, erases the original text from the page, translates it, and re-typesets the translation back into the exact same positions — so tables, columns, figures, stamps, and numbers stay where they were. You upload the scan as-is and download a finished, layout-correct document. No copy-paste, no Word round-trip, no rebuilding the formatting.
1. Upload the scan as-is. A scanned PDF, or even a phone photo of the page. No text layer is needed and there's no separate OCR step: the recognition you'd otherwise do yourself before pasting into ChatGPT or Google Translate is handled for you.
2. It OCRs, translates, and re-typesets. The source text is recognized, translated into your target language, and laid back onto the page in place of the original.
3. Download with the layout intact. Real typeset text in the original positions, not a flat block and not an image with words pasted on top. The first 3 pages are free.
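To make the difference between a flat paste and re-typesetting concrete, here is a toy model. It is not Reglyph's implementation: the TextBox type, the coordinates, and the two-entry glossary standing in for a real translator are all hypothetical. flatten mimics what copy-paste leaves you with, while retypeset translates each OCR box but keeps its position, which is the idea behind laying the translation back onto the page.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class TextBox:
    """One piece of OCR'd text and where it sat on the page (hypothetical units)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    width: float
    height: float


def flatten(boxes: list[TextBox]) -> str:
    """Copy-paste result: words in top-to-bottom, left-to-right order, positions lost."""
    ordered = sorted(boxes, key=lambda box: (box.y, box.x))
    return " ".join(box.text for box in ordered)


def retypeset(boxes: list[TextBox], translate: Callable[[str], str]) -> list[TextBox]:
    """Translate each box's text but keep its position and size on the page."""
    return [replace(box, text=translate(box.text)) for box in boxes]


# A two-row French form: field labels on the left, values on the right.
form = [
    TextBox("Nom", 50, 100, 60, 12),
    TextBox("Dupont", 200, 100, 80, 12),
    TextBox("Né le", 50, 120, 60, 12),
    TextBox("12/03/1990", 200, 120, 80, 12),
]
glossary = {"Nom": "Surname", "Né le": "Born on"}  # toy stand-in for a translator

# Flat paste of the translation: "Surname Dupont Born on 12/03/1990"
print(flatten(retypeset(form, lambda text: glossary.get(text, text))))

# Re-typeset: every translated label and value keeps its original coordinates.
for box in retypeset(form, lambda text: glossary.get(text, text)):
    print(box.x, box.y, box.text)
```

In the flat version nothing marks where one field ends and the next begins; in the re-typeset version each value still sits in its own box. A real tool also has to erase the original text and fit longer translations into the same space, for example by shrinking the font, and this toy model skips both.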
Which method should you use?
- One short, plain-text page and you only need the meaning → the ChatGPT / copy-paste route is fine.
- A mostly-text page where rough formatting is acceptable → OCR into Word and translate there.
- A contract, certificate, transcript, or anything with tables, columns, or stamps → a one-step tool that preserves the layout will save you the most time and avoid manual rebuilding.
OCR quality decides everything downstream. Scan at around 300 DPI, keep the page flat and straight, and expect clear printed text to come through far more reliably than handwriting. Check that accents and non-Latin scripts are sharp, too: a dropped diacritic can change a name or a meaning on an official document.
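To see what "around 300 DPI" means in pixels, you can work it out from the page size. The sketch below uses the standard A4 (210 × 297 mm) and US Letter (8.5 × 11 in) dimensions; the function names are our own. It also estimates the effective DPI of a photo from how many pixels span the page's width, assuming the page fills the frame edge to edge. The 3024-pixel example is hypothetical (it is the short side of a typical 12-megapixel phone photo).

```python
MM_PER_INCH = 25.4

# Standard paper sizes as (width, height) in millimetres.
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "Letter": (215.9, 279.4),  # 8.5 x 11 inches
}


def pixels_needed(size_mm: tuple[float, float], dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions a scan of this page needs to reach the given DPI."""
    width_mm, height_mm = size_mm
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def effective_dpi(pixels_across: int, page_width_mm: float) -> float:
    """DPI of an image in which pixels_across pixels cover the page's full width."""
    return pixels_across / (page_width_mm / MM_PER_INCH)


for name, size in PAGE_SIZES_MM.items():
    print(name, pixels_needed(size))  # prints A4 (2480, 3508), then Letter (2550, 3300)

# A photo 3024 pixels wide, cropped so an A4 page fills the frame:
print(round(effective_dpi(3024, 210.0)))  # 366
```

So a phone photo can clear 300 DPI on a single page, but only if the page fills the frame; any background around the page lowers the real figure.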
The short version
You can't translate a scanned document directly because it's an image with no text layer — it needs OCR first. The copy-paste-into-ChatGPT method everyone recommends gets you the words but throws away the layout, so you rebuild it by hand. If your document is structured or official, a tool that does OCR, translation, and re-typesetting in one step gives you back a clean, layout-correct document without the busywork.
Frequently asked
How can I translate a scanned document?
A scanned document is an image, so it needs OCR before it can be translated. You can OCR it yourself and paste the text into a translator (you'll lose the layout), or use a tool that OCRs, translates, and re-typesets the page in one step so the formatting is preserved.
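Can Google Translate handle a scanned document?
Not through its document upload. That feature needs a document that already contains a text layer and doesn't OCR scans, so a scanned PDF typically comes back with an error instead of a translation. Its separate image mode can translate a photo of a page, but you get a picture with translated text laid over it rather than an editable document. For a full scanned document, OCR it first or use a translator built for scans.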
How do I translate a scanned document without losing the formatting?
Use a translator that re-typesets the translation back onto the original page instead of dumping it into plain text. That keeps tables, columns, figures, and stamps in their original positions rather than collapsing them.
Can I scan a document and translate it in one go?
Yes. Upload a scanned PDF or a photo of the page to a tool built for scans — it runs OCR internally, translates, and rebuilds the layout, so you don't need a separate scan-then-translate workflow.
Is it free to translate a scanned document?
Manual OCR-and-paste methods are free but cost you time rebuilding the layout. Reglyph lets you translate the first 3 pages free with the layout preserved, so you can check the result before paying for longer documents.