How to Translate a Scanned PDF Without Losing the Layout

Why image-only PDFs break normal translators — and the workflow that keeps your tables, columns, and stamps exactly where they were.

You drop a scanned contract into a translator, hit go, and get back one of three things: a blank page, a wall of garbled characters, or text that's been translated but with every table, column, and signature line scrambled out of place. If you've hit this, you're not doing anything wrong — most translation tools simply aren't built for scanned documents.

This guide explains exactly why scanned PDFs break, walks through the realistic ways to translate one while preserving the layout, and covers how to get a clean result on official documents like certificates, contracts, and transcripts.

Why a scanned PDF isn't like a normal PDF

A regular, digitally-created PDF has a text layer underneath what you see — the actual characters, which software can select, copy, and translate. A scanned PDF has no text layer at all. It's a photograph of a page wrapped in a PDF container. To a translator, every word is just pixels in an image.

That single difference is the root of nearly every problem people run into:

  • Gibberish or random characters — the tool tries to read a text layer that doesn't exist and outputs noise.
  • "No text found" or "text not recognized" errors — the file is skipped entirely, or pages silently don't appear.
  • AI assistants that quietly do nothing — many can only read the text layer, so they fail on scans without telling you why (sometimes only after you've paid).
  • Destroyed layout — even when the text is read, tables collapse, columns merge, and headings blend into the body.

The missing step is OCR

OCR (optical character recognition) is what turns the picture of text back into real text. For a scanned PDF it isn't optional — it's the step that has to happen before any translation can work. Tools that assume your PDF already has a text layer skip it, which is why they fail.
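
A quick way to tell whether a file needs OCR is to look at how much text each page yields when you extract it. The sketch below assumes you already have one string per page from a PDF text extractor of your choice (for example, pypdf's `extract_text()`); the function name `needs_ocr` and the 20-character threshold are illustrative assumptions, not a standard.

```python
def needs_ocr(page_texts, min_chars=20):
    """Return the 1-based numbers of pages that look image-only.

    page_texts: one string per page from a PDF text extractor
                (an empty string or None means nothing was extracted).
    min_chars:  assumed threshold; pages with fewer non-whitespace
                characters are treated as having no usable text layer.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Drop all whitespace so pages holding only spaces or newlines count as empty.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


pages = [
    "This Agreement is made between the parties named below.",  # real text layer
    "",                                                         # scanned page
    "  \n ",                                                    # whitespace only
]
print(needs_ocr(pages))  # [2, 3]
```

Pages that come back flagged should go through OCR before translation. A scan that has already been OCR'd will pass this check even if the recognized text is poor, so it only answers whether a text layer exists, not whether it is good.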

Why the layout breaks even when the text is read

Getting the words out is only half the job. The harder half is putting the translation back where it belongs. Translated text is rarely the same length as the original: English usually runs longer than the equivalent Chinese and shorter than the equivalent German. Naive tools therefore either overflow the page or dump everything into a single plain column. Tables lose their cells, multi-column layouts merge into one, page breaks land in the wrong place, and stamps and signatures get pushed around or dropped.

For an official document, that's not a cosmetic issue. A birth certificate, a transcript, or a contract is only useful if the structure survives, because the reader needs to see which value sits in which field. This is the step where most workflows fall apart, and it's why people report spending 30–60 minutes per document copying the formatting back by hand.
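
To make the length mismatch concrete, here is a minimal sketch of one common way to handle it: shrink the font until the translated line fits the box the original text occupied. It assumes you can measure the translated line's width at the original font size (any typesetting library can do this) and that width scales linearly with font size. The function name and the 6-point floor are illustrative assumptions.

```python
def fit_font_size(box_width, text_width, font_size, min_size=6.0):
    """Shrink font_size so a translated line fits its original box.

    box_width:  width of the region the source text occupied.
    text_width: width of the translated line when set at font_size.
    min_size:   assumed legibility floor; below it we report a failure
                instead of shrinking further.
    Returns (size, fits).
    """
    if text_width <= box_width:
        return font_size, True            # already fits, keep the size
    scaled = font_size * box_width / text_width
    if scaled < min_size:
        return min_size, False            # would be unreadable: needs wrapping
    return scaled, True


print(fit_font_size(120, 150, 10))  # (8.0, True)
print(fit_font_size(120, 300, 10))  # (6.0, False)
```

The second call is the case that wrecks table cells: the translation is so much longer that shrinking alone cannot save it, so the text has to wrap or the cell has to grow.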

Three ways to translate a scanned PDF (and when to use each)

1. The manual route: OCR first, then translate

Run the scan through a dedicated OCR tool (Adobe Acrobat's "Recognize Text", or a free OCR site) to produce a searchable PDF or a Word file, then paste the recognized text into a translator like DeepL or Google Translate. This works, but you lose the layout the moment you copy the text out, so you'll be rebuilding tables and structure yourself afterwards. Fine for a single short page where you just need the meaning; painful for anything structured or multi-page.

2. Acrobat or an all-in-one PDF suite

Some PDF suites bundle OCR and translation. The catch users repeatedly hit: the translate or AI feature only works once the file has a text layer, so you still have to run "Recognize Text" first, and the re-typeset result often still needs formatting cleanup. Reasonable if you already own the suite and your document is simple.

3. A purpose-built scanned-PDF translator (one pass)

The reason a tool like Reglyph exists is to collapse all of the above into a single step. It treats image-only PDFs as the normal case, not an unsupported edge case: it OCRs the scan, erases the original text from the page, translates it, and re-typesets the translation back into the same position — so tables, columns, figures, stamps, and numbers stay put. You upload the scan as-is and download a finished PDF; there's no separate OCR step and no manual reformatting.

  1. Upload the scan as-is. A scanned PDF or even a phone photo of the page is fine. No text layer needed, no pre-conversion.
  2. It reads and translates. OCR extracts the source text, it's translated to your target language, and the original text is cleanly erased from the page.
  3. Download with the layout intact. You get a PDF where every table, figure, stamp, and number sits exactly where it was; only the words changed language.
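
The idea that makes a one-pass workflow preserve layout is that each piece of recognized text keeps its position on the page while only its content changes. The sketch below illustrates that idea in general terms and is not a description of how Reglyph or any other product is implemented. `TextBlock`, `translate_page`, and the toy glossary are all hypothetical; the OCR and rendering stages are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextBlock:
    text: str
    box: Tuple[float, float, float, float]  # x, y, width, height on the page


def translate_page(blocks: List[TextBlock],
                   translate: Callable[[str], str]) -> List[TextBlock]:
    """Translate each block's text and leave its box untouched."""
    return [TextBlock(translate(block.text), block.box) for block in blocks]


# Stand-in for a real translation engine (hypothetical, for illustration only).
GLOSSARY = {"Nombre": "Name", "Fecha de nacimiento": "Date of birth"}


def toy_translate(text: str) -> str:
    return GLOSSARY.get(text, text)  # dates and numbers pass through unchanged


# Pretend these blocks came out of an OCR step.
ocr_output = [
    TextBlock("Nombre", (40, 60, 80, 12)),
    TextBlock("Fecha de nacimiento", (40, 90, 140, 12)),
    TextBlock("1987-03-14", (200, 90, 70, 12)),
]

for block in translate_page(ocr_output, toy_translate):
    print(block.box, block.text)
```

Because every translated block carries the same box as its source, a renderer can erase that region and set the new text inside it, which is what keeps a value lined up with its field label.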

How to get the cleanest result

Whatever route you choose, OCR accuracy depends on the input. A few things make a real difference:

  • Scan at around 300 DPI — low-resolution scans are the #1 cause of garbled output.
  • Keep the page flat and straight — skew and shadows confuse character recognition.
  • Prefer clear printed text — handwriting is much harder for any OCR engine and results will vary.
  • Make sure accents and non-Latin scripts are sharp — dropped accents change names and meanings on official documents.
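
If you only have the scan's pixel dimensions, you can estimate its resolution from the physical page size. This is a small sketch; the function name and the choice to report the lower of the two axes are assumptions made for illustration.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Estimate scan resolution as the lower of horizontal and vertical DPI."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


LETTER = (8.5, 11.0)   # US Letter, inches
A4 = (8.27, 11.69)     # A4, inches (rounded)

print(round(effective_dpi(2550, 3300, *LETTER)))  # 300
print(round(effective_dpi(1240, 1754, *A4)))      # 150
```

The second scan comes out at about half the recommended resolution, so rescanning it is likely to help more than any setting in the translator.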

About official / certified use

An automated translation can get the content and layout right, which is most of the work. But for legal submissions (e.g. USCIS / immigration), a human translator still has to sign a certification statement attesting that the translation is complete and accurate and that they are competent to translate. A common, efficient pattern is to use an automated tool to produce a layout-correct draft first, then have a translator review and certify it.

The short version

Scanned PDFs fail in normal translators because they're images with no text layer, so the tool has nothing to read — and even when OCR is added, putting the translation back without wrecking the layout is its own hard problem. The clean fix is a workflow that does OCR, translation, and re-typesetting together. If you're translating structured or official documents and don't want to rebuild the formatting by hand, a purpose-built scanned-PDF translator will save you the most time.

Frequently asked

Why does my scanned PDF translate to gibberish?

Because a scanned PDF is an image with no text layer. Translators that expect selectable text read nothing and output noise. Running OCR first — or using a tool that OCRs the scan automatically — fixes it.

Can I translate a scanned PDF without OCR-ing it first myself?

Yes, if you use a translator built for scans. It runs OCR internally, so you upload the image-only PDF as-is and get translated text back without a separate conversion step.

How do I keep tables and formatting when translating a PDF?

Use a tool that re-typesets the translation back onto the original page rather than dumping it into plain text. That keeps tables, columns, figures, and stamps in their original positions instead of collapsing them.

What scan quality do I need for good translation?

Aim for about 300 DPI, a flat and straight page, and clear printed text. Low resolution, skew, and handwriting are the main causes of poor OCR and garbled translations.
