Why Does My Scanned PDF Translate to Gibberish? (And How to Fix It)
Random symbols, blank pages, or 'PDF tools are temporarily offline': the cause is almost always the same, and it's fixable.
You upload a scanned document, ask a translator to convert it, and get back a mess: strings of random characters, boxes and symbols where words should be, a blank page, or an AI that just says "PDF tools are temporarily offline." The translation engine clearly didn't read your document — but the file looks completely normal to you. So what's going on?
The good news: this almost always comes down to a single cause, and once you understand it the fix is straightforward.
The one reason scanned PDFs translate to gibberish
A scanned PDF is an image: a photograph of a page saved inside a PDF. It has no text layer underneath, just pixels. When a translator opens it, there are no actual characters to read, so it may misread raw file data as text and output noise, return nothing, or throw an error.
Quick test: open your PDF and try to select a sentence with your cursor. If you can highlight individual words, it has a text layer and most translators will work. If your selection grabs the whole page like a picture (or nothing at all), it's a scan — and that's why translation produces gibberish.
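If you'd rather run the quick test in code, here is a minimal Python sketch of the same idea. It assumes the third-party pypdf library is installed; the 20-character threshold, the "more than half the pages" rule, and the file name scan.pdf are illustrative assumptions, not a standard.

```python
def has_text_layer(page_texts, min_chars_per_page=20):
    """Return True if a strict majority of pages yielded real extractable text.

    min_chars_per_page is an assumed cutoff: a scan usually yields an empty
    string (or a few stray characters) per page.
    """
    if not page_texts:
        return False
    pages_with_text = sum(
        1 for text in page_texts
        if len((text or "").strip()) >= min_chars_per_page
    )
    return pages_with_text * 2 > len(page_texts)


def extract_page_texts(pdf_path):
    """Read whatever text layer each page has (requires: pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily so has_text_layer works without it

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling has_text_layer(extract_page_texts("scan.pdf")) should return False for an image-only scan, which tells you OCR is needed before translation.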
OCR is the missing step
The fix is OCR (optical character recognition): software that looks at the image and works out what the characters are, turning the picture back into real text. Only after OCR does the document have something a translator can actually read. Tools that assume your PDF already contains text skip this step entirely — which is exactly why Adobe's and ChatGPT's basic flows often fail on scans, and why people end up trying tool after tool.
But OCR by itself isn't a guarantee. Even with OCR in the loop, two things still go wrong:
1. Poor OCR turns gibberish into different gibberish
If the scan is low-resolution, skewed, or busy with images, OCR misreads characters — merging words, dropping accents, confusing similar letters. The translator then faithfully translates the mistakes, so you get fluent-looking nonsense instead of random symbols. People still run into this in 2026: one user with an image-heavy TV manual reported the OCR result simply "wasn't good."
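One way to catch poor OCR before it reaches the translator is to look at the engine's own confidence scores. The sketch below assumes Tesseract via the third-party pytesseract and Pillow packages; the helper names are hypothetical, and any cutoff you apply to the result (say, re-scanning pages that average below 70) is a judgment call rather than a documented threshold.

```python
def mean_word_confidence(ocr_data):
    """Average confidence (0-100) over recognized words.

    Expects a dict with parallel "text" and "conf" lists, the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    scores = []
    for text, conf in zip(ocr_data["text"], ocr_data["conf"]):
        conf = float(conf)  # values may arrive as strings, ints, or floats
        if conf >= 0 and text.strip():  # -1 marks layout boxes, not words
            scores.append(conf)
    return sum(scores) / len(scores) if scores else 0.0


def page_confidence(image_path, lang="eng"):
    """OCR one page image and return its mean word confidence.

    Requires: pip install pytesseract pillow, plus the Tesseract binary.
    """
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), lang=lang, output_type=pytesseract.Output.DICT
    )
    return mean_word_confidence(data)
```

A low average doesn't tell you which words are wrong, but it is a cheap signal that the scan should be redone before anyone translates it.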
2. The text is read, but the layout is destroyed
Some tools recognize the text but then dump it into a plain block, collapsing tables and columns. Others do something worse: they turn your tables into flat images and paste translated text on top, so the result looks roughly right but isn't real, clean text — closer to a screenshot with captions than a translated document.
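To see why layout survives only if a tool keeps each word's position, consider this simplified Python sketch. It groups OCR word boxes into rows by their vertical position so a table row stays together instead of collapsing into one block. The (text, left, top) tuple format and the 10-pixel tolerance are assumptions for illustration; real layout analysis is considerably more involved, and this is not a description of how any particular product works.

```python
def group_words_into_rows(words, row_tolerance=10):
    """Group (text, left, top) word boxes into rows, each ordered left to right.

    A word joins the current row when its top coordinate is within
    row_tolerance pixels of that row's first word (an assumed heuristic).
    """
    rows = []
    for text, left, top in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(top - rows[-1]["top"]) <= row_tolerance:
            rows[-1]["cells"].append((left, text))
        else:
            rows.append({"top": top, "cells": [(left, text)]})
    # Within each row, order cells by their left edge.
    return [[text for _, text in sorted(row["cells"])] for row in rows]
```

Given the four word boxes of a small two-column table in any order, the function returns one list per table row, which is the structure a plain text dump throws away.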
How to get clean translated text
Two parts: give OCR a good input, and use a workflow that does OCR, translation, and layout together.
- Scan at around 300 DPI: low resolution is the number-one cause of garbled OCR output.
- Keep the page flat, straight, and well-lit — skew and shadows confuse recognition.
- Prefer clear printed text; handwriting is much harder for any OCR engine.
- Make sure accents and non-Latin scripts are sharp, since dropped marks change names and meanings.
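The 300 DPI guideline above is easy to check with arithmetic: divide the image's pixel dimensions by the physical page size in inches. The Python sketch below does that; the function name is hypothetical, and taking the lower of the two axes is a conservative choice made here, not a rule from any OCR engine.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return the lower of the horizontal and vertical dots per inch."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


# A US Letter page (8.5 x 11 in) scanned to 2550 x 3300 pixels:
letter_scan = effective_dpi(2550, 3300, 8.5, 11)    # 300.0

# The same page captured at 1275 x 1650 pixels has half the resolution:
low_res_scan = effective_dpi(1275, 1650, 8.5, 11)   # 150.0
```

If the result comes out well under 300, rescanning at a higher setting is usually more effective than trying to repair the OCR output afterwards.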
Then, instead of stitching together a separate OCR tool, a translator, and a reformatting pass, use a tool built for scanned PDFs end to end. Reglyph reads the scan with OCR, erases the original text from the page, translates it, and re-typesets the translation back into the same layout. The output is a finished, clean PDF: the translated words are real typeset text laid into the original positions, not an image with text pasted over it, and not a wall of garbled characters.
Upload the scan as-is (a PDF or even a phone photo) and download a ready-to-read translated PDF with the tables, figures, stamps, and numbers in place. No separate OCR step, no gibberish, no rebuilding the layout by hand. The first 3 pages are free to try.
Frequently asked
Why does my scanned PDF translate to gibberish?
Because the PDF is a scanned image with no text layer, so the translator has no real characters to read and outputs noise. Running OCR first, or using a translator that OCRs scans automatically, produces real text instead.
Why does my AI tool say "PDF tools are temporarily offline"?
General chat tools rely on a text layer or a separate file-reading service that can be unavailable or unable to read image-only scans. A tool purpose-built for scanned PDFs reads the image directly, so it doesn't depend on that.
Why is the translation still wrong after OCR?
OCR accuracy drops on low-resolution, skewed, or image-heavy scans. It misreads characters, and the translator then translates those mistakes. Improving scan quality (around 300 DPI, flat and straight) is the biggest fix.
Is the translated PDF real text or an image?
With Reglyph the translation is real typeset text placed back into the original layout, not a flattened image with words pasted over it. You download a finished, clean translated PDF.