Copying Hebrew text from a PDF into a Translation tool - OCR (Optical Character Recognition) Help Request #general


Joyaa Antares
 

Hi Folks,
I have two medium-sized documents detailing my family ancestry.  Both documents are in PDF format, and both are in Hebrew.
I want to be able to copy the text into a good Hebrew-English translation tool in order to be able to understand and verify what has been written.  Unfortunately, these PDFs don't allow text to be "selected" in order to copy/paste but perhaps it could be done with a suitable OCR or Optical Character Recognition tool that would render the text "copyable"?  Does anyone know?  Does anyone have access to such a tool who could help me with this?
Thank you very much.
Joyaa ANTARES
Gold Coast, Queensland, Australia
___________________________
Researching   SCHORR, SCHERZER, JURIS and DAWID in Buckaczowce, Ottynia, Nadworna, and Kolomyya


Avraham Y. Kahana
 

Hi Joyaa, 
One thing I did in the past was to use a free online file converter such as https://convertio.co
I would convert from PDF to HTML, which made copying the text an easy task. I had good results with the resulting HTML formatting.

Regards,
Avraham Kahana


David Lewin
 

At 10:25 06/04/2021, Joyaa Antares wrote:

If not resolved by now, send it to me and I will transpose it for you.

David Lewin
London

Search & Unite attempt to help locate people who, despite the passage
of so many years since World War II, may still exist "out there".
We also assist in the process of re-possession of property in the
Czech Republic and Israel.
See our Web pages at https://remember.org/unite/


Peter Straus
 

Adobe offers a tool to edit PDFs, including the ability to copy text in a form that can be dropped into other apps.  I haven’t tried Hebrew, but it does include language options.  I’m paying $14.99 a month, but I use it a lot.  There are probably other pricing options, and possibly other similar products.

--peter straus

   San Francisco


garybinetter@...
 

Hi Joyaa,
I live in Israel. My Hebrew is very poor. Reading mail was a challenge but since I have Google Translate on my phone, I no longer run away screaming when the mail arrives. I would be lost without Google Translate. It can be used on a computer but I find it gives better results via the phone.

Open the app on your phone. Make sure you have selected Hebrew to English. Touch Camera, then Scan. Get the section you want to translate in the frame, then take the picture. The app will quickly scan the document. When it is finished, touch all the words or the section you want to translate. If it is a large document, just repeat the process, section by section.

Regards,
Gary Binetter garybinetter@...
Tel Aviv, Israel

Researching BANET BENET BINETH BINETTER BECK BAUMHORN LICHTENSTEIN PICK


r.peeters
 

Hi Joyaa,

I posed the same question not too long ago about documents with Kurrent characters, and Logan Kowacks informed me that this does not work. I think it may be the same with Hebrew?
Bye,
Ron Peeters(NL)


de.ewenczyk@...
 

I think that Gary Binetter’s solution is a very practical solution. 

However, I tested it on my psalm book, which has printed characters. The Hebrew is read with an average accuracy of about 80%. This causes the translation to be very poor, almost unintelligible.

Depending upon the original document quality, the result is unpredictable, but very much worth trying. 

If we are dealing with a manuscript, it’s a lost game.
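A rough illustration of why 80% character accuracy hurts translation so badly: a translator needs whole words, and a word is only right if every one of its letters is right. The sketch below assumes, as a simplification, that each character is misread independently of the others; real OCR errors tend to cluster, so treat the numbers as indicative only.

```python
def whole_word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character in a word is read correctly.

    Assumes character errors are independent (a simplification).
    """
    return char_accuracy ** word_length


# Show how quickly whole-word accuracy falls as words get longer.
for length in (2, 4, 6):
    print(length, round(whole_word_accuracy(0.80, length), 2))
```

Under that assumption, only about 41% of four-letter words and about 26% of six-letter words would come through intact, which is consistent with a translation that is close to unreadable.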

Daniel Ewenczyk
Paris, France
Searching Evenchi(c)k, Ewenczyk (Minsk, Belarus) / Receptor, Retzepter (Luboml, Ukraine).


meirr@...
 

Hi All,
Another solution: open the PDF in Adobe Acrobat Reader (a free program) and, under the "File" menu, choose "Save as Text".
--
Meir Razy
meir.razy@...
Searching:
Kisfajn / Sfard / Rothenberg / Ruttenberg / Rojtenberg in Rovno,Volhynia
Ross in Dubno,Volhynia


Dahn Cukier
 

Hi.

If you read the following and do not understand it, I will answer questions off-list.
I have worked in this field for 30 years, and I do not expect everyone to understand it from a few words.

There are two ways to create a PDF file: enter data in an application and save it as PDF, or scan
or photograph a page and save the image as PDF.

As far as I know, a PDF created the second way cannot be converted to text.

The first way: if a document was written in a word processor, a
spreadsheet, or any application that stores characters, and was then saved or exported as PDF,
I can try converting it on the Linux operating system using "pdftotext". I do not know whether the
software is available for Windows or iOS.
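For readers who want to try this themselves, here is a minimal Python sketch of the check described above. It assumes the `pdftotext` command (part of Poppler and Xpdf; as far as I know, builds also exist for Windows and macOS) is installed and on the PATH. The function names and the file name in the usage note are hypothetical.

```python
import shutil
import subprocess


def has_text_layer(extracted: str) -> bool:
    """True if the extracted text holds anything besides whitespace.

    An image-only (scanned) PDF typically yields only page-break characters.
    """
    return bool(extracted.strip())


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return its output as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    result = subprocess.run(
        # "-enc UTF-8" asks for UTF-8 output; "-" sends the text to stdout.
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `has_text_layer(extract_text("family_history.pdf"))` would then indicate whether the file was created the first way (stored characters) or the second way (page images only).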

There is a second potential problem. Only in the last few years has there been a
"standard" by which most software displays Hebrew. I think different fonts represent Hebrew
using different binary codes.

If you can convert the PDF to text and it looks like an encoded spy message,
I can attempt to convert it to other fonts using a program I wrote many years ago
in Regina/REXX.
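One simple case of the "encoded spy message" problem can be sketched in Python. This is not the Regina/REXX program mentioned above; it only covers the situation where Hebrew stored in the Windows Hebrew code page (cp1255) was mistakenly read as Western European (cp1252). Both code pages are assumptions here, and text garbled by a font-specific mapping would need a custom table instead.

```python
def redecode(garbled: str, wrong: str = "cp1252", right: str = "cp1255") -> str:
    """Undo a code-page mix-up.

    Recover the original bytes with the encoding that was wrongly applied,
    then decode those bytes as Hebrew. Characters that do not fit either
    code page are replaced rather than raising an error.
    """
    raw = garbled.encode(wrong, errors="replace")
    return raw.decode(right, errors="replace")


# Accented Latin letters like these are a typical sign of this mix-up.
sample = "ùìåí"
print(redecode(sample))
```

Here the four accented Latin letters turn back into the Hebrew word "שלום" (shalom).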


If I can help anyone, please write to me off-list using the subject line "help translate".

Dahn Zukrowicz
Cukier, Zucker, Brieff, Brif, Liss, Lisobitsky, Sabath, Sklawer,


When you start to read readin,
how do you know the fellow that
wrote the readin,
wrote the readin right?

Festus Hagen
Long Branch Saloon
Dodge City, Kansas
(Gunsmoke)


Joyaa Antares
 

Thank you all very much for your input and some really wonderful ideas!  I'll give a status report here for those interested in the topic now and for the record.
Unfortunately, the original PDFs, whilst legible and intelligible to someone fluent in Hebrew, are simply not readable by Adobe Acrobat or by any of the solutions provided to date.  However, I am reasonably sure that Dahn Cukier has given the correct reason for this: the original documents may have been created as images and then saved as PDFs.  (Certainly the suggestions from Gary Binetter and Meir Razy, whilst offering hope, didn't work in this instance.  Also, I have tried copying text from the document using a paid / full version of Adobe Acrobat without success [thank you Peter Straus].)
Therefore, I am running with Avraham Kahana's suggestion of trialling https://convertio.co/ on one of my five files.  The program has converted the PDF into an MS Word document (my choice from the list of document types the program offers) that looks like utter garbage, containing Chinese characters, numbers, and all kinds of glyphs.  Still, this is much more promising than the blank content that resulted from other attempts at file conversion.  I plan to send this "garbage" file to Dahn Zukrowicz to see what can be made of it.  If this fails, I'll follow up a suggestion from David Lewin to approach the National Library of Israel to see whether they have the documents in a better format (I think it's unlikely, so I am trying Dahn's method first).
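A quick way to judge whether a converted file is "garbage" without reading Hebrew is to measure how many of its letters actually fall in the Hebrew alphabet. The following is a minimal Python sketch; the function name is hypothetical, and any threshold for "good enough" is a judgment call.

```python
def hebrew_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Hebrew letters.

    Hebrew letters occupy U+05D0 (alef) through U+05EA (tav). Digits,
    punctuation, spaces, and vowel points are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for ch in letters if "\u05d0" <= ch <= "\u05ea")
    return hebrew / len(letters)
```

A conversion that really is Hebrew should score close to 1.0, while output full of Chinese characters and other glyphs should score close to 0.0.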
I will report back here.
Joyaa ANTARES
Gold Coast, Qld, Australia


Alicia Weiss
 

Perhaps I am misunderstanding the problem, but "image" PDF files certainly can be converted to text; I do this all the time to produce accessible files for people who use assistive technologies, although I confess I have never attempted Hebrew or any other language with non-Latin characters.  Sometimes a bit of editing is necessary, as OCR is not always 100% accurate, but it will not produce a blank file if you are doing this correctly.  Of course, you will then need to run the resultant text through Google Translate or a similar program.  I am assuming that when you attempted this using the paid version of Adobe Acrobat, you first set the language to Hebrew.  What was the output?  There are also higher-quality programs designed specifically for OCR that often perform better than Adobe Acrobat (for which OCR is just one of its tools), such as ABBYY FineReader and OmniPage.  Of the two, ABBYY FineReader is the more affordable.
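The point about setting the OCR language first applies to other tools as well. As an illustration only, here is a Python sketch using Tesseract, a free OCR engine that nobody in this thread has mentioned. It assumes the `tesseract` command and its Hebrew language data ("heb") are installed, and that each PDF page has already been saved as an image; the function names are hypothetical.

```python
import subprocess
from typing import List


def tesseract_command(image_path: str, lang: str = "heb") -> List[str]:
    """Build the Tesseract command line for one page image.

    "stdout" in place of an output file name makes Tesseract print the
    recognised text; "-l" selects the recognition language.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "heb") -> str:
    """Run Tesseract on one image and return the recognised text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Leaving the language at its English default would make the engine try to match Hebrew letters against Latin ones, so garbage output is to be expected in that case.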



Alicia Weiss
Researching: WEISS/WEISZ, Szecseny, Hungary; KUNDLER, Kaposmero/Kisvarda/Gyongyos/Budapest, Hungary; POLLAK, Kaposmero/Csurgo, Hungary; PRESSMAN/PRESEISEN, Kiev, Ukraine; GOLDFELD, Russia; Dufine/Dufan/Dufayn, Orgeev, Moldova / Tuchin, Ukraine; MAUTHNER/MAUTNER, Szecseny, Hungary; HEMPEL/HAMPEL, Poland.


H Duboff
 

Hello.

Have you tried uploading it to Google Translate? 
https://translate.google.com/

Toward the top there is a choice of text or document.  You may be able to upload the PDF.

Regards,
Henoch DUBOFF
Mequon, Wisconsin
USA


David Lewin
 

For a restricted number of languages, www.deepl.com is far superior to the Google translation.
David Lewin


Joyaa Antares
 

On Thu, Apr 8, 2021 at 09:35 AM, Alicia Weiss wrote:
I am assuming that when you attempted this using the paid version of Adobe Acrobat, you first set the language to Hebrew.

Second and likely final update:
Alicia Weiss - many thanks for your suggestions.  I am not certain that I set the language to Hebrew in the full version of Adobe Acrobat, but I think I did.  Re. ABBYY: I was also contacted off-list by Noach, who very kindly tested one of the documents using this tool, and it worked.  :-)
I have since downloaded the ABBYY "convert 100 pages for free" trial version and successfully converted over 75% of the material I want into readable, copyable text.  I have also discovered that after copying the text into a Word document, MS Word has a good translation tool too.
Henoch Duboff - thanks for pointing out the 'document' option in Google Translate.  Unfortunately, it didn't result in anything other than a blank page once again.
My thanks to all for your great suggestions.
Joyaa Antares, 

Gold Coast, QLD, Australia


Paul Silverstone
 

This discussion has been of great interest to me.  When I was in Israel a few years ago, I photographed over 150 pages of documents at various archives with my iPhone.  They are mostly in Hebrew, typewritten.  To my dismay, I found that it was very difficult to convert the JPGs to text that could be put into a translation service such as Google Translate.  The closest I got was garbage.  I will try some of the suggestions offered.  Thank you for the discussion.
Paul Silverstone