Program: gImageReader and Tesseract
License: Open Source
Description: gImageReader is a GUI front end for the Tesseract OCR engine
Website: http://sourceforge.net/projects/gimagereader/ and http://sourceforge.net/projects/tesseract-ocr/
gImageReader is an excellent front end for the Tesseract OCR engine.
Tesseract is an open source OCR engine that converts images into editable text. gImageReader is installed onto a system that already has Tesseract installed, which is why this App Request lists both of them.
gImageReader Features
- Open images and PDFs
- Acquire from scanner
- Select the part of the image to recognize
- Support for different recognition languages
- Side by side comparison of source image and output text
- Remove linebreaks in output text
- Supports Tesseract 3.0
One challenge is that while it also supports spellcheck, it uses the dictionary from OpenOffice. Could it possibly be configured to use the dictionaries in LibreOffice Portable?
I would be interested in this, as well...
Bill G.
Frozen St. Paul, MN
land of the frozen mosquito
Nice find! I'll consider packaging it when school is finally over for me.
This would do wonders for me, especially since TopOCR Portable no longer exists.
SWAG
Okay, I've created test installers, but I haven't made posts in the forum yet.
https://sourceforge.net/projects/voltronportable/files/gImageReaderPorta...
https://sourceforge.net/projects/voltronportable/files/Tesseract-ocrPort...
Just add the path
X:\PortableApps\CommonFiles\Tesseract-ocr
to the gImageReader configuration.
I've created two posts in the beta forum for these applications:
Tesseract-ocr Portable
gImageReader Portable
Thanks for taking the initiative on this, but I was wondering what the advantage is in having them as separate packages.
Tesseract is a command-line only app - which is the reason why gImageReader is necessary. Since they are interdependent (gImageReader won't work without Tesseract and Tesseract is non-GUI without gImageReader) wouldn't it make sense to package them together, rather than to require users to download each separately?
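To illustrate the point about Tesseract being command-line only, here is a rough Python sketch of what a front end has to do on the user's behalf. It assumes the basic invocation `tesseract <image> <output_base> -l <lang>`, which writes the text to `<output_base>.txt`. The function names and the example paths are made up for illustration, and this is not a claim about how gImageReader itself talks to Tesseract.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(tesseract_exe, image, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>."""
    return [str(tesseract_exe), str(image), str(output_base), "-l", lang]


def recognize(tesseract_exe, image, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Tesseract writes its result to <output_base>.txt, so the file is read back
    after the process finishes. Raises CalledProcessError if Tesseract fails.
    """
    cmd = build_tesseract_command(tesseract_exe, image, output_base, lang)
    subprocess.run(cmd, check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

A call might look like `recognize("tesseract.exe", "scan.png", "scan_out")`, where all three arguments are placeholder values. Every OCR run needs a command like this, which is the work a GUI saves the user from doing by hand.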
I made this half-pony, half-monkey monster to please you.
In my opinion, Tesseract-ocr acts as a plugin and should be installed in the CommonFiles directory. This allows a single installed instance of Tesseract-ocr to be shared if another application needs to use it.
Realistically, as gImageReader is the only thing using it and it needs it to be of any use, it should be bundled and be one package. CommonFiles is only for broad things used by many apps, think Java or GhostScript. Sometimes we bundle something that would really fit in plugins because it's the only app that needs it. Sometimes because it needs a specific version (like GTK). Here, though, this should be one package with both the main app and Tesseract. If we get to the point where we have a few apps using it, we can revisit it.
Sometimes, the impossible can become possible, if you're awesome!
John, thanks for the clarification. I can bundle the two packages together and post them to the gImageReader post. What directory structure do you recommend using?
gImageReaderPortable\App\Tesseract-ocr
or
gImageReaderPortable\App\gimagereader\Tesseract-ocr
Thanks.
I'd leave it to you. I think gImageReaderPortable\App\Tesseract-ocr may be the better fit, since they are separate pieces. And if we do it as a separate plugin later, we can keep that and not have it bundled (à la the App\Java directory in LibreOfficePortable).
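As a rough sketch of why keeping Tesseract in its own App\Tesseract-ocr folder leaves the plugin option open, a launcher could look for the bundled copy first and fall back to a shared copy under CommonFiles. This is Python for illustration only. The function name and the lookup order are assumptions, not how the PortableApps.com launcher or gImageReader Portable actually resolves the path.

```python
from pathlib import Path


def find_tesseract_dir(portable_root):
    """Return the first existing Tesseract-ocr directory, or None.

    portable_root is the folder that contains both gImageReaderPortable and
    CommonFiles (the PortableApps folder in this thread).
    """
    root = Path(portable_root)
    candidates = [
        # Bundled copy inside the app package (assumed to take priority).
        root / "gImageReaderPortable" / "App" / "Tesseract-ocr",
        # Shared copy, if Tesseract is ever split out as a separate plugin.
        root / "CommonFiles" / "Tesseract-ocr",
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

With a layout like this, moving Tesseract out to CommonFiles later would only change which candidate exists on disk, not the directory structure inside the package.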
Sometimes, the impossible can become possible, if you're awesome!
I've updated gImageReader Portable to include Tesseract-ocr, so the separate Tesseract-ocr Portable package is now obsolete.