PikoPong

Use gImageReader to Extract Text From Images and PDFs on Linux

March 8, 2021


Brief: gImageReader is a GUI tool that uses the Tesseract OCR engine to extract text from images and PDF files on Linux.

gImageReader is a front-end for the open-source Tesseract OCR engine. Tesseract was originally developed at HP and was open-sourced in 2005.

Basically, an OCR (Optical Character Recognition) engine lets you extract text from a picture or a file (PDF). Tesseract can recognize several languages and supports Unicode characters.

However, Tesseract by itself is a command-line tool without any GUI. This is where gImageReader comes to the rescue, letting any user extract text from images and files.
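
For comparison, here is roughly what calling Tesseract directly from a terminal looks like. This is only a minimal sketch: the helper name `ocr_to_text` and the file name `scan.png` are hypothetical, and `eng` is an assumed default language. Only the `tesseract` command, its `-l` option, and the special `stdout` output name come from Tesseract itself.

```shell
# Hypothetical helper: print the text Tesseract recognizes in an image.
# $1 = image path, $2 = language code (assumed default: eng)
ocr_to_text() {
    img="$1"
    lang="${2:-eng}"
    if [ ! -f "$img" ]; then
        echo "no such file: $img" >&2
        return 1
    fi
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract is not installed" >&2
        return 127
    fi
    # Using "stdout" as the output base prints the text instead of writing a .txt file.
    tesseract "$img" stdout -l "$lang"
}

# Example call with a hypothetical file name; it only runs if the file exists.
if [ -f scan.png ]; then
    ocr_to_text scan.png eng
fi
```

gImageReader saves you from typing commands like this and adds area selection, an editor, and spellchecking on top.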

Let me highlight a few things about it and share my experience from the time I spent testing it.

gImageReader: A Cross-Platform Front-End to Tesseract OCR

gImageReader makes it simple to extract text from a PDF file or from an image that contains any kind of text.

Whether you need the text for spellchecking or translation, it should be useful for anyone who has to get text out of pictures or scanned documents.

To sum up the features in a list, here’s what you can do with it:

  • Add PDF documents and images from disk, scanning devices, clipboard and screenshots
  • Ability to rotate images
  • Common image controls to adjust brightness, contrast, and resolution
  • Scan images directly through the app
  • Ability to process multiple images or files in one go
  • Manual or automatic recognition area definition
  • Recognize to plain text or to hOCR documents
  • Editor to display the recognized text
  • Can spellcheck the text extracted
  • Convert/Export to PDF documents from hOCR document
  • Export extracted text as a .txt file
  • Cross-platform (Windows)

Installing gImageReader on Linux

Note: To recognize text in a given language, you need to explicitly install the matching Tesseract language pack from your software manager.
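
If you prefer the terminal, the check below is one way to see whether a language pack is already present. Treat it as a sketch under a few assumptions: `has_lang` is a hypothetical helper, `deu` (German) is just an example language code, and the `tesseract-ocr-<code>` package name follows the Debian/Ubuntu convention, so other distributions will use different names.

```shell
# Hypothetical helper: succeed if Tesseract reports the given language code.
has_lang() {
    # --list-langs prints one installed language code per line.
    # stderr is merged in because older releases print the list there.
    # If tesseract is missing or fails, the helper reports "not installed".
    langs=$(tesseract --list-langs 2>&1) || return 1
    printf '%s\n' "$langs" | grep -qx "$1"
}

lang="deu"   # example: German
if has_lang "$lang"; then
    echo "$lang is installed"
else
    # Package name follows the Debian/Ubuntu scheme; adjust for your distribution.
    echo "$lang is missing; try: sudo apt install tesseract-ocr-$lang"
fi
```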

You can find gImageReader in the default repositories for some Linux distributions like Fedora and Debian.

For Ubuntu, you need to add a PPA and then install it. To do that, here’s what you need to type in the terminal:

sudo add-apt-repository ppa:sandromani/gimagereader
sudo apt update
sudo apt install gimagereader

It is also available for openSUSE through the openSUSE Build Service, and Arch Linux users can find it in the AUR.

All the links to the repositories and the packages can be found on the project's GitHub page.

Experience with gImageReader

gImageReader is quite a useful tool for extracting text from images when you need it. It works great with PDF files.

When extracting text from a picture shot on a smartphone, the detection was close but a bit inaccurate. Recognition would probably be better with a properly scanned document.

So, you’ll have to try it for yourself to see how well it works for your use case. I tried it on Linux Mint 20.1 (based on Ubuntu 20.04).

The only issue I had was with managing languages from the settings, and I couldn’t find a quick solution for it. If you run into the same issue, you may need to troubleshoot it yourself.

Other than that, it worked just fine.

Do give it a try and let me know how it worked for you! If you know of something similar (and better), do let me know about it in the comments below.
