
dartero-projects/ocr


OCR

License: AGPL v3

Nextcloud OCR (optical character recognition) brings OCR capability to your Nextcloud, processing images and PDFs with tesseract-ocr and OCRmyPDF. The app uses a Docker container that runs tesseract-ocr and OCRmyPDF and communicates with Nextcloud over redis, so images (png, jpeg, tiff) and PDFs are processed asynchronously. The output file is saved to the source folder in Nextcloud, which, for example, makes its contents searchable. (Hint: not all PDF types are currently supported; for more information, see here.)
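The asynchronous flow described above follows a producer/consumer pattern: Nextcloud pushes a job onto a queue and returns immediately, the worker in the Docker container pops the job and runs OCR, and the result is pushed back for Nextcloud to save. The sketch below illustrates that pattern only. It assumes an in-memory `deque` as a stand-in for the redis lists, and the job fields (`id`, `source`, `type`, `languages`) and function names are hypothetical, not the app's actual message format.

```python
import json
from collections import deque
from typing import Callable

# In-memory stand-ins for redis lists (hypothetical; the real app uses redis via php-redis).
queue_in: deque = deque()   # jobs waiting for the OCR worker
queue_out: deque = deque()  # finished results waiting for Nextcloud

SUPPORTED = {"image/png", "image/jpeg", "image/tiff", "application/pdf"}


def enqueue_job(job_id: int, source_path: str, mime: str, languages: list[str]) -> None:
    """Nextcloud side: push a job (hypothetical schema) and return without waiting."""
    if mime not in SUPPORTED:
        raise ValueError(f"unsupported type: {mime}")
    queue_in.append(json.dumps(
        {"id": job_id, "source": source_path, "type": mime, "languages": languages}
    ))


def worker_step(run_ocr: Callable[[str, list[str]], str]) -> bool:
    """Container side: pop one job, run OCR on it, push the result. False if idle."""
    if not queue_in:
        return False
    job = json.loads(queue_in.popleft())
    output = run_ocr(job["source"], job["languages"])
    queue_out.append(json.dumps({"id": job["id"], "output": output}))
    return True


def collect_results() -> dict[int, str]:
    """Nextcloud side: drain finished jobs; the app would then save each output
    into the source file's folder."""
    results: dict[int, str] = {}
    while queue_out:
        msg = json.loads(queue_out.popleft())
        results[msg["id"]] = msg["output"]
    return results
```

Because `enqueue_job` only writes to the queue, the web request that triggers OCR never blocks on tesseract-ocr or OCRmyPDF; the slow work happens entirely in `worker_step`.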

Prerequisites, Requirements and Dependencies

The OCR app has some prerequisites:

  • Nextcloud 12 or 13. For older Nextcloud versions, use an older major version of this app.
  • A Linux server environment (tested with Debian 8 and Ubuntu 14.04 (Trusty)). ARM processors, such as those in the Raspberry Pi, are currently not supported.
  • Docker is used to process files: tesseract-ocr and OCRmyPDF run inside a Docker container.
  • php-redis is used for communication and must be installed as part of your PHP setup.
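The prerequisites above can be checked before installing. The following is a minimal preflight sketch, not part of the app: the function names and the set of ARM machine identifiers are assumptions, the Nextcloud major version must be supplied by you (for example from `occ status`), and PHP modules are read from `php -m`, which lists loaded extensions one per line.

```python
import platform
import shutil
import subprocess

# Machine identifiers treated as ARM (assumption; covers common platform.machine() values).
ARM_MACHINES = {"arm", "armv6l", "armv7l", "aarch64", "arm64"}


def find_problems(system: str, machine: str, nextcloud_major: int,
                  has_docker: bool, php_modules: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the listed prerequisites look met."""
    problems = []
    if nextcloud_major not in (12, 13):
        problems.append(f"Nextcloud {nextcloud_major}: this app version targets 12 or 13")
    if system != "Linux":
        problems.append(f"{system}: a Linux server is required")
    if machine.lower() in ARM_MACHINES:
        problems.append(f"{machine}: ARM processors are not supported")
    if not has_docker:
        problems.append("docker executable not found on PATH")
    if "redis" not in php_modules:
        problems.append("php-redis extension not loaded")
    return problems


def gather_local_facts() -> tuple[str, str, bool, set[str]]:
    """Collect OS, CPU, Docker and PHP-module facts for the current machine."""
    modules: set[str] = set()
    if shutil.which("php"):
        out = subprocess.run(["php", "-m"], capture_output=True, text=True).stdout
        modules = {line.strip().lower() for line in out.splitlines() if line.strip()}
    return platform.system(), platform.machine(), shutil.which("docker") is not None, modules
```

Keeping `find_problems` pure (it only inspects its arguments) makes the rules easy to test; `gather_local_facts` is the only part that touches the real system.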

Limitations

Currently, the app does not work when encryption is enabled, nor with files shared via external storage or federated sharing. Keep this in mind: to process such a file, first copy it to local storage.
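These limitations, together with the supported file types, amount to a simple eligibility rule. The sketch below expresses that rule; the `FileInfo` record, its field names, and the storage labels `"external"` and `"federated"` are hypothetical illustrations, not the app's real data model.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_MIMES = {"image/png", "image/jpeg", "image/tiff", "application/pdf"}


@dataclass
class FileInfo:
    """Hypothetical summary of a Nextcloud file; field names are illustrative only."""
    name: str
    mime: str
    encryption_enabled: bool
    storage: str  # e.g. "local", "external", "federated" (hypothetical labels)


def ocr_blocker(f: FileInfo) -> Optional[str]:
    """Return why the file cannot be processed, or None if it looks eligible."""
    if f.encryption_enabled:
        return "encryption is enabled"
    if f.storage in ("external", "federated"):
        return f"file is on {f.storage} storage; copy it to local storage first"
    if f.mime not in SUPPORTED_MIMES:
        return f"unsupported type {f.mime}"
    return None
```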

For further information, see the homepage or the relevant documentation in the wiki.

Installation

Install the app from the Nextcloud App Store, or download the release package from GitHub (NOT the source code) and place its contents in nextcloud/apps/ocr/.
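The manual route amounts to extracting the release archive into the `apps` directory. Below is a minimal sketch, assuming the archive contains a single top-level `ocr/` directory (check your downloaded package first). It uses the presence of `appinfo/info.xml`, the metadata file every Nextcloud app ships, to reject a source checkout or a wrong archive; `install_release` is a hypothetical helper name.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_release(archive: Path, nextcloud_root: Path) -> Path:
    """Extract a release tarball into <nextcloud_root>/apps/ocr/ and return that path.

    Assumption: the archive's top-level directory is named "ocr".
    """
    target = nextcloud_root / "apps" / "ocr"
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive) as tar:  # compression is auto-detected
            tar.extractall(tmp)
        src = Path(tmp) / "ocr"
        if not (src / "appinfo" / "info.xml").is_file():
            raise ValueError("not a release package: ocr/appinfo/info.xml missing")
        if target.exists():
            shutil.rmtree(target)  # replace any previous installation
        shutil.copytree(src, target)  # creates apps/ocr and any missing parents
    return target
```

After copying, the files typically need to be owned by the web server user, and the app can then be enabled with `occ app:enable ocr`.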

Please note: the app will not work unless the Docker container is running (more information in the wiki).
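A quick way to verify this is to ask Docker which containers are running. The sketch below calls `docker ps --format '{{.Names}}'`, which prints the name of each running container on its own line. The container name is whatever you chose when starting the OCR worker; no default name is assumed, and the helper names are hypothetical.

```python
import subprocess


def parse_running_names(ps_output: str) -> set[str]:
    """Parse `docker ps --format '{{.Names}}'` output: one container name per line."""
    return {line.strip() for line in ps_output.splitlines() if line.strip()}


def container_running(name: str) -> bool:
    """Return True if a running container called `name` exists on the local daemon."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],  # plain string, so braces stay doubled
        capture_output=True, text=True, check=True,
    )
    return name in parse_running_names(result.stdout)
```

`docker ps` without `-a` lists only running containers, so a stopped worker correctly reports `False`.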

Administration and Usage

Please read the related topics in the wiki.

Disclaimer

The software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

About

Nextcloud OCR (optical character recognition) processing for images and PDFs with tesseract-ocr, OCRmyPDF and message queueing for asynchronous processing.

Languages

  • PHP 38.9%
  • JavaScript 36.2%
  • TypeScript 23.9%
  • Other 1.0%