A Flask-powered web application that extracts text from images using Tesseract OCR. The app preprocesses images to improve recognition accuracy, offers a simple interface, and lets you download the extracted text.
- OCR-Based Text Extraction: Leverages Tesseract OCR for reliable text recognition.
- Preprocessing for Accuracy: Automatically converts images to grayscale and applies thresholding before OCR.
- Image Preview: See your uploaded image before processing.
- Text Export: Download extracted text as a .txt file with a single click.
- Responsive UI: A mobile-friendly, modern interface.
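The preprocessing feature listed above (grayscale, then thresholding) can be illustrated with a small dependency-free sketch. This is not the app's own code: the project lists OpenCV and Pillow, and it most likely relies on library routines for these steps. The function names `to_grayscale`, `binarize`, and `preprocess`, the BT.601 luminance weights, and the fixed cutoff of 127 are all assumptions chosen for illustration.

```python
# Illustrative only: an image is a list of rows, each row a list of
# (R, G, B) tuples with channel values in the range 0-255.

def to_grayscale(rgb_rows):
    """Convert RGB pixels to 0-255 luminance using ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, cutoff=127):
    """Map pixels brighter than `cutoff` to white (255), the rest to black (0)."""
    return [[255 if px > cutoff else 0 for px in row] for row in gray_rows]


def preprocess(rgb_rows, cutoff=127):
    """Grayscale first, then threshold, mirroring the two steps named above."""
    return binarize(to_grayscale(rgb_rows), cutoff)


if __name__ == "__main__":
    sample = [
        [(250, 250, 250), (20, 20, 20)],
        [(200, 180, 160), (90, 60, 30)],
    ]
    print(preprocess(sample))
```

With OpenCV, the same two steps are commonly written as `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` followed by `cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)`. Whether this project uses a fixed cutoff or another thresholding mode is not stated here.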
You can view a live demo of the app here
- Backend: Flask, Python
- Frontend: HTML, CSS, JavaScript
- Image Processing: OpenCV, Pillow (PIL)
- OCR: Tesseract OCR
- Python 3.7+
- Tesseract OCR:
  - Windows: Download Tesseract.
  - Linux (Debian/Ubuntu): sudo apt install tesseract-ocr
  - macOS (Homebrew): brew install tesseract
- Clone the repository:
  git clone https://github.com/HariPasapuleti/Text-Extractor.git
  cd Text-Extractor
- Install dependencies:
  pip install -r requirements.txt
- Configure Tesseract path:
  - Windows:
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
  - Linux/Mac: No changes needed (default path).
- Start the Flask server:
  python text_extractor.py
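The Tesseract path step above can be made platform-aware instead of being edited by hand. The following is a minimal sketch under stated assumptions: `resolve_tesseract_cmd` is a hypothetical helper, not part of this repository or of pytesseract, and the lookup order (PATH first, then the Windows default quoted above) is one reasonable choice among several.

```python
import platform
import shutil

# Default Windows install location quoted in the setup steps.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(system=None, which=shutil.which):
    """Return a path (or bare command name) for the Tesseract executable.

    Hypothetical helper: prefers an executable found on PATH, then falls
    back to the Windows default location, otherwise to the command name.
    """
    system = system or platform.system()
    found = which("tesseract")
    if found:
        return found
    if system == "Windows":
        return WINDOWS_DEFAULT
    return "tesseract"


if __name__ == "__main__":
    print(resolve_tesseract_cmd())
    # In the app, the result could be assigned like this (requires pytesseract):
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
```

The `system` and `which` parameters exist only so the fallback logic can be exercised without Tesseract installed.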
- Upload an Image: Accepts .jpg or .png formats.
- Preview: Check the image preview before processing.
- Extract Text: View extracted text directly on the page.
- Download: Save the extracted text as a .txt file.
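The upload and download rules above (accept .jpg or .png, save the result as .txt) could be checked on the server with helpers along these lines. This is a sketch only: `is_allowed_image` and `text_download_name` are hypothetical names, and how the app actually validates uploads or names the downloaded file is not described here.

```python
from pathlib import Path

# Formats named in the usage notes above.
ALLOWED_EXTENSIONS = {".jpg", ".png"}


def is_allowed_image(filename):
    """Return True when the filename ends in an accepted image extension."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def text_download_name(image_filename):
    """Suggest a .txt filename for the extracted text of an uploaded image."""
    return Path(image_filename).with_suffix(".txt").name


if __name__ == "__main__":
    for name in ("photo.JPG", "scan.png", "notes.pdf"):
        print(name, is_allowed_image(name))
    print(text_download_name("scan.png"))
```

Checking the extension is only a first filter. It does not verify that the file content is a valid image.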