Vision Text Extractor

Vision Text Extractor - AI-powered OCR with multiple provider support

Features

Vision Text Extractor is a CLI tool that extracts text from images and documents using several AI providers. Run it with local models when privacy matters, or use a cloud model when you want higher accuracy.

Key Capabilities:

  • Multiple AI Providers: Choose from local SmolVLM/LLaVA models or cloud-based OpenAI GPT-4o
  • Privacy-First Design: Local processing keeps your sensitive data on your machine
  • Flexible Input Options: Process local files or images from web URLs
  • Custom Prompts: Extract specific information with tailored extraction prompts
  • Easy Setup: One-command installation using Pixi package manager

Quick Start

Getting started with Vision Text Extractor is simple:

# Clone and install
git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor
pixi install

# Setup
pixi run setup-smolvlm    # Hugging Face SmolVLM (~2GB)
pixi run setup-ollama     # Ollama LLaVA (~4GB)

# Quick demo
pixi run demo-ocr-huggingface

# Use with your own images. For more commands, see the Quick Commands section:
# https://github.com/udit-asopa/vision-text-extractor/tree/main?tab=readme-ov-file#-quick-commands
pixi run ocr_llm path/to/your/image.jpg
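
To process a whole folder rather than a single image, the last command can be wrapped in a small loop. The following is a minimal sketch, not part of the project: the function name ocr_batch and the OCR_CMD variable are hypothetical names introduced here, and the only project command used is the `pixi run ocr_llm <image>` form shown above. It assumes a bash-compatible shell and that it is run from the cloned repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vision-text-extractor):
# run the OCR command on every .jpg and .png file in a directory.
ocr_batch() {
  local dir="${1:-.}"
  # OCR_CMD is an assumed override for illustration; by default it is
  # the command from the Quick Start.
  local cmd="${OCR_CMD:-pixi run ocr_llm}"
  local img
  for img in "$dir"/*.jpg "$dir"/*.png; do
    # Skip a pattern that matched no files (the glob stays unexpanded).
    [ -e "$img" ] || continue
    echo "== $img"
    # $cmd is left unquoted on purpose so "pixi run ocr_llm" splits into words.
    $cmd "$img"
  done
}
```

After sourcing the snippet, `ocr_batch path/to/images` would call the OCR command once per image and print a header line before each result. Setting OCR_CMD=echo first gives a dry run that only lists the files that would be processed.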

Udit Asopa
Remote Sensing Data Specialist (official title: SAR Remote Sensing Engineer)

I specialize in SAR and GIS-based Earth observation workflows for environmental monitoring, disaster response, and scientific analysis. With a focus on automation, reproducibility, and applied geospatial intelligence, I contribute to building scalable, data-driven solutions.
