<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Document Intelligence | Udit Asopa</title><link>https://uditasopa.netlify.app/tag/document-intelligence/</link><atom:link href="https://uditasopa.netlify.app/tag/document-intelligence/index.xml" rel="self" type="application/rss+xml"/><description>Document Intelligence</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>2026</copyright><lastBuildDate>Fri, 17 Oct 2025 00:00:00 +0000</lastBuildDate><image><url>https://uditasopa.netlify.app/media/avatar.png</url><title>Document Intelligence</title><link>https://uditasopa.netlify.app/tag/document-intelligence/</link></image><item><title>Vision Text Extractor</title><link>https://uditasopa.netlify.app/project/ocr_detectors_with_vlms/</link><pubDate>Fri, 17 Oct 2025 00:00:00 +0000</pubDate><guid>https://uditasopa.netlify.app/project/ocr_detectors_with_vlms/</guid><description>&lt;h2 id="features">Features&lt;/h2>
&lt;p>The Vision Text Extractor is a CLI tool that extracts text from images and documents using your choice of AI provider: local models for privacy-first processing, or a cloud-based model for high accuracy.&lt;/p>
&lt;h3 id="key-capabilities">Key Capabilities:&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Multiple AI Providers&lt;/strong>: Choose from local SmolVLM/LLaVA models or cloud-based OpenAI GPT-4o&lt;/li>
&lt;li>&lt;strong>Privacy-First Design&lt;/strong>: Local processing keeps your sensitive data on your machine&lt;/li>
&lt;li>&lt;strong>Flexible Input Options&lt;/strong>: Process local files or images from web URLs&lt;/li>
&lt;li>&lt;strong>Custom Prompts&lt;/strong>: Extract specific information with tailored prompts&lt;/li>
&lt;li>&lt;strong>Easy Setup&lt;/strong>: One-command installation using the Pixi package manager&lt;/li>
&lt;/ul>
&lt;h2 id="quick-start">Quick Start&lt;/h2>
&lt;p>Getting started with Vision Text Extractor is simple:&lt;/p>
&lt;pre>&lt;code class="language-bash"># Clone and install
git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor
pixi install
# Set up the local models
pixi run setup-smolvlm    # Hugging Face SmolVLM (~2GB)
pixi run setup-ollama     # Ollama LLaVA (~4GB)
# Quick demo
pixi run demo-ocr-huggingface
# Run on your own images; see the README for more quick commands:
# https://github.com/udit-asopa/vision-text-extractor/tree/main?tab=readme-ov-file#-quick-commands
pixi run ocr_llm path/to/your/image.jpg
&lt;/code>&lt;/pre></description></item></channel></rss>