	---
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- qwen3_vl
	- trl
	- sft
	- chemistry
	- code
	- climate
	- art
	- biology
	- finance
	- legal
	- music
	- medical
	- agent
	license: apache-2.0
	language:
	- en
	- ab
	- aa
	- ae
	- af
	- ak
	- am
	- an
	- ar
	- as
	- av
	- ay
	- az
	- ba
	- be
	- bg
	- bh
	- bi
	- bm
	- bn
	- bo
	- br
	- bs
	- ca
	- ce
	- ch
	- co
	- cr
	- cs
	- cu
	- cv
	- cy
	- da
	- de
	- dv
	- dz
	- ee
	- el
	- eo
	- es
	- et
	- eu
	- fa
	- ff
	- fi
	- fj
	- fo
	- fr
	- fy
	- ga
	- gd
	- gl
	- gn
	- gv
	- ha
	- he
	- hi
	- ho
	- gu
	- hr
	- ht
	- hu
	- hz
	- hy
	- id
	- ia
	- ig
	- ie
	- ik
	- ii
	- is
	- io
	- iu
	- it
	- jv
	- ja
	- kg
	- ka
	- kj
	- ki
	- kl
	- kk
	- kn
	- km
	- kr
	- ko
	- ku
	- ks
	- kw
	- kv
	- la
	- ky
	- lg
	- lb
	- ln
	- li
	- lt
	- lo
	- lv
	- lu
	- mg
	- mi
	- mh
	- ml
	- mk
	- mr
	- mn
	- mt
	- ms
	- na
	- my
	- nd
	- nb
	- ng
	- nl
	- ne
	- 'no'
	- nn
	- nv
	- nr
	- oc
	- oj
	- om
	- ny
	- os
	- or
	- pa
	- pi
	- pl
	- ps
	- pt
	- rm
	- rn
	- qu
	- ro
	- ru
	- sn
	- rw
	- so
	- sa
	- sc
	- sd
	pipeline_tag: image-text-to-text
	library_name: transformers
	---
	<img src='bannerocr.png'>

	# 🖼️ Next OCR 8B

	### Compact OCR AI — Accurate, Fast, Multilingual, Math-Optimized

	[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
	[![Language: Multilingual](https://img.shields.io/badge/Language-Multilingual-red.svg)]()
	[![HuggingFace](https://img.shields.io/badge/🤗-Lamapi/Next--OCR--orange.svg)](https://huggingface.co/Lamapi/next-ocr)

	---

	## 📖 Overview

	Next OCR 8B is an 8-billion-parameter model optimized for optical character recognition (OCR), including mathematical and tabular content.

	It supports multilingual OCR (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian, and others) and handles structured documents such as tables, forms, and formulas.

	---

	## ⚡ Highlights

	* 🖼️ Accurate text extraction, including math and tables
	* 🌍 Multilingual support (30+ languages)
	* ⚡ Lightweight and efficient
	* 💬 Instruction-tuned for document understanding and analysis

	---

	## 📊 Benchmark & Comparison

	![image](https://cdn-uploads.huggingface.co/production/uploads/67d46bc5fe6ad6f6511d6f44/wLtEbJ9U3KCJe4OCxvAF7.png)

	---

	| Model | OCR-Bench Accuracy (%) | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
	| ------------------------------- | ------------------------ | ------------------------- | -------------------------------- |
	| Next OCR | 99.0 | 96.8 | 95.3 |
	| PaddleOCR | 95.2 | 93.9 | 95.3 |
	| Deepseek OCR | 90.6 | 87.4 | 86.1 |
	| Tesseract | 92.0 | 88.4 | 72.0 |
	| EasyOCR | 90.4 | 84.7 | 78.9 |
	| Google Cloud Vision / DocAI | 98.7 | 95.5 | 93.6 |
	| Amazon Textract | 94.7 | 86.2 | 86.1 |
	| Azure Document Intelligence | 95.1 | 93.6 | 91.4 |

	---

	| Model | Handwriting (%) | Scene Text (%) | Complex Tables (%) |
	| --------------------------- | --------------- | -------------- | ------------------ |
	| Next OCR | 92 | 96 | 91 |
	| PaddleOCR | 88 | 92 | 90 |
	| Deepseek OCR | 80 | 85 | 83 |
	| Tesseract | 75 | 88 | 70 |
	| EasyOCR | 78 | 86 | 75 |
	| Google Cloud Vision / DocAI | 90 | 95 | 92 |
	| Amazon Textract | 85 | 90 | 88 |
	| Azure Document Intelligence | 87 | 91 | 89 |

	---

	## 🚀 Installation & Usage

	```python
	from PIL import Image
	import torch
	from transformers import AutoProcessor, AutoModelForVision2Seq

	model_id = "Lamapi/next-ocr"

	# The processor bundles the tokenizer and the image preprocessor.
	processor = AutoProcessor.from_pretrained(model_id)
	model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

	img = Image.open("image.jpg")

	# ATTENTION: The content list must include both an image and text.
	messages = [
	    {"role": "system", "content": "You are Next-OCR, an helpful AI assistant trained by Lamapi."},
	    {
	        "role": "user",
	        "content": [
	            {"type": "image", "image": img},
	            {"type": "text", "text": "Read the text in this image and summarize it."},
	        ],
	    },
	]

	# Render the chat template to a prompt string, then tokenize it together with the image.
	prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

	with torch.no_grad():
	    generated = model.generate(**inputs, max_new_tokens=256)

	# The decoded string contains the prompt followed by the model's reply.
	print(processor.decode(generated[0], skip_special_tokens=True))
	```
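
The comment in the example above notes that the user turn must carry both an image and a text entry. A small helper can make that requirement harder to violate. This is only a sketch: `build_ocr_messages` and its parameters are hypothetical names chosen for illustration, not part of this model's API or of `transformers`.

```python
from typing import Any, Dict, List, Optional


def build_ocr_messages(image: Any, instruction: str,
                       system_prompt: Optional[str] = None) -> List[Dict[str, Any]]:
    """Build a chat message list whose user turn has one image and one text entry.

    Hypothetical helper for illustration. `image` is whatever the processor
    accepts (for example a PIL image); it is passed through unchanged.
    """
    if image is None:
        raise ValueError("an image is required")
    if not instruction or not instruction.strip():
        raise ValueError("a non-empty text instruction is required")

    messages: List[Dict[str, Any]] = []
    if system_prompt:
        # The system turn is optional and uses plain string content.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction.strip()},
        ],
    })
    return messages
```

The returned list has the same shape as `messages` in the usage example, so it could be passed to `processor.apply_chat_template` in its place.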

	---

	## 🧩 Key Features

	| Feature | Description |
	| -------------------------- | --------------------------------------------------------------- |
	| 🖼️ High-Accuracy OCR | Extracts text from images, documents, and screenshots reliably. |
	| 🇹🇷 Multilingual Support | Works with 30+ languages including Turkish. |
	| ⚡ Lightweight & Efficient | Optimized for resource-constrained environments. |
	| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas. |
	| 🏢 Reliable Outputs | Suitable for enterprise document workflows. |

	---

	## 📐 Model Specifications

	| Specification | Details |
	| ----------------- | --------------------------------------------------------- |
	| Base Model | Qwen 3 |
	| Parameters | 8 Billion |
	| Architecture | Vision + Transformer (OCR LLM) |
	| Modalities | Image-to-text |
	| Fine-Tuning | OCR datasets with multilingual and math/tabular content |
	| Optimizations | Quantization-ready, FP16 support |
	| Primary Focus | Text extraction, document understanding, mathematical OCR |
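
To put the parameter count and the FP16 and quantization entries in perspective, the weight footprint can be estimated with simple arithmetic. This is a back-of-the-envelope sketch: it assumes exactly 8e9 parameters, counts weights only (no activations, KV cache, or framework overhead), and the 8-bit and 4-bit rows are illustrative precisions rather than formats this card confirms.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # "8 Billion" from the table; the exact count may differ

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Under these assumptions the weights take roughly 16 GB in FP16, 8 GB at 8 bits, and 4 GB at 4 bits per parameter.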

	---

	## 🎯 Ideal Use Cases

	* Document digitization
	* Invoice & receipt processing
	* Multilingual OCR pipelines
	* Table, form, and formula extraction
	* Enterprise document management
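
For the document digitization and pipeline use cases listed above, a batch loop over a folder of scans might look like the following. It is a minimal sketch under stated assumptions: `digitize_folder` and `ocr_fn` are hypothetical names, and `ocr_fn` stands in for any callable that takes an image path and returns text (for example, a wrapper around the generation code in the usage section).

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp")


def digitize_folder(folder: Path, ocr_fn: Callable[[Path], str],
                    suffixes: Iterable[str] = IMAGE_SUFFIXES) -> Dict[str, str]:
    """Run `ocr_fn` on each image in `folder` and save the text beside it.

    Returns a mapping from image file name to extracted text. Files whose
    suffix is not in `suffixes` are skipped.
    """
    wanted = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    # sorted() materializes the listing first, so writing .txt files
    # into the same folder does not disturb the iteration.
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        text = ocr_fn(path)
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
        results[path.name] = text
    return results
```

Passing the OCR step in as a callable keeps the loop independent of how the model is loaded, so the same function can be exercised with a stub before a GPU is involved.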

	---

	## 📄 License

	MIT License — free for commercial & non-commercial use.

	---

	## 📞 Contact & Support

	* 📧 Email: [[email protected]](mailto:[email protected])
	* 🤗 HuggingFace: [Lamapi](https://huggingface.co/Lamapi)

	---

	> Next OCR — Compact OCR + math-capable AI, blending accuracy, speed, and multilingual document intelligence.

	[![Follow on HuggingFace](https://img.shields.io/badge/Follow-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/Lamapi)