The Allen Institute for Artificial Intelligence (Ai2) has released olmOCR, an open-source OCR toolkit that converts PDFs and scanned images into structure-preserving Markdown. Its latest version, olmOCR 2, scored 82.4 on the project's own benchmark after reinforcement-learning tuning. The aim is to make the vast knowledge locked in PDFs (papers, contracts, financial reports, historical archives) readable by LLMs and RAG systems.