olmOCR – Open-Source OCR for Accurate Document Conversion
olmOCR is an open-source optical character recognition (OCR) tool for high-throughput conversion of PDFs and other documents into plain text. It preserves the natural reading order and handles content such as tables, equations, and handwriting.