Tomorrow (Feb 4) at #CfgMgmtCamp in Ghent, Ming and I will run a workshop on #Docling at 14:00 in B.1.031 - https://cfp.cfgmgmtcamp.org/ghent2026/talk/9CV7CY/
Join us! @cfgmgmtcamp
The slides and recording for my presentation "Get your docs in a row with #Docling" are now available - https://fosdem.org/2026/schedule/event/DVRV8S-get_your_docs_in_a_row_with_docling/
Thanks to @fosdem organizers and volunteers for another amazing event. My 11th in-person #FOSDEM (13 including virtual ones).
The docling-testcontainers module provides a ready-to-use Testcontainers integration for running a Docling Serve instance, wrapping the official container image and exposing a simple Java API.
https://testcontainers.com/modules/docling/
How to bring AI into a Java/Kotlin project
The world of enterprise development in Java/Kotlin and the world of neural networks seem like parallel universes. On one side you have static typing, multithreading and Spring containers; on the other, Python scripts, tensor operations and experiments in Jupyter Notebook. Between them lies a chasm that many teams hesitate to cross. Yet the need to build this bridge comes up more and more often. The customer wants "artificial intelligence" in a new feature, the analysts dream of a chatbot with all the bells and whistles, and the managers have heard that competitors have already automated everything. So how do you combine the reliability and structure of a JVM project with the flexibility and power of AI? In this article we try to work out which tools are currently available for this and how to use them.
https://habr.com/ru/articles/984544/
#AI #ИИ #Java #Kotlin #LLM #State_Graph #Vector_DB #Docling #Embeddings
Just published a new deep-dive on building enterprise-grade RAG in Java.
In this tutorial, we combine:
• Quarkus
• Docling (layout-aware PDF parsing)
• pgvector + PostgreSQL
• Local LLMs via Ollama
• And a simple guardrail layer
This is the most complete RAG pipeline I’ve built so far, and it’s fully open for you to copy, run, and adapt.
Read here:
https://www.the-main-thread.com/p/enterprise-rag-quarkus-docling-pgvector-tutorial
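The retrieval step at the centre of such a pipeline can be sketched in plain Java. This is a minimal, in-memory stand-in that assumes the chunk embeddings have already been computed. In the stack above, an embedding model served by Ollama would produce them, and pgvector would rank them inside PostgreSQL, for example with its `<=>` cosine-distance operator. The class name `TinyRetriever`, the toy vectors and the `minScore` threshold used as a "guardrail" are hypothetical illustrations, not the tutorial's code.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory retrieval sketch; a stand-in for a vector database such as pgvector.
final class TinyRetriever {
    record Chunk(String text, double[] embedding) {}
    record Hit(Chunk chunk, double score) {}

    // Cosine similarity between two embedding vectors of equal dimension.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0; // treat all-zero vectors as unrelated
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every chunk against the query and returns the k most similar ones, best first.
    static List<Hit> topK(List<Chunk> chunks, double[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        for (Chunk c : chunks) hits.add(new Hit(c, cosine(c.embedding(), query)));
        hits.sort((x, y) -> Double.compare(y.score(), x.score()));
        return List.copyOf(hits.subList(0, Math.min(k, hits.size())));
    }

    // Hypothetical guardrail: return no context at all if even the best match scores below
    // minScore, so the caller can answer "I don't know" instead of letting the LLM guess.
    static List<Hit> guardedTopK(List<Chunk> chunks, double[] query, int k, double minScore) {
        List<Hit> hits = topK(chunks, query, k);
        if (hits.isEmpty() || hits.get(0).score() < minScore) return List.of();
        return hits;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real ones come from an embedding model.
        List<Chunk> chunks = List.of(
                new Chunk("Invoices are due within 30 days.", new double[] {0.9, 0.1, 0.0}),
                new Chunk("The office is closed on Sundays.", new double[] {0.0, 0.2, 0.9}));
        double[] query = {0.8, 0.2, 0.1};
        for (Hit h : guardedTopK(chunks, query, 1, 0.5)) {
            System.out.printf("%.3f  %s%n", h.score(), h.chunk().text());
        }
    }
}
```

Refusing to pass any context when even the best chunk is a poor match is one simple way to keep the model from improvising an answer. The threshold value is arbitrary here and would need tuning against real embeddings.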
In this new article, I describe how to build a Retrieval-Augmented Generation (RAG) system in Java using Spring AI and Docling for advanced, privacy-focused document processing. You'll learn how to design an ingestion pipeline powered by Docling for loading, converting, and chunking any type of document for your RAG use cases.
Docling is an open-source, privacy-focused solution for advanced document parsing. Using the brand-new Docling Java SDK and Arconia, I'll show you how to integrate Docling into your Spring Boot applications, and prepare documents for RAG and GenAI.
#Java #SpringBoot #AI #Docling
https://www.thomasvitale.com/ai-document-processing-docling-java-arconia-spring-boot/
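The last stage of such an ingestion pipeline, chunking, can be illustrated with a small heading-aware splitter over Markdown, which is one of the formats Docling can export. This is a minimal sketch that assumes the document has already been converted. It is a simplified stand-in, not Docling's own chunkers or Spring AI's document splitters. The class name `MarkdownChunker` and the character-based size limit are hypothetical choices; production pipelines usually count tokens instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heading-aware chunker for Markdown produced by a document converter.
final class MarkdownChunker {
    // Each chunk remembers the nearest heading so retrieval keeps some section context.
    record Chunk(String heading, String text) {}

    // A new chunk starts at every heading; within a section, paragraphs are packed greedily
    // into chunks of at most maxChars characters (a single longer paragraph stays whole).
    static List<Chunk> chunk(String markdown, int maxChars) {
        List<Chunk> chunks = new ArrayList<>();
        String heading = "";
        StringBuilder current = new StringBuilder();
        for (String block : markdown.split("\\n\\s*\\n")) {
            String p = block.strip();
            if (p.isEmpty()) continue;
            if (p.startsWith("#")) {
                flush(chunks, heading, current);
                int nl = p.indexOf('\n');
                heading = (nl < 0 ? p : p.substring(0, nl)).replaceFirst("^#+\\s*", "");
                p = nl < 0 ? "" : p.substring(nl + 1).strip(); // text glued under the heading
                if (p.isEmpty()) continue;
            }
            if (current.length() > 0 && current.length() + 2 + p.length() > maxChars) {
                flush(chunks, heading, current);
            }
            if (current.length() > 0) current.append("\n\n");
            current.append(p);
        }
        flush(chunks, heading, current);
        return chunks;
    }

    // Emits the buffered paragraphs as one chunk and clears the buffer.
    private static void flush(List<Chunk> chunks, String heading, StringBuilder current) {
        if (current.length() > 0) {
            chunks.add(new Chunk(heading, current.toString()));
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String md = "# Intro\n\nFirst paragraph.\n\nSecond paragraph.\n\n## Details\nThird paragraph.";
        for (Chunk c : chunk(md, 1000)) System.out.println("[" + c.heading() + "] " + c.text());
    }
}
```

Keeping the heading alongside each chunk is a cheap way to preserve document structure that a plain fixed-size splitter would throw away.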
Docling #Java is the official Java client and tooling for #Docling — a suite that simplifies document processing and parsing across diverse formats (with advanced PDF understanding) and integrates seamlessly with #GenAI frameworks.
https://docling-project.github.io/docling-java/dev/
We've been using #docling a lot at work lately to parse all kinds of documents in various formats. It's handy for converting them into a common JSON document.
Updating stickers on laptops... let's see how many I can tag
@matrix @ansible @InstructLab @thinkpadmuseum @trustyai @github @fedora
#ramalama #docling #vllm #llmd #pytorch #NERC #redhat #ospo #upstream #ansible #thinkpad #womeninfedora #expo2025 #kubeflow #trustyAI #cushingcenter #operationstickybusiness
Wow, #docling added support for Arabic and can handle complex documents with text that goes right to left!
The docling document pipeline:
Context is important! Vegetative electron microscopy does not exist! 😂
Another reason to use Docling for your documents. RAG is another garbage in, garbage out situation.
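The "vegetative electron microscopy" phrase is widely attributed to text being read across the columns of a two-column page instead of down each column. The toy sketch below reproduces that failure mode. It assumes the column boundaries are already known; detecting them is the hard part that a layout-aware parser such as Docling handles, and this code does not attempt it. The page content and the class name `ReadingOrder` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why reading order matters when extracting text from two-column pages.
final class ReadingOrder {
    // Naive extraction: read each visual row straight across both columns (left, then right).
    static String rowWise(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int rows = Math.max(left.size(), right.size());
        for (int i = 0; i < rows; i++) {
            if (i < left.size()) out.add(left.get(i));
            if (i < right.size()) out.add(right.get(i));
        }
        return String.join(" ", out);
    }

    // Layout-aware extraction: finish the left column before starting the right one.
    static String columnWise(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Hypothetical two-column page: each list holds one column's lines, top to bottom.
        List<String> left = List.of("cells in the", "vegetative", "state were counted");
        List<String> right = List.of("samples were scanned by", "electron microscopy", "at high resolution");
        System.out.println("row-wise:    " + rowWise(left, right));
        System.out.println("column-wise: " + columnWise(left, right));
    }
}
```

Row-wise reading yields "... scanned by vegetative electron microscopy state were counted ...", gluing the two columns into a nonsense term. Column-wise reading keeps "vegetative state" and "electron microscopy" apart. A RAG system indexing the first version would faithfully retrieve garbage.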
Arconia is a framework I've been building on top of Java and Spring Boot, focusing on developer experience and cloud native. In the latest release, I shipped a new module to integrate with Docling, an open-source, AI-powered, advanced document processor. JBang makes it really convenient to run!
Code: https://gist.github.com/ThomasVitale/06932a1e872935de7d4587a4519eb76f
Docs: https://arconia.io/docs/arconia/latest/integrations/docling/
ICYMI: Taming Unstructured Data: From PDFs to JSON with Quarkus and Docling https://open.substack.com/pub/myfear/p/quarkus-docling-data-preparation-for-ai?r=17bggb&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
#Java #quarkus #Docling #Data
Taming Unstructured Data: From PDFs to JSON with Quarkus and Docling
Build a fast, scalable converter to turn business documents into structured data
https://myfear.substack.com/p/quarkus-docling-data-preparation-for-ai
#Java #Quarkus #Docling #AIML #PDF #DocumentParsing
Today's #TerSoftware topic is "paper management". I recently tested OCR for digitizing tables and... I wasn't very happy with the result.
I think #OCR works best when it stays closely tied to the scanned document (for example, making a PDF file searchable), but for text extraction it is still very much an "it depends".
On my short journey, I tested #Tesseract and #Docling. Maybe they work well with properly written code, but I ended up giving in and doing it by hand.
Tesseract seems very easy to install on Linux (I found it easy even on #openSUSE Leap, which has its limitations because it derives from enterprise SUSE), but Docling required some juggling with Python environments (using conda and pip).
For running text, Tesseract already seems good enough. It can be run from the command line and, at least on openSUSE Leap, many language dictionaries are packaged for convenience.
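Because Tesseract runs from the command line, calling it from a JVM project can be as simple as spawning the process. The sketch below is minimal and assumes the `tesseract` binary is on the `PATH` and the Portuguese language data (`por`) is installed. The class name and the `scan.png` file name are hypothetical. Passing `stdout` as the output base tells Tesseract to print the recognized text instead of writing a `.txt` file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper that shells out to the tesseract CLI and returns the recognized text.
final class TesseractCli {
    // Builds the command line: tesseract <image> stdout -l <lang>
    static List<String> command(Path image, String lang) {
        return List.of("tesseract", image.toString(), "stdout", "-l", lang);
    }

    static String ocr(Path image, String lang) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(image, lang))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore progress/warning messages
                .start();
        String text = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = p.waitFor();
        if (exit != 0) throw new IOException("tesseract exited with code " + exit);
        return text;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input file; requires tesseract and its Portuguese ("por") language data.
        Path image = Path.of(args.length > 0 ? args[0] : "scan.png");
        System.out.println(ocr(image, "por"));
    }
}
```

This covers plain running text only. Table structure is exactly what such a plain OCR pass tends to lose.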