#Docling

2026-02-03

Tomorrow (Feb 4) at #CfgMgmtCamp in Ghent, Ming and I will run a workshop on #Docling at 14:00 in B.1.031 - cfp.cfgmgmtcamp.org/ghent2026/

Join us! @cfgmgmtcamp

2026-02-01

The slides and recording for my presentation, "Get your docs in a row with #Docling", are now available - fosdem.org/2026/schedule/event

Thanks to @fosdem organizers and volunteers for another amazing event. My 11th in-person #FOSDEM (13 including virtual ones).

2026-01-14

The docling-testcontainers module provides a ready-to-use Testcontainers integration for running a Docling Serve instance, wrapping the official container image and exposing a simple Java API.
testcontainers.com/modules/doc

#Docling #Java #Testcontainers

2026-01-12

How to bring AI into a Java/Kotlin project

The world of enterprise development in Java/Kotlin and the world of neural networks look like parallel universes. On one side are static typing, multithreading, and Spring containers; on the other, Python scripts, tensor operations, and experiments in Jupyter Notebook. Between them lies a chasm that many teams do not dare to cross. Yet the need to build this bridge comes up more and more often. The customer wants "artificial intelligence" in a new feature, analysts dream of a chatbot with all the bells and whistles, and managers have heard that competitors have already automated everything. So how do you combine the reliability and structure of a JVM project with the flexibility and power of AI? In this article we will try to work out which tools exist for this today and how to use them.

habr.com/ru/articles/984544/

#AI #ИИ #Java #Kotlin #LLM #State_Graph #Vector_DB #Docling #Embeddings

2026-01-02

Just published a new deep-dive on building enterprise-grade RAG in Java.

In this tutorial, we combine:

• Quarkus
• Docling (layout-aware PDF parsing)
• pgvector + PostgreSQL
• Local LLMs via Ollama
• And a simple guardrail layer

This is the most complete RAG pipeline I’ve built so far, and it’s fully open for you to copy, run, and adapt.

Read here:
the-main-thread.com/p/enterpri

#Java #Quarkus #LLM #RAG #Docling #OpenSource
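
The retrieval step in a stack like the one listed in this post can be illustrated in a few lines. This is only a sketch in plain Python, not the tutorial's Quarkus code: the chunk texts and three-dimensional vectors are made up, and an in-memory cosine-similarity ranking stands in for the nearest-neighbour query that pgvector would run inside PostgreSQL.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector.

    chunks is a list of (text, vector) pairs. In a real pipeline the vectors
    would come from an embedding model; here they are hypothetical.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical chunks with hand-written toy embeddings.
CHUNKS = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The cafeteria opens at 8 am.", [0.0, 0.2, 0.9]),
    ("Late invoices incur a 2% fee.", [0.8, 0.3, 0.1]),
]
```

Calling `top_k([1.0, 0.0, 0.0], CHUNKS)` ranks the two invoice chunks ahead of the cafeteria one; those texts would then be passed to the LLM as context.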

Thomas Vitale ☀️thomasvitale@mastodon.online
2025-12-15

In this new article, I describe how to build a Retrieval Augmented Generation system in Java using Spring AI and Docling for advanced, privacy-focused document processing. You'll learn how to design an Ingestion Pipeline powered by Docling for loading, converting, and chunking any type of document for your RAG use cases.

#SpringAI #Docling #Arconia #Java

thomasvitale.com/rag-docling-j
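
The chunking stage of an ingestion pipeline like the one described in this post can be sketched as follows. This is an illustrative stand-in written in plain Python, not the chunker that Docling or Spring AI provides: it assumes the document has already been converted to text with blank lines between paragraphs, and `chunk_paragraphs` and `max_chars` are hypothetical names.

```python
def chunk_paragraphs(text, max_chars=500):
    """Group paragraphs into chunks of at most max_chars characters.

    Paragraphs are never split: one that is longer than max_chars
    becomes a chunk on its own.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            # Either the chunk is empty or the paragraph still fits.
            current = candidate
        else:
            # The paragraph does not fit: close the chunk, start a new one.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which tends to matter more for retrieval quality than hitting an exact size.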

Thomas Vitale ☀️thomasvitale@mastodon.online
2025-11-24

Docling is an open-source, privacy-focused solution for advanced document parsing. Using the brand-new Docling Java SDK and Arconia, I'll show you how to integrate Docling into your Spring Boot applications, and prepare documents for RAG and GenAI.

#Java #SpringBoot #AI #Docling

thomasvitale.com/ai-document-p

2025-11-20

Docling #Java is the official Java client and tooling for #Docling — a suite that simplifies document processing and parsing across diverse formats (with advanced PDF understanding) and integrates seamlessly with #GenAI frameworks.
docling-project.github.io/docl

2025-11-01

@Semtex Not yet, but I can drop it into a Gist. At the moment it is "just" a Python script, but I also want to build an n8n workflow for it. Before that, though, I want to try #Marker as an alternative to #Docling.

2025-10-31

As part of my #pke25 learning activities, I set up a #Docling pipeline today to prepare all my PDF collections (books, articles, studies, presentations, etc.) for use with AI. One page takes just under a second.
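
A batch pipeline of this kind might look roughly like the sketch below. It is not the author's script, which was not published in this post. The folder layout and the helper names `convert_folder` and `make_docling_converter` are assumptions; the Docling calls (`DocumentConverter().convert(...)` and `export_to_markdown()`) follow Docling's documented Python quickstart and are imported lazily so the folder logic can be tried without Docling installed.

```python
from pathlib import Path


def make_docling_converter():
    """Build a PDF-to-Markdown function backed by Docling.

    The import is deferred so the rest of this sketch runs without Docling.
    The converter is created once and reused for every file.
    """
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()

    def to_markdown(pdf_path):
        result = converter.convert(str(pdf_path))
        return result.document.export_to_markdown()

    return to_markdown


def convert_folder(src_dir, out_dir, to_markdown):
    """Convert every *.pdf directly inside src_dir to a .md file in out_dir.

    Files whose output already exists are skipped, so the job can be
    re-run after an interruption. Returns the list of files written.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        target = out / (pdf.stem + ".md")
        if target.exists():
            continue
        target.write_text(to_markdown(pdf), encoding="utf-8")
        written.append(target)
    return written
```

With Docling installed, `convert_folder("pdfs", "markdown", make_docling_converter())` would process a folder; both directory names are placeholders.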

Major Hayden 🤠major@tootloop.com
2025-09-26

We've been using #docling a lot at work lately to parse all kinds of documents in various formats. It's handy for converting them into a common JSON document.

major.io/p/fun-with-docling/

#rag #ai #knowledge #documents

Major Hayden 🤠major@tootloop.com
2025-09-20

Wow, #docling added support for Arabic and can handle complex documents with text that goes right to left!

#devconf_us #devconfus #ai #rag

Major Hayden 🤠major@tootloop.com
2025-09-20

The docling document pipeline:

#devconf_us #devconfus #docling

Major Hayden 🤠major@tootloop.com
2025-09-20

Context is important! Vegetative electron microscopy does not exist! 😂

Another reason to use Docling for your documents. RAG is another garbage in, garbage out situation.

#devconf_us #devconf #ai #docling

Alexandre B A Villares 🐍villares@ciberlandia.pt
2025-09-20

@major Tell us more about #Docling!

Thomas Vitale ☀️thomasvitale@mastodon.online
2025-08-07

Arconia is a framework I've been building on top of Java and Spring Boot, focusing on developer experience and cloud native. In the latest release, I shipped a new module to integrate with Docling, an open-source, AI-powered, advanced document processor. JBang makes it really convenient to run!

Code: gist.github.com/ThomasVitale/0

Docs: arconia.io/docs/arconia/latest

Docling: docling-project.github.io/docl

#Java #AI #Docling

Arconia Docling: Document Processing + AI See code: https://gist.github.com/ThomasVitale/06932a1e872935de7d4587a4519eb76f
2025-07-12

Taming Unstructured Data: From PDFs to JSON with Quarkus and Docling
Build a fast, scalable converter to turn business documents into structured data
myfear.substack.com/p/quarkus-
#Java #Quarkus #Docling #AIML #PDF #DocumentParsing

2025-07-08

Today's #TerSoftware topic is "paper management". I recently tested OCR for digitizing tables and... I was not very happy with the result.

I believe #OCR works best when it stays tightly bound to the scanned document (for example, making a PDF file searchable), but for text extraction it is still a big "it depends".

In my short journey, I tested #Tesseract and #Docling. It might work with well-written code, but I ended up giving in and doing it by hand.

Tesseract seems quite easy to install on Linux (even on #openSUSE Leap, which has its limitations because it derives from enterprise SUSE, I found it easy), but Docling required some juggling with Python environments (using conda and pip).

For running text, Tesseract already seems quite sufficient. It can be run from the command line and, at least on openSUSE Leap, several dictionaries are packaged to make things easier.
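
For the plain-text case, the command-line use mentioned above can be wrapped in a few lines. This is a minimal sketch, assuming the `tesseract` binary and the Portuguese language data (`por`) are installed; the file name and the helper names `tesseract_command` and `ocr_image` are hypothetical.

```python
import subprocess


def tesseract_command(image_path, lang="por"):
    """Build the argument list for a Tesseract run that prints text to stdout.

    "stdout" as the output base tells Tesseract to write the recognized
    text to standard output instead of creating a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="por"):
    """Run Tesseract on one image and return the recognized text.

    Raises FileNotFoundError if tesseract is not installed and
    subprocess.CalledProcessError if the run fails.
    """
    completed = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `ocr_image("scan.png")` would return the text of a scanned page. Tables are where this approach tends to fall short, as the post notes.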
