#OpenWebSearchEU

Arjen P. de Vries Timmers 🕊️ arjen@idf.social
2025-12-17

#ECIR2026 notifications were friendly to me 🤗

1. Full paper "Open Web Indexes for Remote Querying" with @gijs and @djoerd.

Can we let ppl query the Terabytes of Web Index we collect in #OpenWebSearchEU in new ways, making good use of Parquet, S3, DuckDB?

Turns out the answer is a big YES!

Pre-print of the paper w/ code coming soon!

1/4

Djoerd Hiemstra djoerd@idf.social
2025-03-31
Djoerd Hiemstra djoerd@idf.social
2025-03-26

Open web index #OWI update:

4 billion URLs crawled
185 different languages
28 million Hosts
750 TB crawled
1 TB crawled per day
147 WARC Datasets
17.5 TB size of Open Web Index
28.8 TB size of WARC datasets
346 public datasets

#OpenWebSearchEU #OpenWebSearch

ows.eu

Djoerd Hiemstra djoerd@idf.social
2024-10-07

Today, Open Web Search consortium meeting at LRZ. #OpenWebSearchEU

Photo of entrance: Leibniz Rechenzentrum

Djoerd Hiemstra djoerd@idf.social
2024-07-01

@heinragas Thanks for offering! #OpenWebSearchEU provides the data and index. I hope there will be a fully functional search engine at the end of the project. (built by anyone!) @openwebsearcheu @Negin

Arjen P. de Vries Timmers 🕊️ arjen@idf.social
2024-04-17

Don't be evil. #OpenWebSearchEU

Djoerd Hiemstra djoerd@idf.social
2024-03-13

This week: #OpenWebSearchEU consortium meeting in Nijmegen!

Michael Granitzer opens the meeting

Djoerd Hiemstra djoerd@idf.social
2024-02-21

#OpenWebSearchEU has funding for organizations that build search engines using the open search infrastructure

openwebsearch.eu/third-party-c

Djoerd Hiemstra djoerd@idf.social
2024-01-26

Frequently Asked Questions #FAQ about #OpenWebSearchEU

openwebsearch.eu/faqs/

Djoerd Hiemstra djoerd@idf.social
2024-01-22

Another #ECIR2024 preprint online: "The Open Web Index: Crawling and Indexing the Web for Public Use". #OpenWebSearchEU

djoerdhiemstra.com/2024/the-op

Open Search Foundation osf@suma-ev.social
2023-12-19

⭐️ OpenSearch #AdventCalendar | Day 19
ows.eu – Looking back & looking forward

As the year is slowly coming to an end and we are ready to wind down a bit, it's time for a little year in review of our OpenWebSearch.eu project.

A quick reminder: We started the Horizon EU project in September 2022 with 13 other consortium partners.

Read about the year's highlights in today's advent calendar post:
opensearchfoundation.org/en/ad

@openwebsearcheu
#OpenWebSearch #OpenWebSearchEU #OpenwebSearchCommunity

Key Visual for the #OpenSearchAdventCalendar 2023 - Day 19

Arjen P. de Vries Timmers 🕊️ arjen@idf.social
2023-12-09

Ethical, open and non-commercial: the Open Web Search project is designed to provide Europe with the right alternative to existing search engines

home.cern/news/news/computing/

#OpenWebSearchEU

Arjen P. de Vries Timmers 🕊️ arjen@idf.social
2023-11-27

Woke up at 6am to cycle through the pouring rain to the train station and miss my train by half a minute or so... Oh well.

On my way to the *Dutch-Belgian Information Retrieval workshop*, in Delft this year. Looking forward to spending the day with colleagues and getting updated on all the ongoing IR research "nearby".

dir2023.github.io/DIR2023/

In the afternoon, I stand in for @djoerd who unfortunately cannot make it (Covid 🙁), and will present the talk on the #OpenWebSearchEU research project.

Open Search Foundation osf@suma-ev.social
2023-11-21

The first third-party funded project partners are on board!
They were selected from the first open call in spring and will enrich the #EU #horizonEurope project OpenWebSearch.eu with their expertise.
A warm welcome!

More info: openwebsearch.eu/third-parties

#OpenWebSearch #OpenWebSearchCommunity #OpenWebSearchEU #HorizonEurope
@EC_NGI

Key visual with text: Third-parties, welcome on board of OpenWebSearch.eu!

2023-11-18

Busy event week!
👉 #SC23 in Denver with #OpenWebSearchEU partners @LRZ_DE + IT4Innovations
👉 major panel on #OpenWebSearch + #LLMs at #NGIForum23 in Brussels, moderated by @osf's board member @frauplote and with OpenWebSearch.eu's principal investigator @grani of Universität Passau in the panel of experts. @alnuss explained the basic ows.eu ethics methodology at a workshop on #EthicsinSearch. Plus lots of good talks + interesting meetings with members of fellow @EC_NGI projects!

2023-11-15

🚀 #NGIForum23 has kicked off! For the next two days technologists, innovators + researchers will meet in Brussels to discuss the Next Generation Internet – an Internet of Trust for European Citizens + Businesses. Lots to talk about + lots to think about + plan!

#OpenWebSearchEU is represented by Stefan Voigt, @grani @alnuss, @frauplote and Megi (Maggie) Sharikadze, among others.

Also some representatives of our NGI "sister" projects are here on site. Good to meet you here in person!

NGI Forum 2023

2023-11-03

Chorizo alarm for the weekend!
What does a slice of chorizo have to do with open #websearch? Dr. Tim Smith from #OpenWebSearchEU's project partner CERN shows the added value of an #OpenWebIndex for #internetsearch in his TED Talk "Navigating the Entangled Web" using examples from physics.
Highly recommended!

⬇️ ted.com/talks/tim_smith_naviga

TED Talk Tim Smith, CERN

2023-10-19

Hello, live from the ows.eu consortium meeting in Ostrava! We are the #OpenWebSearchEU initiative, having joined forces to promote Europe’s independence in Web Search.
The goal: to develop an open European infrastructure for web search. Our initiative will be contributing to Europe’s digital sovereignty as well as promoting an open human-centered search engine market.

Since today we are – finally – on Mastodon. Nice to be here!

#neuhier #newhere

OpenWebSearch.eu members at the consortium meeting at IT4I in Ostrava

Djoerd Hiemstra djoerd@idf.social
2023-10-18

An incomplete list of publications that analyzed #CommonCrawl data to identify malicious content:

Srdjan Matic et al. Identifying sensitive URLs at web-scale. In Proceedings of the ACM Internet Measurement Conference, 2020, pp. 619–633.

gsmaragd.github.io/publication

#OpenWebSearchEU

Djoerd Hiemstra djoerd@idf.social
2023-10-18

#OWLer is the #OpenWebSearchEU collaborative web crawler. Do we crawl everything?

openwebsearcheu.pages.it4i.eu/
