
📰 Python Data Processing 2026: Deep Dive into Pandas, Polars, and DuckDB

Stop waiting for your CSVs to load. Learn how Pandas 2.x, Polars, and DuckDB are revolutionizing tabular data processing with Apache Arrow in 2026.

#csv #excel #dataformats #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/python-data-processing-2026-deep-dive-into-pandas-polars-and-duckdb-119?utm_source=mastodon&utm_medium=social&utm_campaign=blog

📰 TOML vs INI vs ENV: Why Configuration is Still Broken in 2026

Stop using .env files for secrets. Discover why INI, TOML, and ENV variables are failing modern dev teams in 2026 and how to fix your config workflow now.

#config #dataformats #devops #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/toml-vs-ini-vs-env-why-configuration-is-still-broken-in-2026-ap8?utm_source=mastodon&utm_medium=social&utm_campaign=blog

📰 Zod vs JSON Schema: Why 2026 is the Year of Type-Safe Data Contracts

Stop guessing your data types. Discover how Zod v4, JSON Schema Draft 2020-12, and TypeBox are revolutionizing API safety and performance in 2026.

#validation #typescript #dataformats #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/zod-vs-json-schema-why-2026-is-the-year-of-type-safe-data-contracts-w0a?utm_source=mastodon&utm_medium=social&utm_campaign=blog

📰 ELK Stack vs OpenTelemetry: The Ultimate Guide to Log Parsing in 2026

Is your observability platform a house of cards? Learn the truth about schema drift, AI-driven anomalies, and the real cost of log storage in 2026.

#logging #devops #dataformats #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/elk-stack-vs-opentelemetry-the-ultimate-guide-to-log-parsing-in-2026-2bz?utm_source=mastodon&utm_medium=social&utm_campaign=blog

📰 JSON vs JSON5 vs YAML: The Ultimate Data Format Guide for 2026

Master the evolution of JSON, JSON5, and YAML in 2026. Learn how to handle massive payloads, avoid YAML bombs, and implement strict schema validation today.

#json #dataformats #standards #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/json-vs-json5-vs-yaml-the-ultimate-data-format-guide-for-2026-fpl?utm_source=mastodon&utm_medium=social&utm_campaign=blog

🛒 Ah, the modern marvel of squishing #IKEA into a flat text file! Because who doesn't want the thrill of assembling their own dataset from a jumble of words? 😂 #JSON was obviously too roomy, so they went ahead and invented 'CommerceTXT' - the IKEA of data formats. 🏗️📦
https://huggingface.co/datasets/tsazan/ikea-us-commercetxt #DataFormats #CommerceTXT #AssemblyThrill #ModernMarvel #HackerNews #ngated

JSON Handling: Which Language Makes It Easier?!

JavaScript JSON vs PHP JSON - native vs functions, which wins? SURPRISING results!

#javascript #php #jsvsphp #json #jsonparsing #serialization #jsonencode #jsondecode #viralcoding #dataformats #programmingcomparison #mindblown

https://www.youtube.com/watch?v=K6QzMpuBY7s

📰 Zod vs Yup vs TypeBox: The Ultimate Schema Validation Guide for 2025

Stop guessing your data's shape. Master Zod, Yup, and TypeBox to build bulletproof, type-safe TypeScript applications in 2025. Learn the latest features now.

#validation #typescript #dataformats #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/zod-vs-yup-vs-typebox-the-ultimate-schema-validation-guide-for-2025-whd?utm_source=mastodon&utm_medium=social&utm_campaign=blog

📰 JSON vs YAML vs JSON5: The Truth About Data Formats in 2025

Is YAML a security nightmare? Discover the truth about JSON, JSON5, and YAML in 2025. Learn why your choice of data format impacts performance and safety.

#json #dataformats #standards #news

🌍 Also in: 🇪🇸 🇫🇷 🇩🇪 🇧🇷 🇮🇹

🔗 https://dataformathub.com/blog/json-vs-yaml-vs-json5-the-truth-about-data-formats-in-2025-ewv?utm_source=mastodon&utm_medium=social&utm_campaign=blog

Dear Archive Bubble,
are there any journals and magazines I should be aware of when it comes to news about (digital) archiving?
• Also data formats, metadata, events, etc.
• Journals from Europe

Feel free to share and thanks!

#archives #dataformats #archivistodon https://berlin.social/@aoe/115581754051548575

Dear archive bubble,
Are there any trade journals I should know about when it comes to news on (digital) archival work?
• Also data formats, metadata, events, etc.
• Journals from the D-A-CH region, but also English-language journals from elsewhere in Europe.

Feel free to boost, and thanks!
#Archiv #Archive #archives #dataformats #archivistodon

What If OpenDocument Used SQLite?

https://www.sqlite.org/affcase1.html

#HackerNews #OpenDocument #SQLite #SQLiteDatabase #TechInnovation #DataFormats

🎉 Behold, the groundbreaking revelation: #Xz is not the Holy Grail of data formats! 🚀 Apparently, using xz for digital preservation is like using a sieve as a bucket—bound to fail. Who knew? 🤦‍♂️ Stick to #bzip2, #gzip, or #lzip if you want actual functionality and avoid sinking your data into the abyss of inadequacy. 🔍💾
https://www.nongnu.org/lzip/xz_inadequate.html #dataformats #digitalpreservation #HackerNews #ngated

Ah, the thrilling world of Parquet file formats, where the only thing more riveting than the two versions is the general #apathy towards updating them. 🤦‍♂️ Apparently, SQL engines have taken on the role of format overlords, ensuring progress is locked away tighter than my interest in reading another tech blog post. 🙄
https://www.jeronimo.dev/the-two-versions-of-parquet/ #ParquetFiles #SQLEngines #TechUpdate #DataFormats #HackerNews #ngated

Shane O’Sullivan: Search Huge JSON files on the Web. “Working with very large JSON files (20MB+) using online tools tends to be a crashy affair. Whether you’re looking to format or search them, all the tools I found just crash. I found myself having to work with huge JSON files recently, so I built a tool specifically optimized for huge JSON files, called Huge JSON Viewer.”

https://rbfirehose.com/2025/06/19/shane-osullivan-search-huge-json-files-on-the-web/

On File Formats

https://solhsa.com/oldernews2025.html#ON-FILE-FORMATS

#HackerNews #On #File #Formats #technology #discussion #dataformats #filetypes #softwareengineering

GSPy - A New Toolbox And Data Standard For Geophysical Datasets
--
https://doi.org/10.3389/feart.2022.907614 <-- shared paper
--
https://doi.org/10.5066/P9XNQVGQ | https://code.usgs.gov/g3sc/gspy <-- shared code repository
--
[an older paper, but code is in active and ongoing development/evolution]
#GIS #spatial #mapping #geophysics #geophysical #NetCDF #datatypes #code #opensource #library #dataformats #standardisation #standardization #openstandard #portable #metadata #Python #package #GSPy #methods #workflows #xarray #CRS #opendata #architecture #toolbox

Conceptual diagram of GSPy workflow. Data from a variety of formats and types are read into GSPy, along with required metadata files. Through the GSPy software, data are converted into a standardized NetCDF file containing the dataset and metadata appropriate for archiving and sharing.

GS data convention. (A) Datasets are structured into three fundamental group types based on content and data geometry. The Survey group contains general metadata about the dataset. Unstructured datasets, such as from CSV or TXT files, form Tabular groups, whereas structured (gridded) datasets are categorized under the Raster group. Metadata is attached to all groups, with various required attributes (green text) that expand on the CF-1.8 convention. (B) Groups follow a strict hierarchy in the NetCDF file, with a single Survey group at the top to which all data groups are attached. Datasets are indexed within their respective group type. (C) Tabular and Raster data groups must contain clearly defined dimensions, such as index or x, y, z, as well as coordinate variables. Raster groups are distinct in that dimensions are also coordinates, whereas Tabular datasets are assigned spatial coordinates that align with the index dimension. Lastly, the coordinate variable “spatial_ref” is required for all data groups, which expands on the “coordinate_information” variable required in the Survey metadata.

photo - rigs preparing to do a seismic survey, Middle East

GSPy code base - Writing and plotting examples. Once all groups have been attached to a Survey, the “write_netcdf” and “write_ncml” methods will write the GS NetCDF and NcML files, respectively. GSPy also provides methods to generate scatter and pcolor plots for variables.
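The group hierarchy described in the data-convention caption above can be sketched in plain Python. This is only an illustration of the rules as stated there, not GSPy's API: the `DataGroup` and `Survey` classes, the `attach` method, and the `survey/<kind>/<index>` path layout are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DataGroup:
    """Hypothetical stand-in for a Tabular or Raster data group."""
    kind: str                  # "tabular" or "raster"
    dims: tuple[str, ...]      # e.g. ("index",) or ("x", "y")
    coords: tuple[str, ...]    # coordinate variable names

    def validate(self) -> None:
        if self.kind not in ("tabular", "raster"):
            raise ValueError(f"unknown group kind: {self.kind}")
        # The caption says "spatial_ref" is required for all data groups.
        if "spatial_ref" not in self.coords:
            raise ValueError("missing required coordinate 'spatial_ref'")
        # For Raster groups, the dimensions are also coordinates.
        if self.kind == "raster" and not set(self.dims) <= set(self.coords):
            raise ValueError("raster dimensions must also be coordinates")


@dataclass
class Survey:
    """Single top-level group to which all data groups are attached."""
    metadata: dict
    groups: dict = field(
        default_factory=lambda: {"tabular": [], "raster": []}
    )

    def attach(self, group: DataGroup) -> str:
        group.validate()
        siblings = self.groups[group.kind]
        siblings.append(group)
        # Datasets are indexed within their group type (path layout assumed).
        return f"survey/{group.kind}/{len(siblings) - 1}"


survey = Survey({"title": "example survey"})
path_a = survey.attach(DataGroup("tabular", ("index",), ("x", "y", "spatial_ref")))
path_b = survey.attach(DataGroup("raster", ("x", "y"), ("x", "y", "spatial_ref")))

try:
    survey.attach(DataGroup("raster", ("x", "y"), ("x", "y")))
    rejected = False
except ValueError:
    rejected = True
```

The real package writes these groups to NetCDF; the sketch only checks the structural rules, which is why it needs nothing beyond the standard library.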

https://mastodon.edufor.me/@schmittlauch@toot.matereal.eu/111566947042656458

Maybe #FAIR ness in research data should extend to #FAIR ness in other forms of data we generate and consume in our daily lives?

#FAIRData #OpenData #DataFormats #DeutscheBahn #DB and maybe even #NFDI #NFDIRocks ?

On Paperwork vs. Digital Formats

tired: Our customer's paperwork is profit. Our own paperwork is loss.[1]

wired: Your proprietary data format is loss. Our proprietary data format is profit.

I'd remembered the first aphorism from a long-ago collection of Murphy's Laws.

Thinking through my struggles at organising online and digital media, references, etc., I realised that a huge problem is that these formats don't serve my goals. They're designed far more around their authors' goals, or even more often, the publishers' goals, largely around advertising, marketing, tracking, building lock-in, creating and defending monopolies, and the like.

Digital formats that are in the end-user's interest and specification serve the user. Those that are in the publisher's specification serve the publisher.

A related thought is that a key affordance of printed periodicals (newspapers, magazines, journals) is that of garbage collection, to put a contemporary spin on it.

When you're done reading a newspaper or magazine, you pick up the whole lot and throw it out. Between "the article" and "the whole collection" (that is, everything published in your office or home) there is an intermediate level of organisation: "the issue". (Or perhaps a box or shelf of archived media.) That is, _there are multiple naturally-occurring levels of aggregation._

When you're trying to sort through a set of browser tabs, you generally have only two levels of aggregation: the individual tab, or the entire session. There are typically no intermediate levels, and sorting through what you want to keep (or re-read, or work with) means you've got to go through the set one at a time and resolve disposition. The data format serves the browser vendor, but not the user.

Tools such as Tree-Style Tabs, an absolutely essential Firefox extension, give a higher level of natural organisation, the tab tree. Here, a structure emerges, without user effort, of related content. At the top of the tree is whatever page began an exploration, and as you descend it, you go further down into the search. When cleaning up, it's possible to pick any given tab, branch, or whole tree, and close it out in one fell swoop. Garbage collection costs are reduced.
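To make the garbage-collection point concrete, here is a minimal Python sketch of a tab tree in which closing one branch disposes of everything opened beneath it. The `Tab` class and its methods are hypothetical and say nothing about how Tree-Style Tabs is actually implemented; the tab titles are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Tab:
    """One browser tab; children are the tabs opened from it."""
    title: str
    children: list["Tab"] = field(default_factory=list)

    def open_child(self, title: str) -> "Tab":
        child = Tab(title)
        self.children.append(child)
        return child

    def count(self) -> int:
        """Number of tabs in this subtree, including this tab."""
        return 1 + sum(child.count() for child in self.children)

    def close_branch(self, target: "Tab") -> int:
        """Remove `target` and everything below it; return tabs closed."""
        for i, child in enumerate(self.children):
            if child is target:
                del self.children[i]
                return target.count()
            closed = child.close_branch(target)
            if closed:
                return closed
        return 0


root = Tab("search: file formats")
article = root.open_child("article on Parquet")
article.open_child("Parquet spec")
article.open_child("comment thread")
root.open_child("unrelated result")

tabs_before = root.count()
closed = root.close_branch(article)   # one decision disposes of the branch
tabs_after = root.count()
print(tabs_before, closed, tabs_after)
```

With a flat tab list the same clean-up would take one decision per tab; with the tree it takes one decision per branch.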

(Three guesses as to what I've been attempting to do, and the first two don't count.)

#media #paperwork #DigitalMedia #DigitalFormats #FileFormats #DataFormats #kfc #docfs #UserCentricDesign #TreeStyleTabs

https://www.openhub.net/p/gdal# #GDAL is a translator library for #raster #geospatial #dataformats.

#dataformats
