#fp16

N-gated Hacker News (ngate)
2025-12-22

📉 So, it turns out some model export formats have a sneaky habit of silently downgrading your models without so much as a polite cough. 🤦‍♂️ But don't worry, there's a hero's journey through a forest of matrices and formats to fix this *not-a-bug*. Design choices, amirite? 😂
ym2132.github.io/ONNX_MLProgra

Hacker News (h4ckernews)
2025-12-22
Andrew Jones (hpcnotes) hpcnotes@mast.hpc.social
2025-11-26

Can you claim to be a real #HPC software engineer if you've never coded with at least 2 of #Fortran, #MPI, #OpenMP, or #CUDA?

Can you claim to be a modern #supercomputing scientist if you've never worked with at least 2 of #cloud, #AI/#ML, #FP16, or #RSEs?

N-gated Hacker News (ngate)
2025-10-04

🐢 Breaking news: A team of 🧙‍♂️ has magically discovered that GPUs can handle something called "Matrix Programming" with a little pixie dust. Who knew? 🤯 Get ready to revolutionize the universe... or just your local coffee shop's spreadsheet calculations. ☕📈
salykova.github.io/matrix-core

2025-08-24

AI's dark horses: LLM inference on Nvidia CMP 50HX and CMP 90HX mining GPUs

The theoretical performance of mining cards is quite high, yet synthetic benchmarks show them to be 10x weaker than gaming cards, so where does the truth lie? In practice with LLMs they turned out to be on the level of an RTX 2060/3060. This article is for those who want to build a cheap LLM server, and for lovers of hardcore experiments. So what can these cards actually do?

habr.com/ru/articles/940226/

#ollama #llm #fp16 #nvidia #cmp #50HX #90HX #mining #artificial_intelligence #lm_studio

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2025-05-21

#JackDongarra Makes a Stand for Traditional #HPC: "US still doesn’t have a clear, long-term plan for what comes next.... U.S. risks falling behind."

Challenges to high-performance computing threaten #US #innovation

The #AI boom has led chip makers to focus on #FP16 and #FP8, not the #FP64 used by scientific research. If chip companies stop making the parts that #scientists need, then it could become harder to do important research.
theconversation.com/challenges

2024-10-31

@python I measured peak ~1.2 GLUPs/s with #FP16S memory compression, 67% efficient relative to 136 GB/s RAM bandwidth (8533 MT/s). That makes #Intel Lunar Lake 140V 1.7x faster than the Meteor Lake 185H iGPU. It's about on par with #IntelArc A380, RX 6500 XT, GTX 1050M Ti. Very cool to see an iGPU finally be competitive with entry-level discrete #GPUs!
github.com/ProjectPhysX/FluidX
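A quick back-of-envelope check of the efficiency claim in the post above. The 77 bytes per lattice update is my assumption (roughly what a D3Q19 lattice-Boltzmann step with FP16 storage streams per cell), not a number from the post:

```python
# Measured numbers from the post; bytes_per_update is an assumption.
glups = 1.2e9            # lattice updates per second (measured)
bytes_per_update = 77    # assumed memory traffic per update (D3Q19, FP16 storage)
peak_bw = 136e9          # RAM bandwidth at 8533 MT/s (from the post)

efficiency = glups * bytes_per_update / peak_bw
print(f"{efficiency:.0%}")   # close to the quoted ~67%
```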

2024-09-30

FP32, FP16, BF16 and FP8: making sense of the main floating-point number formats

Hi, Habr! Today let's talk about how modern GPU computing has become more flexible and efficient thanks to the various floating-point formats (FP64, FP32, FP16, BFLOAT16 and FP8). These formats are not just numbers: behind each of them stands a specific application area. In different situations we face tasks where either speed or precision matters, and the right floating-point type helps optimize resources. Let's walk through it all with examples and see which tasks each of these formats serves best.

habr.com/ru/companies/serverfl

#FP16 #fp32 #FP64 #BF16 #floating_point #fp8 #floating_point_numbers #floating_point_format
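The range-versus-precision trade-off the article describes can be seen directly in a few lines. A minimal sketch, assuming NumPy: it has no native bfloat16, so bf16 rounding is emulated here by truncating the low 16 bits of a float32:

```python
import numpy as np

def to_bfloat16(x):
    """Emulate bfloat16 (truncation) by zeroing the low 16 bits of a
    float32: same 8-bit exponent, but only 7 mantissa bits remain."""
    bits = np.float32(x).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# FP16 (1 sign / 5 exponent / 10 mantissa): fine nearby precision,
# but the largest representable value is 65504.
print(np.float16(3.14159))    # stored as 3.140625
print(np.float16(70000.0))    # inf -- overflows FP16's range

# bfloat16 (1 / 8 / 7): full FP32 range, much coarser precision.
print(to_bfloat16(70000.0))   # 69632.0 -- in range, but rounded hard
print(to_bfloat16(3.14159))   # 3.140625
```

The two 16-bit formats spend their bits differently: FP16 buys precision at the cost of range, bfloat16 keeps FP32's range by sacrificing mantissa bits, which is why training workloads often prefer bf16.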

2024-09-28

Small numbers, big possibilities: how floating point accelerates AI and technology

Hi, Habr! ServerFlow here again, and today we decided to dive into the fascinating world of floating-point numbers. Have you ever wondered why there are different kinds of these numbers and how they affect the performance of our CPUs and GPUs? How do small floating-point formats help advance neural networks and artificial intelligence? Let's figure these questions out together, uncover the secrets of the IEEE 754 standard, and learn what role large and small floating-point numbers play in modern computing.

habr.com/ru/companies/serverfl

#floating_point #fp32 #fp16 #INT8 #quantization #tensor_cores #fpu #floatingpoint #ieee_754
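Since the post's tags mention INT8 and quantization, here is a minimal sketch of that idea (symmetric per-tensor quantization; the helper name is illustrative, not from the article):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, +max|w|]
    onto [-127, 127] with a single scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale   # dequantize to inspect the error
print(q.tolist())   # [64, -127, 32]
print(w_hat)        # roughly [0.504, -1.0, 0.252]
```

The round trip shows the cost: each weight lands on one of 255 levels, so small relative errors appear, which is why quantized inference usually calibrates scales per tensor or per channel.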

2024-09-20

@Methylzero I had an idea last year around adding an extension to use the #FP16 FPUs as 10 bit int pipelines to save a cycle on IFMAs and I16ADD over the int16 MAC/add instructions, but they were seen as too niche (even for x86)

There was already precedent on this sort of thing (avx512 IFMA did this for the FP64 pipes)

Idea was saving a cycle (3.5 instead of 4.5) and saving some power (but not dealing with the extra 6 bits of a normal int16)

#simd #HPC

2024-08-18

I've finally patched/enabled #FP16 vector arithmetic support for my #OpenCL-Benchmark on Nvidia #GPUs that support it with Nvidia's NVVM-7.0-updated drivers. That is Pascal, Volta, Turing, Ampere, Ada, Hopper, Blackwell and future.
Interesting find: Nvidia Ada has cut FP16 vector throughput in half, to only 1:1 FP16:FP32 ratio instead of 2:1. And A100 has 4:1 ratio.
github.com/ProjectPhysX/OpenCL

RT by @EU_HaDEA: 🚀🚀🚀New release!🚀🚀🚀
Awesome half-float #FP16 support by #TornadoVM for AI/ML!
Great work by the team!
@UKRI_News
@EU_HaDEA

🐦🔗: nitter.cz/CKotselidis/status/1

[2024-01-30 13:03 UTC]

GripNews (GripNews)
2023-06-02

🌗 Facebook open-sources AITemplate, a Python framework that converts neural networks into high-performance CUDA/HIP C++ code
➤ Designed for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference
github.com/facebookincubator/A
Facebook has open-sourced AITemplate, a Python framework that converts neural networks into high-performance CUDA/HIP C++ code, designed for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference. The framework features excellent backward compatibility, horizontal fusion, vertical fusion, and memory fusion, and supports a more comprehensive fusion scope.
+ This framework looks very useful, especially for applications that need high-performance inference. I look forward to seeing more people use it and contribute code.
+ Facebook has consistently driven the development of AI technology, and open-sourcing this framework reflects their spirit of openness and contribution.

2023-05-29

GCC 11.4 arrived today (Yay!)

Still find it strange that while GCC-11 added support for all the weird #AMX instructions, as well as a native flag for #SPR, there was never any support (original or backported) for the #FP16 instructions.

At the time of posting the GCC website is still being updated, but this link should eventually point to the public docs: gcc.gnu.org/onlinedocs/gcc-11.

2022-12-05

Time for an #introduction!
I'm a young Canuck with interests/experience in #HPC, #Linux, #BLAS, #SYCL, #C, #AVX512, #Rust, heterogeneous compute & other such things.

Currently my personal projects are bringing #FP16 to the #OpenBLAS library, working to standardize what Complex domain BLAS FP16 kernels/implementations should look like, and making sure #SYCL is available everywhere.

I also write every now and again. Here's the tale of AVX512 FP16 on Alder Lake
gist.github.com/FCLC/56e4b3f4a

2022-11-30

Was going through the Risc-V Vector ISA spec (as you do) and noticed this little gem:

Specifically the line "When 16-bit and 128-bit element widths are added, they will be also be treated as IEEE-754/2008-compatible values. "

Unless I'm misinterpreting this, is RISC-V indicating future *native* support for 128-bit integer and floating point?

On the other hand, because I'm that guy: GOSH DARN IT, WHY NOT SHIP FP16 AS PART OF V.1 😭
github.com/riscv/riscv-v-spec/

#HPC #BLAS #RiscV #FP16 #ASM

whether floating-point is supported, and for which element widths, is determined by the specific vector extension. The current set of
extensions include support for 32-bit and 64-bit floating-point values. When 16-bit and 128-bit element widths are added, they will be also be treated as IEEE-754/2008-compatible values. Other floating-point formats may be supported in future extension.
FelixCLC (FelixCLC)
2022-11-12

Finally getting results with a little mixed precision exponential growth domain algorithm I've been working on to take advantage of different hardware capabilities on heterogeneous systems.

Being able to predetermine when a domain is entering a region where higher precision is needed, then dynamically dropping back to lower precision on exit without contaminating results, isn't exactly trivial...
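A toy sketch of that idea (the threshold and names here are illustrative, not the poster's actual algorithm): in an exponential-growth loop, watch the magnitude and promote from FP16 to FP32 *before* the value approaches FP16's ceiling of 65504, so the switch happens ahead of any contamination:

```python
import numpy as np

FP16_SAFE_MAX = 2.0**14   # guard threshold, well below FP16's max of 65504

def step(x, rate=1.5):
    """One growth step; promotes to FP32 before FP16 would overflow."""
    if x.dtype == np.float16 and abs(float(x)) * rate > FP16_SAFE_MAX:
        x = np.float32(x)             # switch domain to higher precision
    return x * x.dtype.type(rate)     # multiply in the current precision

x = np.float16(1.0)
for _ in range(40):                   # 1.5**40 would overflow plain FP16
    x = step(x)
print(x.dtype, np.isfinite(x))        # float32 True
```

Running the same 40 steps entirely in FP16 overflows to inf around step 28; the guarded version stays finite because the promotion fires one step before the range is exceeded.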

FelixCLC (FelixCLC)
2022-10-30

Ok, beyond posting *about* mastodon, time to post *on* mastodon.
For those interested in low-level development and other such things, this blog post/article from the other week may be of interest.

It chronicles what had already been a year of development in the making, the trials and tribulations of dealing with vendors, and the quest to bring reduced precision (FP16) to the mainstream.

Post here, from my gist: gist.github.com/FCLC/56e4b3f4a
