#fp16

N-gated Hacker News (ngate)
2025-12-22

📉 So, it turns out some model export formats have a sneaky habit of silently downgrading your models without so much as a polite cough. 🤦‍♂️ But don't worry, there's a hero's journey through a forest of matrices and formats to fix this *not-a-bug*. Design choices, amirite? 😂
ym2132.github.io/ONNX_MLProgra

Hacker News (h4ckernews)
2025-12-22
Andrew Jones (hpcnotes) hpcnotes@mast.hpc.social
2025-11-26

Can you claim to be a real #HPC software engineer if you've never coded with at least 2 of #Fortran, #MPI, #OpenMP, or #CUDA?

Can you claim to be a modern #supercomputing scientist if you've never worked with at least 2 of #cloud, #AI/#ML, #FP16, or #RSEs?

N-gated Hacker News (ngate)
2025-10-04

🐢 Breaking news: A team of 🧙‍♂️ has magically discovered that GPUs can handle something called "Matrix Programming" with a little pixie dust. Who knew? 🤯 Get ready to revolutionize the universe... or just your local coffee shop's spreadsheet calculations. ☕📈
salykova.github.io/matrix-core

2025-08-24

AI's dark horses: LLM inference on Nvidia CMP 50HX and CMP 90HX mining GPUs

The theoretical performance of mining cards is quite high, yet synthetic benchmarks show them to be 10x weaker than gaming cards, so where does the truth lie? In practice with LLMs they turned out to be on the level of an RTX 2060/3060. This article is for those who want to build a cheap LLM server, and for lovers of hardcore experiments. So what can these cards actually do?

habr.com/ru/articles/940226/

#ollama #llm #fp16 #nvidia #cmp #50HX #90HX #mining #artificial_intelligence #lm_studio

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 BenjaminHCCarr@hachyderm.io
2025-05-21

#JackDongarra Makes a Stand for Traditional #HPC: "US still doesn’t have a clear, long-term plan for what comes next.... U.S. risks falling behind."

Challenges to high-performance computing threaten #US #innovation

The #AI boom has led chip makers to focus on #FP16 and #FP8, not the #FP64 used by scientific research. If chip companies stop making the parts that #scientists need, then it could become harder to do important research.
theconversation.com/challenges

2024-10-31

@python I measured peak ~1.2 GLUPs/s with #FP16S memory compression, 67% efficient relative to 136 GB/s RAM bandwidth (8533 MT/s). That makes #Intel Lunar Lake 140V 1.7x faster than the Meteor Lake 185H iGPU. It's about on par with #IntelArc A380, RX 6500 XT, GTX 1050M Ti. Very cool to see an iGPU finally be competitive with entry-level discrete #GPUs!
github.com/ProjectPhysX/FluidX
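A quick back-of-envelope check of the efficiency claim in the post above. The 77 bytes per lattice update is my assumption (roughly what a D3Q19 lattice-Boltzmann step with FP16 storage streams per cell), not a number from the post:

```python
# Measured numbers from the post; bytes_per_update is an assumption.
glups = 1.2e9            # lattice updates per second (measured)
bytes_per_update = 77    # assumed memory traffic per update (D3Q19, FP16 storage)
peak_bw = 136e9          # RAM bandwidth at 8533 MT/s (from the post)

efficiency = glups * bytes_per_update / peak_bw
print(f"{efficiency:.0%}")   # close to the quoted ~67%
```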

2024-09-30

FP32, FP16, BF16 and FP8: making sense of the main floating-point number formats

Hi, Habr! Today let's talk about how modern GPU computing has become more flexible and efficient thanks to the various floating-point formats (FP64, FP32, FP16, BFLOAT16 and FP8). These formats are not just numbers: behind each of them stands a specific application area. In different situations we face tasks where either speed or precision matters, and the right floating-point type helps optimize resources. Let's walk through it all with examples and see which tasks each of these formats serves best.

habr.com/ru/companies/serverfl

#FP16 #fp32 #FP64 #BF16 #floating_point #fp8 #floating_point_numbers #floating_point_format
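The range-versus-precision trade-off the article describes can be seen directly in a few lines. A minimal sketch, assuming NumPy: it has no native bfloat16, so bf16 rounding is emulated here by truncating the low 16 bits of a float32:

```python
import numpy as np

def to_bfloat16(x):
    """Emulate bfloat16 (truncation) by zeroing the low 16 bits of a
    float32: same 8-bit exponent, but only 7 mantissa bits remain."""
    bits = np.float32(x).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# FP16 (1 sign / 5 exponent / 10 mantissa): fine nearby precision,
# but the largest representable value is 65504.
print(np.float16(3.14159))    # stored as 3.140625
print(np.float16(70000.0))    # inf -- overflows FP16's range

# bfloat16 (1 / 8 / 7): full FP32 range, much coarser precision.
print(to_bfloat16(70000.0))   # 69632.0 -- in range, but rounded hard
print(to_bfloat16(3.14159))   # 3.140625
```

The two 16-bit formats spend their bits differently: FP16 buys precision at the cost of range, bfloat16 keeps FP32's range by sacrificing mantissa bits, which is why training workloads often prefer bf16.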

2024-09-28

Small numbers, big possibilities: how floating point accelerates AI and technology

Hi, Habr! ServerFlow here again, and today we decided to dive into the fascinating world of floating-point numbers. Have you ever wondered why there are different kinds of these numbers and how they affect the performance of our CPUs and GPUs? How do small floating-point formats help advance neural networks and artificial intelligence? Let's figure these questions out together, uncover the secrets of the IEEE 754 standard, and learn what role large and small floating-point numbers play in modern computing.

habr.com/ru/companies/serverfl

#floating_point #fp32 #fp16 #INT8 #quantization #tensor_cores #fpu #floatingpoint #ieee_754
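Since the post's tags mention INT8 and quantization, here is a minimal sketch of that idea (symmetric per-tensor quantization; the helper name is illustrative, not from the article):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, +max|w|]
    onto [-127, 127] with a single scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale   # dequantize to inspect the error
print(q.tolist())   # [64, -127, 32]
print(w_hat)        # roughly [0.504, -1.0, 0.252]
```

The round trip shows the cost: each weight lands on one of 255 levels, so small relative errors appear, which is why quantized inference usually calibrates scales per tensor or per channel.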

2024-09-20

@Methylzero I had an idea last year around adding an extension to use the #FP16 FPUs as 10 bit int pipelines to save a cycle on IFMAs and I16ADD over the int16 MAC/add instructions, but they were seen as too niche (even for x86)

There was already precedent on this sort of thing (avx512 IFMA did this for the FP64 pipes)

Idea was saving a cycle (3.5 instead of 4.5) and saving some power (but not dealing with the extra 6 bits of a normal int16)

#simd #HPC

2024-08-18

I've finally patched/enabled #FP16 vector arithmetic support for my #OpenCL-Benchmark on Nvidia #GPUs that support it with Nvidia's NVVM-7.0-updated drivers. That is Pascal, Volta, Turing, Ampere, Ada, Hopper, Blackwell and future.
Interesting find: Nvidia Ada has cut FP16 vector throughput in half, to only 1:1 FP16:FP32 ratio instead of 2:1. And A100 has 4:1 ratio.
github.com/ProjectPhysX/OpenCL

RT by @EU_HaDEA: 🚀🚀🚀New release!🚀🚀🚀
Awesome half-float #FP16 support by #TornadoVM for AI/ML!
Great work by the team!
@UKRI_News
@EU_HaDEA

🐦🔗: nitter.cz/CKotselidis/status/1

[2024-01-30 13:03 UTC]

GripNews (GripNews)
2023-06-02

🌗 Facebook open-sources AITemplate, a Python framework that converts neural networks into high-performance CUDA/HIP C++ code
➤ Designed for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference
github.com/facebookincubator/A
Facebook has open-sourced AITemplate, a Python framework that converts neural networks into high-performance CUDA/HIP C++ code, designed for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference. The framework features excellent backward compatibility, horizontal fusion, vertical fusion, and memory fusion, and supports a more comprehensive fusion scope.
+ This framework looks very useful, especially for applications that need high-performance inference. I look forward to seeing more people use it and contribute code.
+ Facebook has consistently driven the development of AI technology, and open-sourcing this framework reflects their spirit of openness and contribution.

2023-05-29

GCC 11.4 arrived today (Yay!)

Still find it strange that while GCC-11 added support for all the weird #AMX instructions, as well as a native flag for #SPR, there was never any support (original or backported) for the #FP16 instructions.

At the time of posting the GCC website is still being updated, but this link should eventually point to the public docs: gcc.gnu.org/onlinedocs/gcc-11.

2022-12-05

Time for an #introduction!
I'm a young Canuck with interests/experience in #HPC, #Linux, #BLAS, #SYCL, #C, #AVX512, #Rust, heterogeneous compute & other such things.

Currently my personal projects are bringing #FP16 to the #OpenBLAS library, working to standardize what Complex domain BLAS FP16 kernels/implementations should look like, and making sure #SYCL is available everywhere.

I also write every now and again. Here's the tale of AVX512 FP16 on Alder Lake
gist.github.com/FCLC/56e4b3f4a

2022-11-30

Was going through the Risc-V Vector ISA spec (as you do) and noticed this little gem:

Specifically the line "When 16-bit and 128-bit element widths are added, they will be also be treated as IEEE-754/2008-compatible values. "

Unless I'm misinterpreting this, is RISC-V indicating future *native* support for 128-bit integer and floating point?

On the other hand, because I'm that guy: GOSH DARN IT, WHY NOT SHIP FP16 AS PART OF V.1 😭
github.com/riscv/riscv-v-spec/

#HPC #BLAS #RiscV #FP16 #ASM

whether floating-point is supported, and for which element widths, is determined by the specific vector extension. The current set of
extensions include support for 32-bit and 64-bit floating-point values. When 16-bit and 128-bit element widths are added, they will be also be treated as IEEE-754/2008-compatible values. Other floating-point formats may be supported in future extension.
FelixCLC (FelixCLC)
2022-11-12

Finally getting results with a little mixed precision exponential growth domain algorithm I've been working on to take advantage of different hardware capabilities on heterogeneous systems.

Being able to predetermine when a domain is entering a region where higher precision is needed, then dynamically dropping back to lower precision on exit without contaminating results, isn't exactly trivial...
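A toy sketch of that idea (the threshold and names here are illustrative, not the poster's actual algorithm): in an exponential-growth loop, watch the magnitude and promote from FP16 to FP32 *before* the value approaches FP16's ceiling of 65504, so the switch happens ahead of any contamination:

```python
import numpy as np

FP16_SAFE_MAX = 2.0**14   # guard threshold, well below FP16's max of 65504

def step(x, rate=1.5):
    """One growth step; promotes to FP32 before FP16 would overflow."""
    if x.dtype == np.float16 and abs(float(x)) * rate > FP16_SAFE_MAX:
        x = np.float32(x)             # switch domain to higher precision
    return x * x.dtype.type(rate)     # multiply in the current precision

x = np.float16(1.0)
for _ in range(40):                   # 1.5**40 would overflow plain FP16
    x = step(x)
print(x.dtype, np.isfinite(x))        # float32 True
```

Running the same 40 steps entirely in FP16 overflows to inf around step 28; the guarded version stays finite because the promotion fires one step before the range is exceeded.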

FelixCLC (FelixCLC)
2022-10-30

Ok, beyond posting *about* mastodon, time to post *on* mastodon.
For those interested in low-level development and other such things, this blog post/article from the other week may be of interest.

It chronicles what had already been a year of development in the making, the trials and tribulations of dealing with vendors, and the quest to bring reduced precision (FP16) to the mainstream.

Post here, from my gist: gist.github.com/FCLC/56e4b3f4a
