When vectorization speeds things up, SIMD math isn't always the reason.
Sometimes the speedup comes from being forced to overlap independent dependency chains.
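A minimal scalar sketch of the idea, with no SIMD involved. The function names are made up for illustration. The first loop has a single accumulator, so every addition waits for the previous one; the second splits the work across four accumulators, giving the CPU four independent chains it can overlap.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: a single long dependency chain.
// Each addition needs the result of the previous one.
double sum_single_chain(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) {
        s += x;
    }
    return s;
}

// Four accumulators: four independent dependency chains.
// The additions into s0..s3 do not depend on each other,
// so an out-of-order CPU can keep several in flight at once.
double sum_four_chains(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    // Leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        s0 += v[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Splitting the sum changes the order of the floating-point additions, so the two functions can differ in the last bits for general inputs. Whether the second version is actually faster depends on the CPU, the compiler, and the flags, so measure before relying on it.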
We help development teams speed up their C/C++ software.
Performance-related blog: http://johnnysswlab.com
Direct help: http://johnnysswlab.com/consulting
New on Johnny's Software Lab: Handling floating-point errors in C++ without killing performance!
NaNs, infinities, sticky bits, and traps: what works and what's a trap 🪤.
Read more: https://johnnysswlab.com/floating-point-error-handling-in-c-what-actuall
📬 Mailing list is live!
New articles + workshop dates (AVX, NEON) straight to your inbox.
Go to johnnysswlab.com
➡️ Enter your email in the box on the right
Still think #Java is slow? A deep dive into Java vs C++ performance shows where its strengths and weaknesses lie.
https://johnnysswlab.com/deep-dive-in-java-vs-c-performance/
9 Things Every Fresh Graduate Should Know About Performance
https://johnnysswlab.com/9-things-every-fresh-graduate-should-know-about-software-performance/
We investigate vector functions, more specifically, how to make your vector function available to the compiler's autovectorizer!
#vectorfunctions #simd #openmp #omp
https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/
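A minimal sketch of one way to do this, using OpenMP's `declare simd` directive. The function names are hypothetical, and this is not necessarily the approach the linked article settles on. The directive asks the compiler to also generate vector variants of a scalar function, which a vectorized loop may then call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar function. The directive requests SIMD variants of it;
// `notinbranch` states that it is never called from inside a conditional.
#pragma omp declare simd notinbranch
double scale_and_offset(double x) {
    return 2.0 * x + 1.0;
}

// Applies the function to every element. The `omp simd` directive asks the
// compiler to vectorize the loop, which lets it pick a vector variant of
// scale_and_offset if one was generated.
std::vector<double> apply_to_all(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    #pragma omp simd
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = scale_and_offset(in[i]);
    }
    return out;
}
```

With GCC or Clang, the directives take effect when compiling with `-fopenmp-simd` (or `-fopenmp`); without such a flag they are ignored and the code runs as plain scalar C++. Whether the vector variant is really used varies by compiler and target, so check the generated assembly.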
Does it matter if we are compiling with optimizations off (O0) or optimizations on (O3) if the problem is memory bound? Let's find out…
#optimizations #performance #instructionlevelparallelism #ilp #compiler #gcc #memorybound
https://johnnysswlab.com/an-optimizing-compiler-doesnt-help-much-with-long-instruction-dependencies/
Last chance to register for AVX vectorization workshop!
More info: https://johnnysswlab.com/avx-neon-vectorization-workshop/
Register: info@johnnysswlab.com
A post on how to grow data buffers without memcpy:
https://johnnysswlab.com/growing-buffers-to-avoid-copying-data/
Your program doesn't run fast enough? You need someone to talk to about your software's performance? You or your team want to learn to write faster software? Whatever it is, we can help you. Check out the consulting page for more info.
Another AVX workshop, this time 4 half-days.
Registration or inquiry: info@johnnysswlab.com
More info: https://johnnysswlab.com/avx-neon-vectorization-workshop/
We debug a performance problem by simulating it on the CPU with #llvm-mca, part of @llvmorg.
Have a sneak peek at what the CPU does while it is executing your program!
https://johnnysswlab.com/performance-debugging-with-llvm-mca-simulating-the-cpu/
Register now or express interest: info@johnnysswlab.com
More info: https://johnnysswlab.com/avx-neon-vectorization-workshop/
Are you familiar with flamegraphs? Flamegraphs are a great way to visualize where your program is spending time or other resources.
Here is a screenshot of a flamegraph taken from ffmpeg converting a movie from one format to another.
In the above picture, each horizontal bar is a function, and the wider the bar, the more time is spent there.
Stacks represent the call hierarchy: functions that call other functions stack on top of each other.
It's very easy to see which functions take the most time.
You can find the full interactive example here (note: when the example opens, you will need to switch the active thread from `perf` to `ffmpeg` in the top center part of the screen and also pick `Left Heavy` in the top left part).
Everything you need to speed up your memory-intensive program:
After more than 1 year of development, we are proud to announce:
Vectorization Workshop
The vectorization workshop is a 2-day event that teaches you how to make your software faster by using NEON vectorization extensions for ARM CPUs and AVX vectorization extensions for Intel and AMD CPUs.
We start lightly and move on to explain more difficult concepts needed for efficient vectorization.
More info and registration here:
https://johnnysswlab.com/avx-neon-vectorization-workshop/
In this post we investigate methods to speed up convergence loops: while loops that slowly converge to the correct result.
https://johnnysswlab.com/speeding-up-convergence-loops-or-on-vectorization-and-precision-control/
In this post we talk about memory mechanisms that increase memory access latency and explore techniques to avoid them.