Johnny's Software Lab

We help development teams speed up their C/C++ software.
Performance-related blog: johnnysswlab.com
Direct help: johnnysswlab.com/consulting

Johnny's Software Lab johnnysswlab@mastodon.online
2026-02-26

Vectorization doesn't always speed things up because of SIMD math.

Sometimes it speeds things up because it forces you to overlap independent dependency chains.

New post:
johnnysswlab.com/exposing-more
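
A minimal scalar sketch of the idea, not taken from the linked post: the function names and the choice of four accumulators are assumptions made for illustration. A reduction with one accumulator forms a single dependency chain, because each addition has to wait for the previous one. Splitting it across several accumulators gives the CPU independent chains it can overlap, which is the same restructuring a vectorized reduction imposes.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: every addition depends on the result of the previous
// one, so the loop is limited by the latency of a single add.
double sum_one_chain(const std::vector<double>& v) {
    double acc = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Four accumulators (an arbitrary, illustrative count): the four chains
// do not depend on each other, so their additions can be in flight at
// the same time.
double sum_four_chains(const std::vector<double>& v) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Handle the leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    // Combine the partial sums once, after the loop.
    return (a0 + a1) + (a2 + a3);
}
```

Both functions add the same values, but in a different order, so with floating-point data the results can differ in the last bits. Whether the second version is actually faster depends on the CPU, the compiler, and the flags used, so measure before relying on it.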

Johnny's Software Lab johnnysswlab@mastodon.online
2026-02-18

🚀 New on Johnny’s Software Lab: Handling floating-point errors in C++ without killing performance!
NaNs, infinities, sticky bits, and traps: what works and what’s a trap 🪤.

Read more: johnnysswlab.com/floating-poin

#Cpp #Performance #FloatingPoint #NaN #Infinity

Johnny's Software Lab johnnysswlab@mastodon.online
2026-02-06

📬 Mailing list is live!
New articles + workshop dates (AVX, NEON) straight to your inbox.

👉 Go to johnnysswlab.com
➡️ Enter your email in the box on the right

Johnny's Software Lab johnnysswlab@mastodon.online
2025-12-06

Still think #Java is slow? A deep dive into Java vs C++ performance shows where its strengths and weaknesses lie.

johnnysswlab.com/deep-dive-in-

#java #javaperformance #garbagecollector #jvm

Johnny's Software Lab johnnysswlab@mastodon.online
2025-09-20

Johnny's Software Lab johnnysswlab@mastodon.online
2025-07-04

We investigate vector functions, more specifically, how to make your vector function available to the compiler's autovectorizer!

#vectorfunctions #simd #openmp #omp

johnnysswlab.com/the-messy-rea

Johnny's Software Lab johnnysswlab@mastodon.online
2025-05-31

Does it matter if we are compiling with optimizations off (O0) or optimizations on (O3) if the problem is memory bound? Let’s find out…

#optimizations #performance #instructionlevelparallelism #ilp #compiler #gcc #memorybound

johnnysswlab.com/an-optimizing

Johnny's Software Lab johnnysswlab@mastodon.online
2025-05-14

Last chance to register for the AVX vectorization workshop!

More info: johnnysswlab.com/avx-neon-vect
Register: info@johnnysswlab.com

Johnny's Software Lab johnnysswlab@mastodon.online
2025-03-31

A post on how to grow data buffers without memcpy:

johnnysswlab.com/growing-buffe

Is your program not running fast enough? Do you need someone to talk to about your software’s performance? Do you or your team want to learn to write faster software? Whatever it is, we can help you. Check out the consulting page for more info.

johnnysswlab.com/consulting/

Johnny's Software Lab johnnysswlab@mastodon.online
2025-02-05

Another AVX workshop, this time 4 half-days.

Registration or inquiry: info@johnnysswlab.com

More info: johnnysswlab.com/avx-neon-vect

Johnny's Software Lab johnnysswlab@mastodon.online
2025-01-31

We debug a performance problem by simulating it on the CPU with #llvm-mca, part of @llvmorg.

Have a sneak peek at what the CPU does while it is executing your program!

johnnysswlab.com/performance-d

Johnny's Software Lab johnnysswlab@mastodon.online
2025-01-13

Register now or express interest: info@johnnysswlab.com
More info: johnnysswlab.com/avx-neon-vect

Johnny's Software Lab johnnysswlab@mastodon.online
2025-01-02

You can find the full interactive example here (note: when the example opens, switch the active thread from `perf` to `ffmpeg` at the top center of the screen, and pick `Left Heavy` at the top left).

speedscope.app/#profileURL=htt

Johnny's Software Lab johnnysswlab@mastodon.online
2025-01-02

In the picture above, each horizontal bar is a function, and the wider the bar, the more time is spent there.

Stacks represent the call hierarchy: functions that call other functions stack on top of each other.

It's very easy to see which functions take the most time.

Johnny's Software Lab johnnysswlab@mastodon.online
2025-01-02

Are you familiar with flamegraphs? Flamegraphs are a great way to visualize where your program is spending time or other resources.

Here is a screenshot of a flamegraph taken from ffmpeg converting a movie from one format to another.

Johnny's Software Lab johnnysswlab@mastodon.online
2024-11-24

Everything you need to speed up your memory-intensive program:

johnnysswlab.com/memory-subsys

Johnny's Software Lab johnnysswlab@mastodon.online
2024-11-11

After more than a year of development, we are proud to announce:

Our First Vectorization Workshop

The vectorization workshop is a two-day event that teaches you how to make your software faster by using NEON vectorization extensions for ARM CPUs and AVX vectorization extensions for Intel and AMD CPUs.

We start with the basics and move on to the more difficult concepts needed for efficient vectorization.

More info and registration here:
johnnysswlab.com/avx-neon-vect

Johnny's Software Lab johnnysswlab@mastodon.online
2024-08-31

In this post we investigate methods to speed up convergence loops – while loops that slowly converge to the correct result.

johnnysswlab.com/speeding-up-c

Johnny's Software Lab johnnysswlab@mastodon.online
2024-06-28

In this post we talk about the memory mechanisms that increase memory access latency, and we explore techniques to avoid them.

johnnysswlab.com/latency-sensi

#memoryaccesseslatency #lowlatency #latencysensitivesystems
