marco

I like computers, Korean, and computers-and-Korean.

Language and keyboard stuff @Google and PhD student in Tokyo.
Georgia Tech -> 연세대학교 -> 東京工業大学.

Interested in high school CS education.

Academic:
- #ML
- #NLProc
- #Automata
- #CJK
- #Tokenization
- #FederatedLearning
- #Compression

Programming:
- #JuliaLang
- #Python
- #OSS

Other:
- High School CS #Education
- #Korean
- #Keyboards
- #Esperanto
- #Chess
- #Coffee

2025-07-05

Any recs for screencasts of people building with Claude Code/Cursor/Gemini CLI/etc?

Doesn't have to be anything complex or long, just clearly showing workflow/interaction/etc.

2025-03-08

@pbloem In some sense, I agree that character-level would solve a lot of our problems. But character-level models have low information density and, at least with current architectures, are too costly and slow.

Subword tokenization is definitely an imperfect solution, but it improves on both of those problems in most cases (though ofc it then has downsides like "has trouble spelling/doing basic math", etc.).

I hope to see a lot more research on tokenization-free methods like BLT.
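To make the information-density point concrete, here is a minimal sketch comparing sequence lengths under character-level and subword tokenization. The vocabulary `VOCAB` and the greedy longest-match tokenizer are hypothetical toys invented for illustration; they are not BLT or any production tokenizer.

```python
def char_tokens(text):
    """Character-level tokenization: one token per character."""
    return list(text)


def greedy_subword_tokens(text, vocab):
    """Toy longest-match tokenizer; falls back to single characters."""
    tokens, i = [], 0
    max_len = max(map(len, vocab))
    while i < len(text):
        # Try the longest candidate piece first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary, chosen only for this example.
VOCAB = {"token", "ization", " is", " hard"}
TEXT = "tokenization is hard"

chars = char_tokens(TEXT)
subwords = greedy_subword_tokens(TEXT, VOCAB)
print(len(chars), len(subwords))
```

The same text takes several times more positions at the character level, so each position carries less information and the model has more steps to process.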

2025-02-20

@drgroftehauge yep, I guess what I normally do is not quite code golf though. I try to find one-liners, but they don't necessarily have the fewest characters (the real goal of golfing).

2025-02-20

This problem came up again, so I updated my old solution!

sigmoid.social/@mc/11166258191

marco boosted:
2025-02-20

A few weeks ago I got bored and tried solving a leetcode problem "at random". Basically, I wanted to find a way to set a random seed such that randomly generating an answer solved all the test cases for the following problem: leetcode.com/problems/find-uni.

Through a little trial and error, I managed to solve it. Check out the writeup!

mcognetta.github.io/posts/leet

#python #leetcode #codegolf #programming #MaliciousCompliance
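A minimal sketch of the seed-search idea, assuming a made-up toy task rather than the LeetCode problem from the post: each test case is a set of forbidden digits, and any digit outside the set counts as correct. The names `TEST_CASES`, `random_answer`, `passes_all`, and `find_seed` are all hypothetical, and the writeup's actual method may differ.

```python
import random

# Hypothetical toy test cases: each is a set of forbidden digits.
TEST_CASES = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}, {0, 2, 4, 6, 8}, {1, 3, 5, 7, 9}]


def random_answer(rng):
    """The 'solution': ignore the input and guess a digit."""
    return rng.randrange(10)


def passes_all(seed, cases=TEST_CASES):
    """Seed once, then answer every test case in order with random guesses."""
    rng = random.Random(seed)
    return all(random_answer(rng) not in forbidden for forbidden in cases)


def find_seed(limit=10_000):
    """Brute-force the smallest seed whose guesses pass every case."""
    for seed in range(limit):
        if passes_all(seed):
            return seed
    return None


seed = find_seed()
print(seed)
```

This only works because the test cases are fixed and evaluated in a known order; a change to either would require searching again.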

2025-02-11

Tokenization is an often-overlooked aspect of modern #NLP, but it’s experiencing a resurgence — thanks in large part to @karpathy and his classic tweet:

x.com/karpathy/sta...

Come hang out with us and let's fix these problems!

2025-02-11

Today we are launching a server dedicated to Tokenization research! Come join us!

discord.gg/CDJhnSvU

#nlproc #machinelearning #tokenization

2024-10-04

Gboard never stops innovating.

youtu.be/EHqPrHTN1dU

#keyboard #jp

marco boosted:
2024-08-13

!!Con 2024 is just three weeks away! Join us for two days of ten-minute talks about the joy, excitement, and surprise of computing!

🎟️ Get tickets and learn more: bangbangcon.com

!!Con 2024
August 24-25, 2024

Santa Cruz, California
and online

Two days of ten-minute talks about
the joy, excitement, and surprise of computing

Let’s make local and accountable tech! (Dawn Walker) • Reverse-engineering a 30-year old synthesizer to perfectly recreate video game music! (Peter Sobot) • Programming with only exceptions! (Nicole Tietz-Sokolskaya) • Mitos! Handweaving My Ancestral DNA!!! (Sally Kong) • Let’s run a tiny chess neural network by hand! (Amédée d’Aboville) • The Astrolabe! Using modern digital computing to recreate ancient analog computers (Jes Wolfe) • A brief history of keyboards! (Jesse Chen) • Recreating Sketchpad, the first GUI! (Adam Solove) • Images from a 1970s Typewriter!! (Phil Warren) • bang! bang! he murdered math! {the musical!} (Taylor Troesh) • Huggable Data! Making the Ephemeral Last Longer with Textile Dataviz! (Aldís Elfarsdóttir) • Keyboarding Ain’t Easy?? What Not To Do When Building a Keyboard!! (Liz Frost) • The Perfect Blend!! Reverse engineering a bluetooth protocol for better smoothies! (Ryan Mast) • How to make your own microchip!!! (Omar Rizwan) • Grading your Types? From Bangs to Boxes and Beyond! (Danielle Marshall) • Let’s find random things on the street with full-text search! (Yufeng Zhao) • It wasn’t me, it was the cosmic rays! Blaming physics for our evil actions! (Matías Lang) • and many more!

learn more and get tickets: bangbangcon.com

2024-07-02

@sharif Ah, for keystroke-level golf I agree. But I prefer to do line-level golf, where I pack everything into a giant one-liner (so ";" is disallowed). It's a lot of fun.

There is a workaround for this (that I don't particularly love) that still fits on one line:

```
(x := [1,2,3], [z for z in x])[1]
```

You can also usually work around it with map/other functional stuff, but it would be so much nicer if I could just do it with a walrus operator.
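For comparison, here is a small sketch of the tuple workaround next to two functional-style alternatives. The `z * len(x)` body is an illustrative assumption, chosen so that `x` is actually reused inside the comprehension; the variable names are made up.

```python
# Tuple workaround from the post: bind x first, then index out the comprehension.
via_tuple = (x := [1, 2, 3], [z * len(x) for z in x])[1]

# A lambda parameter can play the role of the binding instead.
via_lambda = (lambda x: [z * len(x) for z in x])([1, 2, 3])

# A one-element outer loop also binds x inside the comprehension itself.
via_nested_for = [z * len(x) for x in [[1, 2, 3]] for z in x]

print(via_tuple, via_lambda, via_nested_for)
```

All three stay on one line without ";", though none reads as cleanly as a walrus in the iterable would.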

2024-07-02

It's frustrating that you can't use the walrus operator in list comprehensions where you can in the unrolled loop.

Not that I would do it often in real code; it's just annoying when I want to golf.

#python
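The attached image is not reproduced in this text. A hedged guess at the kind of case involved, assuming it concerned binding a name in the iterable expression: Python accepts a parenthesized walrus in a `for` statement's iterable but rejects it in a comprehension's iterable. The helper `compiles` below is a hypothetical name for this sketch.

```python
def compiles(source):
    """Return True if the Python compiler accepts `source`."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Unrolled loop: the walrus in the iterable is accepted.
loop_form = "for z in (x := [1, 2, 3]):\n    pass\n"

# Comprehension: the same walrus in the iterable expression is rejected.
comprehension_form = "[z for z in (x := [1, 2, 3])]\n"

loop_ok = compiles(loop_form)
comprehension_ok = compiles(comprehension_form)
print(loop_ok, comprehension_ok)
```

The walrus is still allowed elsewhere in a comprehension, for example in the `if` clause; the restriction applies to the iterable expression.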

[Image: a Python loop and list comprehension]

2024-06-06

(Another) another day, another Japanese karaoke Korean keyboard variant. Just how many of these are there?

This is like the third variant I've seen at the same karaoke chain. Otherwise the UI has been mostly the same across different locations.

#korean #keyboard

[Image: A Korean keyboard.]

marco boosted:
2024-06-04

There has been a remarkable breakthrough towards the Riemann hypothesis (though still very far from fully resolving this conjecture) by Guth and Maynard making the first substantial improvement to a classical 1940 bound of Ingham regarding the zeroes of the Riemann zeta function (and more generally, controlling the large values of various Dirichlet series): arxiv.org/abs/2405.20552

Let \(N(\sigma,T)\) denote the number of zeroes of the Riemann zeta function with real part at least \(\sigma\) and imaginary part at most \(T\) in magnitude. The Riemann hypothesis tells us that \(N(\sigma,T)\) vanishes for any \(\sigma>1/2\). We of course can't prove this unconditionally. But as the next best thing, we can prove zero density estimates, which are non-trivial upper bounds on \(N(\sigma,T)\). It turns out that the value \(\sigma=3/4\) is a key value. In 1940, Ingham obtained the bound \(N(3/4,T) \ll T^{3/5+o(1)}\). Over the next eighty years, the only improvement to this bound has been small refinements to the \(o(1)\) error. This has limited us from doing many things in analytic number theory: for instance, to get a good prime number theorem in almost all short intervals of the form \((x,x+x^\theta)\), we have long been limited to the range \(\theta>1/6\), with the main obstacle being the lack of improvement to the Ingham bound. (1/3)

2024-05-16

I can't be stopped

2024-05-15

I spent a bit of time last night and this morning over-optimizing a naive #Python #LeetCode solution to get the fastest solution on the site.

Enjoy: theoreticallygoodwithcomputers

2024-03-25

Starting in 5 minutes!
