55 GiB/s FizzBuzz (2021) (codegolf.stackexchange.com)
539 points by tentacleuno on July 3, 2023 | 218 comments


The most impressive thing to me is that Linux allows data piped from one program to another to stay entirely in L2 cache and not hit main memory.

To me, that is amazing systems architecture design - so many different parts of a regular Linux kernel working together to let this fast path happen.

Would such a thing be possible with macOS's Mach ports or Windows' named pipes?


If a CPU's caches are physically tagged and both processes' page tables share the same backing physical page, then the CPU will let the cache contents be used by either process, as long as the OS doesn't explicitly flush/invalidate the cache on a context switch.
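
A minimal sketch of the setup being described - two processes whose page tables map the same backing physical page - assuming Linux and glibc >= 2.27 for memfd_create(2); the code and names here are illustrative, not from the thread:

    /* Two processes mapping the same backing physical page via an anonymous
     * in-memory file.  Because both mappings refer to the same physical
     * address, a physically-tagged cache can serve either process's access. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fd = memfd_create("shared-page", 0);          /* anonymous in-memory file */
        ftruncate(fd, 4096);
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (fork() == 0) {                                /* child: write into the shared page */
            strcpy(p, "hello from the other page table");
            _exit(0);
        }
        wait(NULL);
        printf("%s\n", p);                                /* parent: read the same physical page */
        return 0;
    }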


The major cost of coordinating shared data across two processes rather than two threads is TLB invalidation on a context switch.

I assume this is what the author meant by “page table contention” as a bottleneck: Without TLB invalidation there would be no need for either process/thread to ever touch page tables in a scenario like this.


It's unrelated to the TLB; page table contention is related to associative caches, where two distinct addresses can map to the same cache set and cause false cache misses, or exhaust the cache set and cause repeated fills from memory.

The TLB has to be flushed on a process context switch on x86 regardless of Windows/Linux - but modern systems typically allow it to be selectively flushed (the system can elect not to flush the TLB for the shared mappings that exist in both processes).

A TLB miss and a cache miss are an order of magnitude apart, and multiple orders apart if the corresponding PTE itself is resident in the dcache.


> The TLB has to be flushed on a process context switch on x86 regardless of Windows/Linux - but modern systems typically allow it to be selectively flushed (the system can elect not to flush the TLB for the shared mappings that exist in both processes).

What’s the mechanism for this? AFAIK one of the main motivations for the recent heated discussions on PostgreSQL adopting a threaded model would be eliminating TLB flushes in high-context-switch environments. Can Linux already preserve their (massive) shared mappings?


ASID (Address Space ID), or as Intel calls it, PCID (Process-Context Identifier).

The OS juggles these based on which processes are resident on the CPU and the various active mappings, then uses INVPCID[0] during a context switch.

[0]: https://www.felixcloutier.com/x86/invpcid


This is one of the many reasons that the entire HFT world runs on Linux.


  if(((i%3)||(i%5))== 0)
     buy_0_day_to_expiry_options()


I may backtest this over the weekend just in case it has some alpha :)


But 0dte options on SPX are available for each day of the week now!

    if (is_trading_day())
        buy_0_day_to_expiry_options()


I've heard of some Windows HFT systems, though I'm sure that's suboptimal.


Do they also use assembler in HFT?


Custom hardware, custom drivers, custom UDP stack.

It's not just the algos, it's very much the latency.

( eg: https://www.velvetech.com/blog/fpga-in-high-frequency-tradin... )


For speed, it's mostly FPGAs. You can put them directly at the demarc between you and the exchange.


They use custom ASICs too.


Besides living up to the username of "ais523 - high effort answers", the author also put high effort into the comments, helping someone who couldn't get the program to run. The resolution:

> I suspect that what's happening is that the program was somehow compiled with ASLR turned on. For some reason, the dynamic linker doesn't respect the 4 MiB alignment of the BSS segment in this case, effectively ignoring my .align, and that's what's causing the bugs.


All I know is if this guy applies to the same job as me I won't stand a chance. The ultimate leet coder


Are you sure they didn't spend a lot of time to get the answer, then post the question with a sock puppet account?


Every time this is reposted I giggle at this comment:

> @chx: I already have a master's thesis. This was harder. – ais523 - high effort answers Oct 29, 2021 at 1:17


I wrote simple implementations (simple if/else/while and printing to stdout) with no optimizations in Rust, Python3 and C

Rust -> 23.2MiB/s

Python3 -> 28.6MiB/s

C -> 238MiB/s

Does anyone know why Rust's performance is in the same ballpark as Python3's?

I thought it would be much closer to C.
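
For reference, a plausible naive C version in the spirit of the comparison - this is not the poster's actual code, just a sketch of what "simple if/else/while and printing to stdout" might look like:

    /* Hypothetical naive FizzBuzz; meant to be run as: ./fizzbuzz | pv > /dev/null */
    #include <stdio.h>

    int main(void) {
        for (long i = 1; ; i++) {               /* runs until killed, like the contest setup */
            if (i % 15 == 0)      puts("FizzBuzz");
            else if (i % 3 == 0)  puts("Fizz");
            else if (i % 5 == 0)  puts("Buzz");
            else                  printf("%ld\n", i);
        }
    }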


Rust’s print function locks by default (because of safety), C doesn’t. For more info see the Rust documentation: https://doc.rust-lang.org/std/macro.print.html

In order to get similar performance as C, you probably need to take care of this lock yourself:

    use std::io::{stdout, Write};
    let mut lock = stdout().lock();
    write!(lock, "hello world").unwrap();
(And also you need to make the buffering size for stdout match C’s.)


> Rust’s print function locks by default (because of safety), C doesn’t.

Huh? Traditionally, stdio implementations have placed locks around all I/O[1] when introducing threads—thus functions such as fputc_unlocked to claw back at least some of the performance when the stock bulk functions don’t suffice—and the current ISO C standard even requires it (N3096 7.23.2p8):

> All functions that read, write, position, or query the position of a stream lock the stream before accessing it. They release the lock associated with the stream when the access is complete.

The Microsoft C runtime used to have a statically linked non-threaded version with no locks, but it no longer does. (I’ve always assumed that linking -lpthread as required on some Unices was also intended to override some of the -lc code with thread-safe versions, but I’m not sure; in any case this doesn’t play well with dynamic linking, and Glibc doesn’t do it that way.)

[1] e.g. see https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofpu...


Thanks for the correction, I wasn't aware that the C standard (since C11) makes these functions thread-safe in the spec. (And as you've said, implementations like glibc already have these locks.)


Take a look at the actual implementation on Stack Exchange; the slower impl is already doing the locking itself.


I wrote this a long time ago, you might find it useful.

https://ismailmaj.github.io/tinkering-with-fizz-buzz-and-con...


Neat tricks. Beyond BufWriter (which I'm already using) and multithreading, I'm guessing there's not much to be done to improve the performance of my "frece" tool (a simple CLI frecency-indexed database) without making it overly complicated. https://github.com/YodaEmbedding/frece/blob/master/src/main....


Thanks for writing this, it led me down a rabbit hole.


C and Python have adaptive buffering for stdout: if the output is a terminal they flush on newlines, otherwise they only flush when their internal buffer is full.
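
A sketch of the same adaptive-buffering idea made explicit with POSIX isatty(3) and setvbuf(3) - libc already makes roughly this choice on the stream's first use, so the snippet is illustrative rather than required:

    /* Pick the stdout buffering mode by hand: line-buffered on a terminal,
     * fully buffered with a large buffer when writing to a pipe or file. */
    #include <stdio.h>
    #include <unistd.h>

    static char big_buf[1 << 16];

    int main(void) {
        if (isatty(STDOUT_FILENO))
            setvbuf(stdout, NULL, _IOLBF, BUFSIZ);             /* terminal: flush on newline */
        else
            setvbuf(stdout, big_buf, _IOFBF, sizeof big_buf);  /* pipe/file: flush when full */

        for (int i = 1; i <= 4; i++)
            printf("%d\n", i);
        return 0;
    }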

Here's a C program counting, with a 1ms delay between lines. The second column is a duration since the previous read():

   $ ./out | rtss
   4.7ms    4.7ms | 1
   4.7ms          | 2
   4.7ms          | 3
   4.7ms          | 4
   4.8ms    exit status: 0
You can see they were all written in one go. When allocated a terminal, they come out line by line:

   $ rtss --pty ./out
   0.8ms    0.8ms | 1
   1.9ms    1.1ms | 2
   3.0ms    1.1ms | 3
   4.1ms    1.1ms | 4
   4.3ms    exit status: 0
Rust lacks this adaptive behaviour for output, and will always produce the second result, terminal or not.

Technically it unconditionally wraps stdout in a LineWriter (https://doc.rust-lang.org/std/io/struct.LineWriter.html), which always flushes if it sees a write containing a newline. To maximise throughput you therefore want to batch writes of multiple lines together, for example by wrapping it in a BufWriter.


You should compile Rust with --release and C with -O3.


That wouldn't be a fair comparison. Rust has an opt-level option for each build profile. It defaults to 2 for the release profile.


In practice O2 and O3 are rarely very different.


Almost certainly the limitation is due to printing, likely buffering or locking.


Can we see your code?


Makes you wonder... How fast would everything be if it was written in assembly.

In audio dev it's very common for DSP code to be written in assembly.


Assembly is hardly the reason this is fast. It is necessary to this solution but by no means sufficient.

Extreme algorithmic research combined with a high LOK of Linux syscalls and platform specific optimizations is what allows this to exist. To quote the author, Alex Smith, himself:

> @chx: I already have a master's thesis. This was harder.

This is in a different universe than what can be produced by simply "do it in assembly".


The second ranked solution (by me :)) shows that a more or less trivial assembly tight loop can get about 70% of the speed. The remaining 30% is... something else.


What's LOK in this context? Search and Wikipedia didn't throw up anything. Thanks.


Level of knowledge, I think.


Correct.

It's one of those extremely-common-in-the-US-military-almost-no-usage-outside-it type things apparently, like behoove or tack (hyphen) or diggit (multi-tool), which I'll keep in mind in the future.


"Extreme algorithmic research combined with a high LOK of Linux syscalls and platform specific optimizations is what allows this to exist."

No, author's extreme boredom and/or free time allowed this to exist. Nothing else.


Even if everything was just written in Java we'd be better off than the current system of embedding Chrome into a Python instance and then running a webserver in JavaScript to render a document.


at $DAYJOB we're in the (slow) process of replacing a tool written in Java with its successor tool, which is a web app. the Java tool works great and has reasonably snappy performance. the web app is terribly slow (at least 0.75 jiras), frequently hangs, and is often unusable for simple tasks. it's been miserable enough to make me miss the Java Era.


I will start measuring things in jiras, thanks :)

Also, I bet the reason is more about the architecture of your web app. It's harder to make it as fast, for sure, but it shouldn't hang or be unusable.


I wish we could do anything about it, but it's a third-party SaaS (like everything else these days...)


I don't think replacing one bad idea with another bad idea is a good idea ;) I'm also not sure if Java would be less bad in that case. At least JS has years of research that went into making it start fast, not only run fast after warmup.


Is this "years of research" the reason why most Electron apps still start slower than - for example - IntelliJ IDEA, which is written in Java and Swing?


Modern Java has Coordinated Restore at Checkpoint (CRaC) as well as ways to compile it to native binaries that can start very fast (GraalVM & Substrate VM). The former has the benefit of a JIT as well!


Have you used any of these? I'm looking for ways to speed up JVM start time, but almost all the options I've seen so far are "experimental" and I'm not sure of the best route to take. For example, I was sad to see ahead-of-time compilation being removed from the JDK.


Just imagine if those years of research went into a good language instead. Instead of expending a huge amount of effort making an inherently slow language perform reasonably, we could have taken a reasonable language like Rust and made it so good it would legitimately push computing forward.

And then all you web devs could still compile rust to wasm. wasm is already faster than JS anyway. JS was a mistake.


If that were true, then eclipse would be faster than VS Code, right?


It would end up like Steve Yegge's tale of Geoworks:

"OK: I went to the University of Washington and [then] I got hired by this company called Geoworks, doing assembly-language programming, and I did it for five years. To us, the Geoworkers, we wrote a whole operating system, the libraries, drivers, apps, you know: a desktop operating system in assembly. 8086 assembly! It wasn't even good assembly! We had four registers! [Plus the] si [register] if you counted, you know, if you counted 386, right? It was horrible.

I mean, actually we kind of liked it. It was Object-Oriented Assembly. It's amazing what you can talk yourself into liking, which is the real irony of all this. And to us, C++ was the ultimate in Roman decadence. I mean, it was equivalent to going and vomiting so you could eat more. They had IF! We had jump CX zero! Right? They had "Objects". Well we did too, but I mean they had syntax for it, right? I mean it was all just such weeniness. And we knew that we could outperform any compiler out there because at the time, we could!

So what happened? Well, they went bankrupt. Why? Now I'm probably disagreeing – I know for a fact that I'm disagreeing with every Geoworker out there. I'm the only one that holds this belief. But it's because we wrote fifteen million lines of 8086 assembly language. We had really good tools, world class tools: trust me, you need 'em. But at some point, man...

The problem is, picture an ant walking across your garage floor, trying to make a straight line of it. It ain't gonna make a straight line. And you know this because you have perspective. You can see the ant walking around, going hee hee hee, look at him locally optimize for that rock, and now he's going off this way, right?

This is what we were, when we were writing this giant assembly-language system. Because what happened was, Microsoft eventually released a platform for mobile devices that was much faster than ours. OK? And I started going in with my debugger, going, what? What is up with this? This rendering is just really slow, it's like sluggish, you know. And I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen. It wasn't just the title bar. Everything was getting called multiple times.

Because we couldn't see how the system worked anymore!

Small systems are not only easier to optimize, they're possible to optimize. And I mean globally optimize."

http://steve-yegge.blogspot.com/2008/05/dynamic-languages-st...


>Object-Oriented Assembly

what in the god damn


Haha! This is a real thing. “It is easy to assume that OOP requires such an OOP language. This is not the case.” https://zsmith.co/OOA.php

https://www.amazon.com/Object-Oriented-Assembly-Language-Len...

Indeed, my first industry job, after spending my undergraduate and graduate degrees honing C++ skills, was doing object-oriented C. It was actually great and taught me a lot about what actually matters in a language versus what people will tell you.


Paradigms are not a characteristic of programming languages, they are a characteristic of programs.

Now, obviously, programming languages encourage and afford the use of certain paradigms. But that's all they do. A program is not object oriented because it's written in C++. It is object oriented because it is written in terms of objects, regardless of what the underlying language may support or encourage.

Assembly, being the base language of the CPU, can be written in any paradigm any other language might permit, since all other programming language paradigms are obtained as limitations on the underlying assembly that may be generated. You can always apply those limitations manually to get an object-oriented program, or a logic programming program, or a functional program. It just might be very costly and difficult.


vtables aren't limited to any particular language

That you have some set of macros that work on common data structures and use a function pointer lookup to dispatch some behavior isn't insane. I mean, if your macros are that advanced why not use a real programming language, but shit got weird in the 80s and early 90s.
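
A sketch of what such a hand-rolled vtable looks like in C (illustrative names, not from any particular codebase):

    /* A table of function pointers plus a pointer to that table in each
     * object; "dynamic dispatch" is an indirect call through the table. */
    #include <stdio.h>

    struct shape;                                            /* "base class" */
    struct shape_vtable { double (*area)(const struct shape *); };
    struct shape        { const struct shape_vtable *vt; };

    struct circle { struct shape base; double r; };          /* "derived class" */

    static double circle_area(const struct shape *s) {
        const struct circle *c = (const struct circle *)s;
        return 3.14159265358979 * c->r * c->r;
    }
    static const struct shape_vtable circle_vt = { circle_area };

    int main(void) {
        struct circle c = { { &circle_vt }, 2.0 };
        struct shape *s = &c.base;
        printf("area = %f\n", s->vt->area(s));               /* dispatch by hand */
        return 0;
    }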


Interestingly, AI might change this. Not that it would be a good idea to write everything in assembly, but at least it would be possible now.

Some wonderful systems were written in assembly. Donkey Kong comes to mind.


> Interestingly, AI might change this. Not that it would be a good idea to write everything in assembly, but at least it would be possible now.

More likely to be the opposite IMO. AI could make the Sufficiently Smart Compiler a reality; maybe autovectorization will finally work and you could write FizzBuzz in Python and it would perform like this.


Assembly is so far removed from how code is written today that it is not even feasible to think about. That being said, imagine how fast everything would be if the companies developing the software actually cared about performance.

For like 99% of websites and software today, if anyone cared about the app performance, I am pretty sure most would be able to achieve at least 50% speed-ups through very basic changes (correct caching, optimizing assets, replacing bloated 3rd party libraries with a basic native call that does the same thing, configuring the servers and databases properly, etc.).

EDIT: That being said, I am pretty sure in a few years AI would be able to provide one-click optimizations to a repository that would either apply best-practices or rewrite the original code in performant Assembly.


> "if anyone cared about the app performance, I am pretty sure most would be able to achieve at least 50% speed-ups through very basic changes"

See https://danluu.com/octopress-speedup/

> "This blog is a static Octopress site, hosted on GitHub Pages. Static sites are supposed to be fast, and GitHub Pages uses Fastly, which is supposed to be fast, so everything should be fast, right?"

followed by

> "I'm not sure what to think about all this. On the one hand, I'm happy that I was able to get a 25x-50x speedup on my site. On the other hand, I associate speedups of that magnitude with porting plain Ruby code to optimized C++, optimized C++ to a GPU, or GPU to quick-and-dirty exploratory ASIC. How is it possible that someone with zero knowledge of web development can get that kind of speedup by watching one presentation and then futzing around for 25 minutes? I was hoping to maybe find 100ms of slack, but it turns out there's not just 100ms, or even 1000ms, but 10000ms of slack in a Octopress setup. According to a study I've seen, going from 1000ms to 3000ms costs you 20% of your readers and 50% of your click-throughs. I haven't seen a study that looks at going from 400ms to 10900ms because the idea that a website would be that slow is so absurd that people don't even look into the possibility. But many websites are that slow!"


> For like 99% of websites and software today, if anyone cared about the app performance, I am pretty sure most would be able to achieve at least 50% speed-ups through very basic changes (correct caching, optimizing assets, replacing bloated 3rd party libraries with a basic native call that does the same thing, configuring the servers and databases properly, etc.).

I'd say a big factor to include here is the choice of language. Going from Python/Ruby to Kotlin/Rust/etc could probably yield a speed up of over 10x / over 1000%.


I think the ecosystem is more important than the programming language. You can choose any language; if you bloat your application with 20MB JPEGs, 3rd party API requests that take tens of seconds, or animations that take seconds to complete, the language won't matter that much.

I am not sure if there are languages that come with a fool-proof ecosystem when it comes to implementing things and making them fast. The only ones I can think of are UI-based application builders, which only allow you to add a limited subset of UI elements or features to your app.


I think an oft-overlooked detail of more primitive languages yielding faster programs is this:

when it's painful to progress even a little bit while coding, you try very hard to implement as little as possible

resource constraint can give clarity of focus


Shortest/smallest code is rarely the fastest. Loop unrolling and inlining make code larger, and occasionally faster, as a trivial example.


Sure, but my comment meant to address systems thinking and application design, not micro optimisations.


I started in software dev in 1988; the only languages I knew were BASIC and Z80 assembly, so since everyone knew BASIC was kind of lame, I naturally assumed I'd be a machine code programmer, albeit on slightly more exotic processors. Day one I got handed a book on C -> mind blown.


Or if everything had been written by "ais523 - high effort answers"


You don't need assembly, you need the code to not be terrible, which is generally hard with the amount of code/developers needed.


It depends. You can write bubblesort in assembly and it will be pretty damn slow. I can imagine those assembly leaders could push out pretty fast C implementations as well.


It wouldn't run at all since it wouldn't exist.


This exercise appears to be somewhat flawed, even if entertaining/informative. Instead of evaluating the speed at which complex problems are resolved, it predominantly tests a peripheral issue: the efficiency of extracting memory from one process and transferring it to another. This allows for the illusion that the second process continues to write to a console/file, even though, technically, it does not - executing pv >/dev/null is essentially a no-op, as the write system call returns almost instantly.

vmsplice grants access to a process' buffer/memory to another process - a shared mem equivalent. As the initial competition requirements are likely vague, I'd imagine it's unclear if this is still good wrt the rules.


> As the initial competition requirements are likely vague, I'd imagine it's unclear if this is still good wrt the rules.

You can scroll upward to the original question to see the initial requirements, and check the edit history to verify that they haven't changed since the start of the challenge:

> Write your fizz buzz program. Run it. Pipe the output through <your_program> | pv > /dev/null. The higher the throughput, the better you did.

> The program output must be exactly valid fizzbuzz. No playing tricks such as writing null bytes in between the valid output - null bytes that don't show up in the console but do count towards pv throughput.

And vmsplice(2) indeed produces a stream of bytes in the standard output pipe that pv(1) can splice into /dev/null, or cat(1) can copy into the terminal.

This submission was not the only one that uses vmsplice(2); others have found that it's far from a magic bullet. Once you pass the I/O hurdle, much work remains in generating the pages of output as quickly as possible.
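
For readers unfamiliar with the call, a minimal sketch of a vmsplice(2) producer - this is not the contest entry, just an illustration of the mechanism:

    /* Hand a userspace buffer's pages to the stdout pipe instead of copying
     * them through a stdio buffer.  Run as: ./a.out | pv > /dev/null.
     * Linux-only; glibc exposes vmsplice(2) under _GNU_SOURCE. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void) {
        static char buf[1 << 16];                  /* pretend this holds generated FizzBuzz text */
        memset(buf, 'x', sizeof buf - 1);
        buf[sizeof buf - 1] = '\n';

        struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf };
        for (;;) {
            /* Real code must handle partial splices and must not overwrite a
             * buffer the pipe may still reference (the contest answer rotates
             * several buffers for exactly that reason). */
            if (vmsplice(STDOUT_FILENO, &iov, 1, 0) < 0)
                return 1;
        }
    }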


> it predominantly tests a peripheral issue: the efficiency of extracting memory from one process and transferring it to another.

Isn’t this almost always the whole problem? Most code is bottlenecked on memory and I/O. Complex problems are usually held up by the speed of getting data from one place to another, and not very often on computing the data. As someone who spends his days optimizing GPU assembly, even in the rare cases when compute is the bottleneck, once you optimize it, memory becomes the bottleneck.


Maybe it should be called `code-origami`


I like that ;)


I disagree, because one does not get to the bottleneck being _the efficiency of extracting memory from one process and transferring it to another_ without major fizzbuzz-specific optimizations first.

For example, there's a clever bit representation to get base-10 carries to happen natively.

The initial competition requirements are not particularly vague about this point: Measuring throughput with `<program> | pv > /dev/null` is prescribed, and it also says

> Architecture specific optimizations / assembly is also allowed. This is not a real contest - I just want to see how people push fizz buzz to its limit - even if it only works in special circumstances/platforms.


I/O is something that literally every program has to do. It's also the bottleneck of 99% of code running on modern hardware. Moving bytes from one place to another is essential and relatively slow.

Understanding how to deal with memory I/O and file I/O performantly is a relevant skill for every program and programmer.


> Save it as fizzbuzz.S (that's a capital S as the extension)

What's the significance of ".S" vs ".s"?


Capital S will run the preprocessor first.

Edit:

From the manpage:

    file.s
        Assembler code.
    file.S
    file.sx
        Assembler code that must be preprocessed.


IIRC, I believe that the difference traditionally was whether to pass the input through the preprocessor (.S), or not (.s).

Not sure if it makes a difference on modern toolchains.


The convention I'm used to uses .S to denote hand-written assembly files (usually tracked in git) vs .s for machine generated assembly which can be overwritten as needed.


GCC et al won't overwrite a .S but if you ask it to gen ASM (e.g. gcc -S xyz.c) it will overwrite a .s



For a second I thought it said 55 GiB/s FritzBox, which is a popular router in the German-speaking part of Europe. My ISP also just last week tweeted a new 60 GiB/s capable OPNSense box[1] that will be available soon.

[1] https://twitter.com/init7/status/1674920410889043973


I’ve a DEC750 that I’ve bumped to 16GB memory. A 10GbE capable (2.3GbE Wireguard) router that’s silent and idles on par with my cable modem (8W) has been a great piece of equipment I don’t see myself replacing until 2030-2035. I even plugged in a USB WiFi adapter so if cable goes out, it fails over specific VLANs and will use my cell phone to keep core connectivity up for work, etc.

They're not cheap, but if you want some serious kit that also financially supports the OPNSense project, the Deciso appliances are tough to argue with. Power-efficient, durability-focused components that just work.

It’s nice seeing their commercial offerings getting beefier.


Interviewer: So let's start with a simple coding challenge, it's called FizzBuzz. Have you heard of it?


Interesting how Java is so fast. Among the fastest of C, C++, asm, Go and Rust.

How?


Because the JVM has had 27 years of R&D work done to it. If you have a really hot path a JIT will typically continually recompile it in the background with more and more optimization work and start putting the loop through that progressively more optimized code. When you're doing gigabytes of FizzBuzz the JIT will quickly have something pretty comparable to the best native code.


Oddly enough, FizzBuzz-type code likely needs on-stack replacement, i.e. compiling and replacing the currently executing code, which tends to be subpar.

The submitted code snippets for Java do not need OSR, though... yet they can be improved further, e.g. they should drop the use of String entirely (which would not feel very Java). The other attempt to convert int -> String (byte) uses the naive way, dividing by 10 on each iteration; Java's Integer.toString does it way better.

Edit: on second thought, having a dedicated direct buffer [same allocation, different slices] for each of the 8 out of 15 slots that are numbers, and NOT converting int -> String on each operation but just adding 15, would be a pretty big boon, as most of the time only the last 2 bytes change and there is no 'div' to be had. (Div is generally slow - compared to L1/L2 cache misses and an L3 hit - there is no algorithm to parallelize it, and there are only one or a few units that can perform div, unlike 'add'.)
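
A sketch (in C, for brevity) of the "keep the number as ASCII digits and just add 15" idea described above; the function name and layout are illustrative:

    /* Keep the number as big-endian ASCII digits and add 15 in place; most of
     * the time only the last two digits change and there is no div/mod by 10.
     * Assumes the buffer is pre-padded wide enough that the carry never runs
     * off the front (e.g. "000001"). */
    #include <stdio.h>

    static void add15(char *buf, int len) {
        int carry = 1;                                       /* the tens digit of 15 */
        buf[len - 1] += 5;                                   /* the ones digit of 15 */
        if (buf[len - 1] > '9') { buf[len - 1] -= 10; carry++; }
        for (int i = len - 2; i >= 0 && carry; i--) {
            buf[i] += carry;
            carry = 0;
            if (buf[i] > '9') { buf[i] -= 10; carry = 1; }
        }
    }

    int main(void) {
        char n[] = "000001";
        for (int i = 0; i < 5; i++) { add15(n, 6); printf("%s\n", n); }  /* 000016, 000031, ... */
        return 0;
    }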


It doesn't matter for FizzBuzz, but in general JIT should really out-perform "native" statically compiled compilers on most real-world workloads. While there's a small penalty for the profiling JIT needs, there are massive gains for data and access dependent optimizations that static compilation can't achieve without explicit hints. For example, a JIT can observe the most commonly accessed data and re-order it to optimize for the pre-fetching pipeline. Or it can re-order operations to optimize for cache-line hotness.


> but in general JIT should really out-perform "native" statically compiled compilers on most real-world workloads.

In theory, yes, but in practice I've never seen it happen. Best I ever saw was matching C speed at toy benchmarks. Even in this benchmark here, Java is decent, but does not beat even the naive implementations in C/Rust.

Also, AOT compilers can do PGO as well, so they can use the same techniques. But they also have way more time and resources, so they can do things like whole program optimization, which is something JITs cannot do because they have much smaller computation and memory budget.

It has already happened to me many times that the first naive version of a C/C++/Rust program/function I wrote was faster than a carefully tuned Java equivalent. AOT compilers for "fast languages" have gotten really good these days. The design of the language also influences how well it can be optimized by the compiler. E.g. it might look impressive that the JVM can devirtualize dynamic calls at runtime, but C++/Rust often don't have to do this at all, as programs in those languages tend to have very few virtual calls, if any at all.


If you didn't need more evidence, the fact that the Java community has been slowly moving towards AoT with GraalVM should cement it. Java's JITs are damn impressive pieces of engineering, but the structural advantages of traditional compilation still win out surprisingly often.


The move towards AOT is mostly for lower startup times and lower memory usage, not to make the application faster in general.


>Java is decent, but does not beat even the naive implementations in C/Rust.

Java doesn't beat C in this benchmark but beats Rust with ease.


There is not enough data to conclude that. There are 3 Java implementations slower than the faster Rust one, and there are 2 Java implementations faster. But there are only 2 data points for Rust in total.

Also, technically there is nothing C can do that Rust can't.


Will V8 be able to do the same some day?


Java is a much better source (and target) to optimize than JS. It's quite a bit more work when all numbers are floating point by default.


My understanding is that a statically typed language is inherently faster than a dynamically typed one, no matter how much you optimise the latter. The JS interpreter/VM simply has a lot more work to do than the JVM.


No. Well…it’s complicated.


I think WASM will be the mainstream for frontend before that happens... But you can never predict the future!


Java is only slow to start the VM. If your process is long-lived it's exceedingly quick because you only pay the startup penalty once.

"GC pauses" are greatly exaggerated in terms of impact and frankly for the vast majority of uses cases GC simply doesn't become an issue.

The JVM is really really good because at one point or another they had basically every luminary in the field working on it.


> Java is only slow to start the VM

And it's not even that slow anymore compared to, say, starting the JVM in 2010.

Some numbers for those curious...

On modern hardware, launching the JVM to run a program that immediately exits takes less than 100 ms (so does starting Emacs complete with its GUI, running some elisp code, and exiting Emacs: 80 ms on my Ryzen 7000 series, including reading from the M.2 NVMe PCIe 4.0 x4 SSD).

It's once you start loading lots of classes that JVM startup time can be slow.

One example would be a Clojure program doing nothing besides exiting: thousands of Java classes being used and you get into 1.2 seconds territory to do nothing. 12x slower than a Java program doing nothing.

As a sidenote, for both Java and Clojure there are now ways to reduce startup time, like using GraalVM (which is, for example, how Babashka, a natively compiled Clojure interpreter, makes Clojure startup so fast it can be used for scripts).

> The JVM is really really good because at one point or another they had basically every luminary in the field working on it.

I agree. The JVM is an impressive piece of machinery and it'll even give you, say, an AIOOBE (ArrayIndexOutOfBoundsException) instead of an exploit if you fuck up.


>so does starting Emacs complete with its GUI

I'm with you all the way and am a daily emacs user but I'm not sure I'd point to emacs as a good performance comparison, haha.

Emacs has always been a bit of a dog in my experience.


The fear of GC pauses is mostly outdated. It was right in the past, but it is becoming less and less of an issue with the new GC work coming online. And for small heaps in web application servers, the pauses were a blip on the high percentiles of response times and you usually had a dozen other issues before that.

It's similar to swap in Linux. It was implemented somewhat meh in older kernels, and usually if swap got hit, the system died anyhow. So there was no real difference between random processes being OOM-killed or the system grinding to a halt swapping. Modern kernels in the 4+ line have received quite a bit of work on the swap handling, and swap is used a lot, and very cleverly, to eke out just a bit more available memory more quickly.

Old habits die hard though and it takes time for old knowledge to change.


The problem of GC pauses is solved only if you can afford to waste 5x-20x more memory than the app is really using. Otherwise those low-pause modern collectors can burn a significant number of CPU cycles or even fall into emergency STW path (when they can't keep up with garbage production rate).


If the allocations die trivially (the extreme/vast majority do) in the young gen, STW doesn't occur. Indeed, being aware of how the GC operates is important for writing a decent Java application. So is understanding how the hardware works in general.


> If the allocations die trivially (the extreme/vast majority do) in the young gen

Low-pause GCs (ZGC, Shenandoah) for Java are not generational yet.

Also, even if they were, a very high temporary object allocation rate increases young gen GC frequency and thus increases the number of objects pushed to old gen eventually.

And it is not like those young gen GCs are free either. They burn quite some CPU time and they cause micro-pauses - the GC has to scan parts of the heap to learn which objects are reachable and then it has to copy the survivors. In practice, high allocation rate requires a decent amount of overhead RAM to make that process efficient. It doesn't matter if pauses are only 10 ms short if you do 50 of them per second. ;)

In non-GCed languages those trivial short-term objects are typically allocated on stack and their allocation/deallocation is trivial and doesn't pause at all.


>Low pause GCs (ZGC, Shenadoah) for Java are not generational yet.

They are. GenZGC is merged into mainline and got into the latest JDK; GenShenandoah didn't make it but will be in the next release.

>The problem of GC pauses is solved only if you can afford to waste 5x-20x more memory than the app is really using

I don't think that is true. ZGC uses multi-mapping in order to dereference its colored pointers, this causes some tools to report excessive memory usage but nothing actually hits RAM.


>the GC has to scan parts of the heap

Hmm, for generational GCs it should use card marking; the pointers in the tenured gen should have a means of trivially determining where they point. With 64-bit pointers there is space for quite a lot of metadata, incl. the Class (or the most commonly allocated/used classes).


Card marking does not identify pointers directly, but only identifies parts of the heap that may contain pointers to the new gen. So those parts still need to be scanned. Surely, it is way less work than scanning the whole heap, but still non-zero.


> it is becoming less and less of an issue with the new GC work coming online

I feel like I've been hearing this same line for 20 years now


Java has a ridiculously fast new operator. Much faster than the one in C++.

It works by allocating all or most needed memory at program start, instead of asking the operating system for it every time. But, as soon as you don't use heap memory, and use the stack, C++ is again much, much faster than Java.

It all depends on memory management.


C++'s new operator can also do this using "placement new" semantics[1], which usually means making (and using) a custom allocator. You're still using the normal new operator, just passing extra parameters to do the placement you want.

To your second point about raw speed, one of the extraordinary things we found in long-running processes performing computation on real-time market data at Goldman was that the HotSpot JVM was able to optimize Java programs through the day based on their usage, so if you started them each day they would actually end up faster than the C++ versions by the end of the day, even though they would start off slower. That's not due to memory allocation; it's due to things like inlining of functions.

The implication of that is that if you very carefully inlined all the functions appropriately in the C++ version based on profiling actual usage you would be able to achieve the same result, but for the JVM it just happens automatically without you doing anything.

[1] Search https://en.cppreference.com/w/cpp/language/new for "Placement New"


> Java has a ridiculously fast new operator. Much faster than the one in C++.

Can you point us to some benchmarks?

I don't doubt it is faster, but I doubt the "ridiculously" part. Last time I measured, it was only a tad faster than jemalloc (~20%) if you allowed the benchmark to run long enough for GC to start cleaning up. Unfortunately I haven't saved it (it was a very informal benchmark). I'm curious to see that, esp. on modern low-pause GCs.


It's just pointer bumping (still). If you're continuously using new memory then soft page faults and finding physical backing via the kernel will dominate. Mallocs want to avoid this scenario, so you often see malloc using more memory than necessary.

Anyway, don't use new in C++ is my opinion.


It's pointer bumping in the optimistic case. Eventually it runs out of the current slab and... then there is some more work to do. And also those page faults are actually quite likely, as it gives you memory that hasn't been used recently. Malloc often tends to return whatever was released last that matches the requested allocation size. It is indeed a tad more work to find it, but when it's found it's usually hot in the cache.
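
A toy illustration of the bump-pointer allocation being discussed (names and sizes are made up; a real allocator needs alignment, a slow path for new slabs, and, in the GC case, evacuation of survivors):

    /* Toy bump ("pointer bumping") allocator: the fast path is an add and a
     * bounds check, roughly like a JVM young-gen TLAB allocation. */
    #include <stddef.h>

    #define SLAB_SIZE (1 << 20)

    static char   slab[SLAB_SIZE];
    static size_t top;                              /* the bump pointer */

    static void *bump_alloc(size_t n) {
        if (top + n > SLAB_SIZE)
            return NULL;                            /* slab exhausted: the "more work to do" case */
        void *p = slab + top;
        top += n;
        return p;
    }

    int main(void) {
        int *xs = bump_alloc(100 * sizeof *xs);     /* roughly the fast-path cost of "new int[100]" */
        return xs ? 0 : 1;
    }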


Imo it's one of Java's strengths. If you need to, you can write ugly but ridiculously fast code. It does end up looking more like writing C than Java though, but obviously your whole codebase doesn't need to look like that, and it's portable.


The only part Java lacks is header-less objects (aka Project Valhalla). It has been mimicked with direct buffers and offsets for ~20 years now (since the Merlin release). Other than that, Java is similar to a guided "gcc -O2".


Sure, you can write Java like C and get performance typically within 0.3x to 1x of C. But Java written in this style is even less ergonomic than C is, so if you have to do it, there is not much point in using Java.


It is quite common that you only need to optimize very small parts of a program to this level. The rest of the program can be written in more conventional styles.

You could of course FFI into e.g. C for those parts, but that is usually harder to maintain than a few well optimized java classes.


> It is quite common that you only need to optimize very small parts of a program to this level.

It's a quite common myth developers believe about performance. Hotspots do happen sometimes, but once they have been optimized you quickly end up with a flat profile and an "everything is slow" problem. And in some types of apps, the majority of code is performance critical.


It depends - even when you run into an "everything is slow" problem, it might be that it's like 1 endpoint out of 2000 that causes performance issues. In this case, you might need to focus very much on performance for that endpoint, but maybe not for other endpoints. Profilers can help you figure out what code to focus on.

If the majority of code is performance critical, the tradeoffs are of course different.


Nah, "everything is slow" is vanishingly rare in practice.


Depends on the application. In the areas of compilers, databases, CAD, game engines, simulation, distributed analytics, and machine learning, the only code that isn't on the critical path is some configuration / control plane / UI - which is a minority of the code.


Certainly not in the entirety of those areas. I've worked in a couple of them and functionality was a higher priority than performance and we picked our tools accordingly.


The cases you’re talking about are the rare ones. Source: I did this professionally and now recreationally.


> You could of course FFI into e.g. C for those parts, but that is usually harder to maintain than a few well optimized java classes.

Hopefully Foreign Function & Memory API [0] makes FFI so much easier that we get to drop down to C without much fuss.

[0] https://openjdk.org/jeps/442


This inspired me to plumb the depths of FizzBuzz, seeking further into it than anyone ever has before: the 10^10000000000th digit (it's a "1"): https://github.com/leijurv/reverse-fizzbuzz


Ah yes, the hyper log log log algorithm


The first company to offer a native hardware implementation of FizzBuzz is going to revolutionize hiring and bring back boom times for our industry.


At Fogbeam Labs work has been proceeding to bring perfection to the crudely conceived idea of a FizzCabulator. This device not only supplies inverse reactive current for use in unilateral phase detractors, but is also capable of automatically synchronizing cardinal grammeters. The only new principle involved is that instead of power being generated by the relative motion of fizzes and buzzes, it is generated by the modial interaction of magneto-reluctance and capacitive duractance.


I love it. How about a quantum computing version?

You know who wrote this:

"Essentially, the Quantum FizzBuzz Decoupler is a device that bifurcates the bifizzial substrates and recoheres them into units of Buzzifactive tri-fusion. Its pivotal design focuses on the tripticate circuitry which utilizes pseudo-string theory, ensuring the cyclic decoupling of the Fizzionic and Buzzionic isotopes, thereby creating a conflux of cryptic poly-angular interdigitations.

This hyper-convoluted mechanism leverages the intrinsic infra-doodlality of its multitudinous nano-digitators, producing a reiterative bi-fractal resonance with every flicker of the reciprocal quark-flux nano-helices. Interlocking harmonic modulators within the Decoupler support the synchronous crystallization of the brizzulated waveforms, thus reinforcing the transluminal juxtapositioning of multi-fuzzional intermediates.

Within its dual-core, it harmonizes the phasic disentanglement of quasi-quintessential quantum flux through the divaricating spindles of the orthofizzial transinductor. Furthermore, the Decoupler incorporates an ultra-gloptic resonator to facilitate the recursive modulation of the bifizzial and tri-buzzoidal constituents, hence the stupendously high FizzBuzz output."


...GPT? If so, what prompt did you use?


https://chat.openai.com/share/19634ae8-0c49-4e3f-bc97-229aaa...

At the end of the prompt I tell it to judge its results against my requirements because I was afraid it would use real words, since it usually shouldn’t be making words up as part of its output - let alone making the whole output nonsense - so this is a tough task for it.

If you tell it to judge itself it is less likely to do the wrong thing. (It doesn't like to end up printing "I disobeyed all the instructions since I used real words" - it'll rework its output to be more correct. It's like a cook: if you ask for a chicken pie and then ask it to tell you whether what it just cooked is a chicken pie, it won't bake you an apple pie instead. Unlike a real cook, it doesn't have enough awareness to judge itself if you don't ask it to.)

It still used the term isotopes. The original didn’t include any real words like that. So it didn’t really meet ALL of my requirements which said “don’t use any real technical words”. It should have changed it to isotrypes or something.

It is still a very impressive result. ChatGPT truly understands the prompt and can really think, as evidenced by its summary at the end, even though I didn’t mention that my request was for comedic effect.


He used the Retro Encabulator to generate this text.

https://youtube.com/watch?v=RXJKdh1KZ0w (my favorite version)

It's an old engineering in-joke.

Edit: oops, thought your comment was a reply to 'mindcrime. I'm keeping it here though for the lucky (very lucky in this case) 10,000.


I would love to see the out-takes from recording this, and see how many times the actor just broke out laughing his ass off, spewing all that gibberish!


You don't even need to spin your own silicon, just slap FizzBuzz on the $50k Xilinx XCVU13P and FizzBuzz at a whopping >500GB/s using all of its transceivers. Though, you'd probably need two of 'em (somebody has to receive all that FizzBuzz) so toss in another $50k for another, and then, hmm, you need some top tier signaling talent to manage all that, so probably add in $250k to get somebody to do this. A mere $350k for the FizzBuzz crown, it's a steal really.


If you're paying more than $300-500 for a XCVU13P, you're doing it wrong.


Is there a joke somewhere there? Because I don't see how this chip can be found at $300-500.


Here is one on eBay for $600; realistically you'd need a nice rework station and the skills to reball it to actually use it. FPGA pricing on places like Mouser and Digikey is notorious for being an outright lie; the high-volume prices are way less than $50k (don't get me wrong, they're still expensive chips, just less expensive than that).

https://www.ebay.com/itm/364213117135?hash=item54cccb20cf:g:...


The China secondhand market is teeming with them.


That's trivial; a shift register that latches the 15 initial values, and then just rotates them around, reading off the last one as the output.


This realization almost got me suspended in high school. Our goal was to make an extensible fizzbuzz(barbazzfrongetc)er and I got fascinated by the problem of making wheels. I wrote a program that generated circular lists in MIT Scheme that could be used as wheels (i.e. circular lists with either #f or fizz, buzz, bar, fizzbuzz etc). It was structured as a competition (fastest fizzbuzz up to int16, where the fizzes and buzzes would be given by the teacher at competition time), and the winner would get a chocolate bar or something inane. I couldn't eat it because I can't eat milk.

I was the only one who spent some serious time on the task, and in the end my implementation was the fastest by several orders of magnitude, despite being the only one using something other than C, C++ or Java. The programming teacher refused to believe I wrote it myself, but by a stroke of luck I had saved every revision of the code as part of a primitive folder-based SCM scheme.

I ended up getting a lousy grade due to not caring too much about school, but the teacher and I were on good terms and he became somewhat of a programming mentor to me.


Ah, but for 8 of the values, you need to print the number, which changes each iteration.


BCD increment is trivial in hardware.


pretty much that sums it up -> no division, please
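
A software analogue of that 15-slot wheel, as a rough sketch (not from the thread) - the table is latched once and an index rotates through it, with no division in the loop:

    /* NULL slots mean "print the number here"; everything else is fixed text. */
    #include <stdio.h>

    static const char *wheel[15] = {
        NULL, NULL, "Fizz", NULL, "Buzz", "Fizz", NULL, NULL,
        "Fizz", "Buzz", NULL, "Fizz", NULL, NULL, "FizzBuzz"
    };

    int main(void) {
        int slot = 0;
        for (long i = 1; i <= 30; i++) {
            if (wheel[slot]) puts(wheel[slot]);
            else             printf("%ld\n", i);
            if (++slot == 15) slot = 0;             /* rotate the wheel instead of dividing */
        }
        return 0;
    }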


You'll maybe be pleased to know that I've already been asked this as a coding challenge at a couple of high frequency trading FPGA interviews. I didn't do well at all in the first interview. The second time I got asked it I laughed, because I had spent a weekend down a rabbit hole, programming one on an old Cyclone V. Basically just a load of counters and state machines. It's something like this: http://www.righto.com/2018/03/implementing-fizzbuzz-on-fpga....


At the very least their hiring question will be relevant.


Harnessing the power of the sun we can build a planet-scale computer to bring fizzbuzz to 50 gigaterabits per second


Meanwhile, 99% of developers claim that writing desktop apps with Electron is perfectly fine, that the speed of the language and runtime doesn't matter since you are going to wait for I/O anyway, and that waiting a few seconds after launching an app before being able to use it is perfectly fine even on the most powerful hardware.

99% of managers claim that developers cost more than hardware, so it's perfectly fine if the code is slow, they can just buy more hardware.


And they're right. 55 GiB/s FizzBuzz is cool, but it's no more useful than a normal-speed version, and if someone was paying you for your time then spending months implementing FizzBuzz would be outright irresponsible.


Of course this is just a toy example. But suppose you had a different task that is I/O bound, like processing terabytes of jsonlines files.

The argument is rather the following: common knowledge is that it’s not worth optimizing the code and that programming languages don’t matter, because the IO is too slow anyway. This fizzbuzz shows just how bad we are at IO compared to the optimum. So if we were to improve the IO, then faster processing would also make a difference. In this case you get an improvement of 179x compared to Python, i.e. your laptop will do instead of multiple clusters in the cloud.


> Of course this is just a toy example. But suppose you had a different task that is I/O bound, like processing terabytes of jsonlines files.

If this task was the bottleneck in a large scale system then it would definitely get hand optimized after a proper analysis.

But if this is an occasionally run task or something otherwise not business critical that doesn’t bottleneck anything, spending orders of magnitude more time hyper-optimizing it would be a waste of time and money.

Match the solution to the job. Optimizing everything is one of the age-old mistakes in computer science.


You still need the skills to do it.


> The argument is rather the following: common knowledge is that it’s not worth optimizing the code and that programming languages don’t matter, because the IO is too slow anyway. This fizzbuzz shows just how bad we are at IO compared to the optimum. So if we were to improve the IO, then faster processing would also make a difference.

Maybe? But given that we don't have a way to magically make IO go much faster, that seems pretty irrelevant to the real world. The processor can go faster than memory; sure, we knew that.

> In this case you get an improvement of 179x compared to Python, i.e. your laptop will do instead of multiple clusters in the cloud.

So a highly skilled programmer devoting months to what's typically a 10-minute interview problem (i.e. about 5000x as much programmer time - and the time investment would probably scale superlinearly with the complexity of the problem) was able to get... a measly two-order-of-magnitude improvement. That's really not a good tradeoff most of the time.


This isn't a very strong generalization.

* Where do those terabytes of json live? In the example case, the data is not read from anywhere, but rather generated and written to the pipe. All else being equal, I suspect it would be hard to feed such a program at 55GB/s w/ real data from some storage somewhere - in the end you'd be limited by the disk read speed (or network recv speed).

* This example doesn't need much ram for processing. There's only a handful of variables that can be held nicely in registers. Even if you could get 55GB/s of input, there aren't a lot of cases of json processing that won't need to store variables in memory. At a minimum a lot of effort would be required to ensure cache friendliness for these variables to not thrash the cache - if it's possible at all.

* The code for the example is tiny - json parsing requires a lot more instructions. Even in the simplest case - pulling a fixed number of bytes from fixed size records in the input stream requires a similar number of adds and compares (etc) as the example case, and the output memory alignment machinery will be about the same. As the code grows to handle actual parsing of json and processing of data, the instruction count grows quickly to a size where the instruction cache may become a concern.

* speaking of instructions - parsing json has a lot of conditionals. I don't think it's possible to do it without any conditionals, or even with few enough branch mispredictions and pipeline flushes to achieve such high throughput.

Of course all of this is assuming that the I/O of writing to a pipe is the same as I/O in the "get from network or disk etc" sense. They really aren't the same - I/O to a pipe is IPC, the data never leaves the L2 cache - it's only I/O from the perspective of the process, not the CPU. Once you have disks and networks and RAM involved, things change dramatically - the signals have to travel an order of magnitude (or more) further, they have to leave the chip so you must synchronize with a different device, there's protocol overhead for storage and network, etc.

Point being - sure we can absolutely do better with software in a lot of cases, but this is not a good example to derive conclusions from.


Is your point that we should prioritize app performance and spend more on larger teams of developers, over the ease of quickly writing apps in popular frameworks that work on any system and the web?

If so, that's missing half the picture. "People don't care about performance" isn't why devs gravitate towards Electron. Devs and managers gravitate towards Electron because it means they don't have to hire extra people and do more work to get their native apps working on whatever operating systems they ship to and rewrite the whole app for the web. Yeah, it'd be nice not to have the performance penalty, but some teams have accurately calculated that the performance loss the customer realistically notices is worth the benefits of writing and maintaining the app.


I feel like a lot of devs take things like YAGNI and "move fast and break things" a bit too seriously and to a great extreme.

They're too willing to accept an adequate solution because it "works" rather than spending just a little bit more time to find a more appropriate solution.


If it works (or even “works”) for the users and for the business, it’s probably appropriate.


That's a slippery one. Spreadsheets in Excel works for the business.

That doesn't mean it's appropriate to replace your billing system with one. Even if it works.

It's probably not useful to reason about a whole field in such absolute terms as what's best or what works.


Yet many small business billing systems are basically an excel sheet with a word doc or manual online invoice system like square invoices. You wouldn't replace a functioning billing system with this one, but you might not replace this one with an expensive custom system. As programmers, we often discount the value of our time to zero when we're writing software because it's kind of fun. But for a non programmer, 10-30 minutes of work each week invoicing isn't an issue if the system never gets into an unusable state.


Excel works for the business, but see how much money is estimated to be lost yearly due to Excel mistakes.


Excel is the most widely used no-code/low-code environment by a wide margin. It's not surprising that something in such widespread use has a lot of bugs in aggregate. It's not like the typical office automation written in <pick your favorite language> is bug-free either.


> I already have a master's thesis. This was harder. – ais523

Expecting teams to do more than a master's for every desktop app?

You're not going to have businesses if you apply the standard of this post to everything. Respectfully, the attitude of "optimize the world and damn the consequences" feels like it usually comes from people who haven't tried to deal with the realities of making money with code, even at a level of just understanding why your manager is telling you that they're pushing the release forward against your suggestion.


> Expecting teams to do more than a master's for every desktop app?

If that were possible, I don't think it would be unreasonable. A Master's degree is two-ish years, so about two person-years of effort. Once you're looking at even a small team, a desktop app would often have far more effort put into it.

Instead of 'a desktop app', I think the better comparison is to note that this is a FizzBuzz program. By design, it doesn't do any useful or algorithmically complicated work. It's a canonical example of a trivial program.

If you imagine putting Master's thesis levels of effort into every trivial part of a desktop application, then you'd start to outline the full scope of the problem. Rather than a "Master's thesis" effort, you'd be looking at something like a full, crash research program like the Apollo project.


Yes, and the reason you are waiting on I/O is that you try to get everything from the internet instead of just computing it locally.

The internet is actually faster than you think if you minimise spurious network requests, but most applications don't do that, leading to seconds of delays.


I do not know how many developers use VS Code, but all of them are using Electron, and it seems to be fast enough for them.

I agree with the spirit of what you imply: what a waste. On the other hand, we do not use titanium everywhere, only where needed. There is a "best" for each specific use case.


I've noticed people have very low expectations of modern application performance. My mom finds it completely reasonable that her TV or car audio needs 5-10 seconds to boot. I guess that's because almost everything is a web page now (including web pages wrapped in a window, like Electron apps), so people have simply got used to seeing a lag every time they click.

And I remember how my jaw almost dropped when, in 2022, I saw an operator of a POS that used an 80x25 TUI from the dinosaur era, probably coded with Borland Turbo Vision... Man, how fast that was! Some windows literally appeared on the screen for only a fraction of a second (the operator had obviously memorized the sequence of actions). The fact that I found it unusual, when in fact it should be the norm, also says something about the performance of today's software.


Web pages aren't laggy by design. It's a combination of using libraries which don't optimise it, not using tools to optimise it, and the lack of a well-engineered system.


They use it because it works. They use it for everything.

All of our interns: "How do I use this with VS code?"

They use it for ssh, for everything. They don't know anything else. They don't even know that they are using ssh; they just have a VS Code terminal window open. It's kind of sad.

Meanwhile, I'm trying VS Code on my 5-year-old Mac, and it's like a slideshow. Open a file... wait for the syntax highlighting to fill in; the keyboard response is laggy. It's so terrible. Sublime Text is a Tesla Plaid in comparison.


From what I've heard, VS Code is a really optimised and well-designed piece of code.

Not sure where I'm going with this, except that I know VS Code is often touted as some kind of counterexample. But I also know such care didn't go into most Electron codebases.


But is that Electron's fault?

Is there no chance that at least some of those codebases would be inefficient in other frameworks too? And no chance that the projects would have taken longer to release, or wouldn't exist at all because they were deemed too much faff, if another framework had been used?

Other than running separate node and chromium instances for every instance of an app, what does Electron do to encourage the inefficiency that is attributed to it?


> I do not know how many developers use VS code, but all of them are using electron and it seems to be fast enough for them.

At this point, I think the debate about slow apps is more ideological than reality.

I also think a lot of people are mistaking backend/network latency for front-end slowness. Slack isn’t going to load your scroll back history any faster if the backend is spending all of that time searching the database. People are too quick to blame the front end.

Either that, or some of these posters are running 10-year-old hardware and wondering why it's slow.


"It's what everyone else is using, so it's fine" isn't the convincing argument you might hope it is.


My argument is: engineering is not about best in absolute terms, but about best given constraints.

Some nuance and examples from traditional engineering (titanium as a material) were given, if you want to rebut them.

I am not appealing to popularity. I am offering a counterexample. It seems Electron is not the problem (I showed a counterexample), but priorities are.

Software engineers did not wake up and decide to make programs slow. The economic incentives decided that it was not "worth it". For that to change, the incentives need to change. Just claiming it "should" be like this is not as insightful as it may seem.

In the same way we could complain about programs not being formally verified for correctness, instead of not being as fast as possible. Probably that is a worse offender.


> My argument is: engineering is not about best in absolute terms, but about best given constraints.

I usually agree. But laziness as the primary constraint should feel appalling to anyone who would willingly call themselves an engineer. Perhaps I actually mean to use a different word than "engineer", but I don't know what it should be.

> I am not appealing to popularity. I am offering a counterexample. It seems electron is not the problem (I showed a counter-example), but priorities are.

I think you are, indirectly or inadvertently: citing Electron and arguing in its defense as a reasonable choice/option does do that.

> Software engineers did not wake up and decide to make programs slow. The economic incentives decided that it was not "worth it". For that to change, the incentives need to change. Just claiming it "should" be like this is not as insightful as it may seem.

Laziness is a choice. There are known better options, but software engineers *DID* choose to make it slow. The common axiom: good, fast, or cheap; pick two. If you choose ease of development over fast, you chose to make it slow. The people who build in Python over C because they can write Python faster chose to make the runtime slow. I guess you could argue ignorance as a reason, but once you realize your software is slow and refuse to do the work needed to fix it, you've also made the choice for it to be slow.

> In the same way we could complain about programs not being formally verified for correctness, instead of not being as fast as possible. Probably that is a worse offender.

Good engineers are willing to make the right trade-offs, and I still believe that laziness is a positive trait in software engineers. But great engineers find a way to have both/everything, and people acting ethically don't prioritize their own comfort or convenience over what's in the best interest of users.

The problem isn't whether we can or can't make it fast, or correct, or user-friendly. 55 GiB/s FizzBuzz proves that if you care enough, anything can be fast. The problem is that people don't care, and that others who also don't care throw up their hands and say good enough is acceptable. And sure, by definition it is acceptable. But I'd argue everything is awful, on fire, trash, user-hostile, bad, because too many people go "ehh, good enough, you can stop trying".


I think we basically agree. Let me address what I think is the main difference.

It seems the difference is what we each are including in the "universe" of the problem. To me, it looks like you start and end with the developer (maybe developer + user), and assign laziness to them (aka you turn it into a moral problem).

In my argument, I am including the company, the users, and the whole economy. In that context, it is not about what the developer wants but about the economic incentives, and it is not a moral problem but an optimization one.


It is relatively convincing when there are plentiful free alternatives and no one is telling you what to use (this from an Emacs user).


These two viewpoints aren't mutually exclusive. If they needed to develop FizzBuzz at 55 GiB/s, they might conclude that Electron won't get them there.


citation needed


More in the tradition of code golf, my reasonably-terse gawk fizzbuzz, abusing conditional expressions:

  #!/usr/bin/env gawk -f

  BEGIN {
      for (i = 1; i <= 100; i++) {
          printf(" %2s", i%(3*5) != 0 ? i%5 != 0 ? i%3 != 0 ? i : "fizz" : "buzz" : "fizzbuzz\n")
      }
      printf("\n")
  }
No idea as to the actual throughput, though. I believe this is slower than explicit loop, if/then, or switch/case testing (based on hazy recollections of past performance testing).
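
For comparison, here is a sketch of that "explicit loop, if/then" style in C rather than gawk (an illustrative sketch, not from the thread); any real comparison would still need measuring, e.g. by generating a much larger range and piping the output through pv > /dev/null:

  /* Plain loop-and-branch FizzBuzz, the style being compared against. */
  #include <stdio.h>

  int main(void) {
      for (int i = 1; i <= 100; i++) {
          if (i % 15 == 0)     puts("fizzbuzz");
          else if (i % 3 == 0) puts("fizz");
          else if (i % 5 == 0) puts("buzz");
          else                 printf("%d\n", i);
      }
      return 0;
  }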


maybe it's fast, but is it enterprise quality? https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...


I love these challenges because I always learn a ton from just reading through and (trying) to understand the answers/entries. Same as with IOCCC [1].

[1] https://www.ioccc.org/



What a silly story. They applied for a job that made it clear they were looking for someone who could proficiently program, and they couldn't do it. So they didn't get the job. What a surprise.

Asking for a list of tasks that will come up or code examples is particularly naive. Technical jobs aren't a laundry list of exact needs.

Being an engineer in any field requires versatility, including the ability to solve problems on your feet and to learn new technology, or to learn tech you already know to a greater depth, at the drop of a hat.

This is one of the few times I've ever read a story about impostor syndrome where the storyteller actually was an impostor. Bizarre.


The story is hilarious, because FizzBuzz actually worked as the hiring filter it was designed to be!

"Write FizzBuzz" - "OMG, MATH!"

Impostor meets Dunning-Kruger.


I think you missed the point of the story, which was that the company was asking for a combination of skills that never exist in the same person. A unicorn. In that context it is reasonable to assume that if you have the core competency (in this case design skills, not coding), then you should apply.

Also, FizzBuzz has nothing to do with the kind of programming that was expected from the job description. Now you might reasonably object that FizzBuzz is supposed to be something so rudimentary that any programmer could implement it. But the point is that UI/UX people don't typically do any algorithmic work at all. Their interaction with JS is often just to call an API and shove the resulting data where it needs to be in the DOM. They may never have to use a loop, ever. Or conditional testing. Or think about infinite sequences. To a proper software engineer like you or me, FizzBuzz seems ridiculously simple. But I could totally see a UX designer whose only interaction with self-taught JS is gluing APIs together being tripped up.


FizzBuzz is an "algorithm" in the same way that touching one's toes is gymnastics.


> I think you missed the point of the story, which was that *the company was asking for a combination of skills that never exist in the same person.*

I've asked the people I work with that have this combination of skills but, unfortunately, they all popped out of existence the moment the question left my mouth.

Perhaps they were impostors too?


So you're saying that you don't expect UI developers to ever implement a test?


Her point about the title "engineer" being in nearly every job posting while basically conferring zero meaning is definitely on point, but her mock outrage about being asked an extremely rudimentary programming question surprised me given that one of the requirements of the job was JavaScript...


You don’t need FizzBuzz to do that job. Ask her about KnockoutJS or Angular


In this context, I was expecting another unusual/proficient implementation of FizzBuzz. I didn't enjoy this story about someone who was asked FizzBuzz in an interview and didn't think the position should require her to know that.


Well, it's a silly test. Her point is that the interview should match the job description. These FAANG coding tests are stupid.


The job description required JavaScript familiarity; FizzBuzz is a bare-minimum test of familiarity.

It is barely in the same family tree as FAANG coding tests.


No, you need to go back and read it again. According to her story, it said:

“ HTML5, CSS3, JavaScript. I’m a master at the first two, but since there was no mention of programming stuff and the responsibilities section was so design-centric, I figured my jQuery proficiency and capacity to self-teach would suffice.”

The entire job description read like a UI/UX designer with a minimal understanding of JavaScript.

Her point is that if they wanted a software engineer then they should have hired one. She isn’t trying to be one.


> Experienced with Object Oriented JavaScript and modern JavaScript libraries such as Ember, Backbone, or Angular.

There's no version of a person who meets this requirement that cannot fizzbuzz. They put engineer in the title of the position.

"UX Engineer" is a completely reasonable title for someone who uses Javascript, HTML, and CSS to build web frontends. That person is absolutely a programmer, and absolutely must be able to fizzbuzz (and much, much more).


I understand that you are not familiar with the difference between basic qualification and preferred qualification.

In HR parlance, if the person meets the BQs then they are considered qualified. If two candidates are otherwise equal but one of them also has the preferred qualifications, then they are the preferred candidate. The preferred candidate would be expected to know FizzBuzz.

The part you pasted was in the preferred qualifications.

Also, the term "engineer" is overloaded here, which is also her point.


If you cannot FizzBuzz, you are not basically qualified. FizzBuzz is the basic qualification test. If you cannot do conditionals and remainders, you cannot be expected to be proficient enough to do anything else of note in the language.

This is not a point of contention. It is the entire point of fizzbuzz. Fizzbuzz was designed to stop wasting interviewer time by filtering out people who could never in any way be construed to be qualified for a position that involved writing code.


You do not need to know FizzBuzz to be hired as a UI/UX designer. End of story.


All civil engineers think they have the right skills to be architects


She’s complaining that the job description didn’t mention coding as a requirement, but it actually says “deliver solid, reliable code”..?

And FizzBuzz is not a math problem. I'm sure the interviewer would even have given her a hint if she had got as far as writing a loop and the different branches but hadn't figured out how to differentiate the cases. I'd shrug that off as nervousness, perhaps.


Reading all the comments on there about people not knowing how to write fizz buzz was actually scary. I get that fizz buzz isn't a useful project or whatever, but it's not like you have to be a genius to figure it out. Someone mentioned needing CS161 or something... Not really a requirement, more like knowing how to write basic code.



"Thanks a lot, machine learning!" ahha


Yeah... I read all the way to 32 to find an error (and didn't check my working before posting this comment).

I wish I could meet these people who can't code a fizz buzz.


I love this story because I run across jobs all the time that sound great from the title, and then you read the job description: it often reads like a list of buzzwords, or like multiple people's jobs combined into one. It's like she says: they lumped everything into one position.


This doesn't use io_uring so it might be possible to make this go even faster these days...


I would expect this to slow it down rather than make it go faster.
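
For concreteness, here is a minimal sketch of what routing output through io_uring looks like, using liburing (illustrative only, not from the answer; whether batching submissions this way would actually beat the program's existing pipe writes is exactly the open question above):

  /* Submit a single buffered write to stdout through io_uring.
     Assumes liburing is installed; build with -luring. */
  #include <liburing.h>
  #include <stdio.h>
  #include <string.h>

  int main(void) {
      struct io_uring ring;
      if (io_uring_queue_init(8, &ring, 0) < 0) return 1;

      const char buf[] = "1\n2\nFizz\n4\nBuzz\n";
      struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
      io_uring_prep_write(sqe, 1 /* stdout */, buf, sizeof(buf) - 1, 0);
      io_uring_submit(&ring);

      struct io_uring_cqe *cqe;
      io_uring_wait_cqe(&ring, &cqe);   /* block until the write completes */
      io_uring_cqe_seen(&ring, cqe);
      io_uring_queue_exit(&ring);
      return 0;
  }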


// The main loop

// The bytecode interpreter consists of four instructions:

// 1. Load the bytecode from memory into %ymm2;

What is the performance impact of this load? Do you get more store-to-L2 memory bandwidth if you aren't loading at the same time?
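
One crude way to probe that question (a sketch of my own, not the program's actual code): time a stream of 32-byte AVX stores with and without an interleaved load from a small cache-resident buffer. Results depend heavily on the microarchitecture, and an aggressive compiler may still hoist the loads, so treat it only as a starting point:

  /* Rough microbenchmark sketch: stores alone vs. load+store per iteration.
     Build with: gcc -O2 -mavx2 bench.c   (buffer sizes are arbitrary) */
  #include <immintrin.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  static double now(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec * 1e-9;
  }

  int main(void) {
      enum { OUT = 256 << 10, REPS = 1 << 14 };   /* 256 KiB output block */
      uint8_t *out = aligned_alloc(32, OUT);
      uint8_t *bytecode = aligned_alloc(32, 64);  /* small stand-in "bytecode" */
      if (!out || !bytecode) return 1;
      memset(bytecode, 0x55, 64);
      __m256i v = _mm256_set1_epi8('x');

      /* Variant A: stores only (any load hoisted out of the loop). */
      double t0 = now();
      for (int r = 0; r < REPS; r++)
          for (size_t i = 0; i < OUT; i += 32)
              _mm256_store_si256((__m256i *)(out + i), v);
      double ta = now() - t0;

      /* Variant B: also reload the "bytecode" on every iteration. */
      t0 = now();
      for (int r = 0; r < REPS; r++)
          for (size_t i = 0; i < OUT; i += 32) {
              __m256i b = _mm256_load_si256((const __m256i *)(bytecode + (i & 32)));
              _mm256_store_si256((__m256i *)(out + i), _mm256_xor_si256(v, b));
          }
      double tb = now() - t0;

      printf("store-only: %.3f s, load+store: %.3f s (out[0]=%d)\n", ta, tb, out[0]);
      free(out); free(bytecode);
      return 0;
  }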


I'm trying to go through this, obviously incredibly complicated, can anyone explain this comment?

>LINENO_TOP doesn't need to be initialized for new widths, because an overrun by 100 lines is possible, but by 10 billion lines isn't


I keep getting failures to resize the kernel buffer, no matter how or where I pipe it.
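
If that is an fcntl(F_SETPIPE_SZ) failure, a likely cause (a guess, not confirmed in the thread) is the unprivileged pipe-size cap: ordinary processes can only grow a pipe up to /proc/sys/fs/pipe-max-size, often 1 MiB by default. A quick sketch for checking what your kernel allows:

  /* Sketch: try to grow a pipe's buffer and report what the kernel allows.
     F_SETPIPE_SZ is Linux-specific; requires _GNU_SOURCE. */
  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void) {
      int fds[2];
      if (pipe(fds) != 0) { perror("pipe"); return 1; }

      int want = 4 << 20;                       /* e.g. ask for 4 MiB */
      int got = fcntl(fds[1], F_SETPIPE_SZ, want);
      if (got < 0)
          fprintf(stderr, "resize to %d failed: %s (check /proc/sys/fs/pipe-max-size)\n",
                  want, strerror(errno));
      else
          printf("pipe buffer is now %d bytes\n", got);   /* kernel returns the actual size */
      return 0;
  }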


Cool to know that I can write simple Java code and it's almost in the same order of magnitude as the insane sanic version.


How come no one has asked our GPT/AI/LLM "overlords" for a faster FizzBuzz implementation? ;-)


An LLM cannot produce something that doesn't exist already; it will probably spit out the same code ais523 wrote.


I sometimes wonder if someone is running that program right now until it crashes in a couple decades.


Do I need glasses, or is there not a single C#/.NET implementation?


Found my glasses, and there is a candidate that's AVX-optimized to 1 GB/s.


Could the FizzBuzz logic be computed "in-memory"?



