This is time efficient* but rather wasteful of space.
The best way to save space is to use a Bloom Filter.
If we capture all the even numbers, that would sadly only give us "Definitely not Even" or "Maybe Even".
But for just the cost of doubling our space, we can use two Bloom filters!
So we can construct one bloom filter capturing even numbers, and another bloom filter capturing odd numbers.
Now we have "Definitely not Even" and "Maybe Even" but also "Definitely not Odd" and "Maybe Odd".
In this manner, we can use the "evens" filter to find the odd numbers and the "odds" filter to find the even numbers.
Having done this, we'll be left with just a handful of unlucky numbers that are recorded as both "Maybe even" and "Maybe odd". These will surely be few enough in number that we can special case these in our if/else block.
The filters as a first-pass will save gigabytes of memory!
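A minimal sketch of the two-filter scheme, assuming a toy Bloom filter with a single hash function (filter size, hash choice, and the populated range below are illustrative, not tuned for any particular error rate):

    import hashlib

    class ToyBloom:
        # Toy Bloom filter: one hash function, fixed-size bit array.
        def __init__(self, size_bits=1 << 20):
            self.size = size_bits
            self.bits = bytearray(size_bits // 8)
        def _pos(self, n):
            h = int.from_bytes(hashlib.sha256(str(n).encode()).digest()[:8], "big")
            return h % self.size
        def add(self, n):
            p = self._pos(n)
            self.bits[p // 8] |= 1 << (p % 8)
        def maybe(self, n):
            p = self._pos(n)
            return bool(self.bits[p // 8] & (1 << (p % 8)))

    evens, odds = ToyBloom(), ToyBloom()
    for n in range(0, 100_000, 2):   # a small universe standing in for all 2^32 values
        evens.add(n)
        odds.add(n + 1)

    def classify(n):
        if not evens.maybe(n):
            return "definitely not even"   # Bloom filters never give false negatives
        if not odds.maybe(n):
            return "definitely not odd"
        return "maybe even, maybe odd"     # the unlucky cases for the if/else block

    print(classify(41), classify(42))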
> But for just the cost of doubling our space, we can use two Bloom filters!
We can optimize the hash function to make it more space efficient.
Instead of using remainders to locate filter positions, we can use a Mersenne prime mask (like, say, 31), but in this case I have a feeling the best hash function to use would be to mask with (2^1)-1.
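A quick illustration of the punchline (variable names are mine): masking with 2^k - 1 is the same as reducing mod 2^k, so with k = 1 the "hash" is exactly the parity bit:

    MASK = (1 << 1) - 1                   # the proposed (2^1)-1 mask, i.e. 0b1
    def filter_position(n):
        return n & MASK                   # n & (2**k - 1) == n % 2**k; here k = 1
    assert all(filter_position(n) == n % 2 for n in range(10_000))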
This produced strange results on my ternary computer. I had to use a recursive popcnt instead.
this is my new favorite comment on this cursed website
You are absolutely write! Let me write a Go program to implement this idea. The bloom filters will take approximately 5gb (for a 1% error rate) and take a few minutes to populate on a modern Macbook Pro.
https://gist.github.com/alexjurkiewicz/1abf05f16fd98aabf380c...
How is this time efficient at all? It takes upwards of 40 seconds to compute on large 32bit values.
It's a joke post with some interesting bits and details.
It's a constant number of lookups, and all good Computer Scientists know that it is therefore an O(1) algorithm.
It is hard to imagine better efficiency than O(1)!
Indeed we could improve it further by performing all evaluations even when we find the answer earlier, ensuring it is a true Constant Time algorithm, safe for use in cryptography.
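In that spirit, a branch-free sketch of a "true constant time" check (Python, purely illustrative; real constant-time code needs far more care than this, and Python gives no timing guarantees anyway):

    def parity_constant_time(n, bits=32):
        # Visit every bit, even though only bit 0 matters, so the work done
        # never depends on the input value.
        acc = 0
        for i in range(bits):
            bit = (n >> i) & 1
            acc |= bit & (1 if i == 0 else 0)   # mask depends on the loop index, not the data
        return "even" if acc == 0 else "odd"

    print(parity_constant_time(6), parity_constant_time(7))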
> This is time efficient* but rather wasteful of space.
You're saying that the blog's solution is time efficient. Which it is not. Your solution may be O(1) but it is also not efficient. As I'm sure you are aware.
I can tell you a practical solution which is also O(1) and takes up maybe 2 or 3 instructions of program code and no extra memory at all.
`x & 1` or `x % 2 != 0`
This blog post was taking a joke and running with it. And your comment is in that spirit as well, I just wanted to point out that it's by no means time efficient when we have 2s or 1s complement numbers which make this algorithm trivial.
You need to read their entire comment as a joke.
I guess I should have been more clear that I was just pointing out the obvious in case some confused reader missed the joke.
lol
Which was also obvious, but maybe also needed pointing out, which says something about online discussion. Something obvious, probably.
explaining the joke spoils the joke, such is social convention.
Forgive me for not being funny.
It's alright. I don't make the rules.
> I just wanted to point out that
We already know. Everybody knows. That's the joke. There's no need to point out anything.
How are you able to recognize a joke post but not a joke comment?
I may have missed the * meaning. I got that the bloom filter was an extension of the joke as I mentioned below. I was just clarifying in case someone else missed the joke.
You're absolutely right. The obvious solution would have been to create a boolean table containing all the pre-computed answers, and then simply use the integer you are testing as the index of the correct answer in memory. Now your isEven code is just a simple array lookup! Such an obvious improvement, I can't believe the OP didn't see it.
And with a little extra work you can shrink the whole table's size in memory by a factor of eight, but I'll leave that as an exercise for the interested reader.
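For anyone who wants the exercise spoiled: store one bit per answer instead of one byte, then index with shifts and masks (a sketch over a deliberately small range):

    LIMIT = 1 << 16                        # table for 0..65535; one bit per number
    table = bytearray(LIMIT // 8)          # eight answers per byte
    for n in range(0, LIMIT, 2):           # precompute: set the bit for every even n
        table[n >> 3] |= 1 << (n & 7)

    def is_even(n):
        return bool(table[n >> 3] & (1 << (n & 7)))

    assert is_even(40000) and not is_even(31337)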
Maybe we can even find some correlation in the bit pattern of the input and the Boolean table!
If the "exercise" is to strictly rely on if-else statements, then the obvious speedup is to perform a binary search instead of a linear one. The result would still be horrifically space inefficient, but the speed would be roughly the time it takes to load 32x 4KB pages randomly from disk (the article memory-mapped the file). On a modern SSD a random read is 20 microseconds, so that's less than a millisecond for an even/odd check!
"That's good enough, ship it to production. We'll optimise it later."
The comment you're replying to is also a joke, with some interesting bits and details.
I think I'll just avoid commenting on jokes from now on.
r/whoosh
Ah, yes, exactly the pointless diversion I needed for my lunch break. For science: generating a C# switch statement for similar purposes took 7 minutes on similar-ish hardware, but the resulting 99.2GB file could not be opened or compiled ('Stream is too long'), which was slightly disappointing.
Optimization efforts included increasing the internal buffer size of the StreamWriter used to create the source code: this reduced the runtime to around 6 minutes, as well as running a non-debug build, as it was observed that the poor Visual Studio metrics gathering process was contributing significantly to disk activity as well, but that ultimately didn't matter much. So, ehhm, yes, good job on that I guess?
Isn't the obvious thing to generate different classes for different ranges over the input? Then the classes could be loaded lazily.
And if you then make the ranges tree-shaped you get logarithmic complexity, which massively cuts down the O(n) of the rather naive chained `if` statements.
I wonder if you could generate it via a Roslyn incremental source generator instead of as a file to bypass this limit. I'm guessing not, but it does sound like fun.
You can totally use source generators for that.
You're only allowed up to 65535 locals, but this includes hidden locals, which the compiler adds if you're compiling in debug mode.
So you have to make sure to compile only in release mode just to get to 16 bits.
To match the article, you'd want to directly emit the Intermediate Language (IL) tokens with something like this: https://learn.microsoft.com/en-us/dotnet/api/system.reflecti...
I haven't found any authoritative source, but I strongly suspect that the .NET bytecode format has 32-bit limits all over the place. Maaaybe you could break up the code into functions less than 1 GB in size and then chain them together.
Similar humour if opposite directions to an old favourite: https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/
I expected some job interview meme[1][2] but I did not know this one and it looks like a real story too! Thanks for sharing, that was a fun read.
[1]: https://aphyr.com/posts/342-typing-the-technical-interview
[2]: https://www.richard-towers.com/2023/03/11/typescripting-the-...
I love the Aphyr posts.
> “Can I use any language?”
>
> “Sure.”
>
> Move quickly, before he realizes his mistake.
Works both in job interviews and real projects!
I’m almost serious, the only time I saw haskell in production was after a similar scenario.
> any value over 2^31 seems to give random results.
Wow he really lucked out... On his way to perfecting a fully functioning and performant Even/Odd Detector, he stumbled upon a fully functioning and performant Coin Flip Simulator!
I took an ASIC design class in college, unfortunately with a heavy course load that didn't allow me to focus on it. For our final project we were given a numbered dictionary and asked to design a chip that would accept the characters on a 7 bit interface (ASCII), one character per clock cycle and output the dictionary number on an output interface but I can't remember how wide. We were graded on the size of the resulting ASIC and how many clock cycles it took from the last character in to the number on the output.
I started designing my modules, a ROM, a register with a ROM pointer, etc, etc, writing the Verilog and working out the clock sync between modules. Then I got 'lazy', wrote a trie-like implementation in Java, and had it spit out the whole tree in Verilog. It worked, and just one clock cycle after the last letter my number would output. Fastest in the class! Also the most gates in the class. Surprised I got a 90 grade given I didn't use any of the advanced ASIC design the class taught. The TA didn't know what the hell they were looking at.
Yep! Something a bit counterintuitive about circuit design is that dedicated transistors will always beat reusing existing components. If we do reuse existing components like ALUs, multipliers, or state machines, we save on chip area but pay the penalty in clock cycles. Your approach was the extreme version of this tradeoff. You essentially unrolled the entire dictionary lookup into pure combinatorial logic (well, with registers for the input characters). One clock cycle of latency because you weren't doing any sequential searching, comparing, or state machine transitions, just racing electrons through logic gates.
It's akin to a compiler unrolling a loop. Uses more RAM (area) but fewer cycles to execute. Hardware synthesis uses many of the same techniques as compilers use to optimize code.
It's a common pitfall for those learning hardware description languages like Verilog to think about them like programming languages. If you write "if (calc) res <= a * b;" and res is 32 bits wide, then you have instantiated a 32-bit fast multiplier circuit dedicated to just that one operation. This is often not what was intended.
Although leaning on the analogy too closely can mislead in that way, the analogy between hardware and software is not a shallow one. A combinatorial circuit is akin to the pure function of functional programming. Anything that can be described as a pure function working on fixed integers or floating point or other discrete data types can be transformed into a combinatorial circuit. And there are algorithms to do so automatically, often with reasonable efficiency.
Free software synthesis has come a long way in recent years, by the way. There are even several hobbyist projects that can take VHDL or Verilog and produce layouts using TTL chips or even discrete transistor logic, with automatic circuit board layout. You can now compile your code directly to circuit board copper masks and a parts list.
Gemini took 4 seconds to answer this prompt: "Here is a number 4200020010101. Think deeply about it and tell me if it is not or or not even."
So if you're concerned with privacy issues, you can run the assembly version proposed in the article locally and be well within the same order of performance.
Let's thank the author of the article for providing a decent alternative to Google.
ah, but the license is not that good - we can't reproduce his code.
>Think deeply about it and tell me if it is not or or not even
I think I just experienced a segfault
Why do they call it even when you of in the true number of out false odd the number?
Has anyone really been far even as decided to use even go want to do Look more like?
Hey, why segfault and not stack overflow?
+1
Nice, that swaps odd and even around.
> if it is not or or not even
Did you want to test the LLM's grammatical comprehension?
When I'm tired my typing goes bad. I obvioulsy meant: "is that number even odd ?" :-)
Finally a problem that Microsoft Phi can ace. Probably. Maybe. Some of the time at least.
Surely you mean Microsoft CoPhiLot 365
This could obviously be done with much less code: just add "if"s for all even numbers, and at the end return "odd" if none of the evens matched. 50% less code!
Or even simpler: If it's 0, return "even". If not, do a recursive call to n-1, if that equals "even", return "odd", otherwise return "even".
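That recursion in full (a sketch; in Python it also hits the default recursion limit around n = 1000, which feels thematically appropriate):

    def parity(n):
        if n == 0:
            return "even"
        return "odd" if parity(n - 1) == "even" else "even"

    print(parity(7))    # odd
    print(parity(42))   # even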
But the best way is probably to just use a library. Yes, 500MB of additional dependencies, but then it's a one-liner.
But then, even numbers will have the worst possible performance.
You brought up an important opportunity for optimization. If you know the distribution of your data, it may make more sense to implement it in terms of the odd numbers and leave even numbers as the fallback. It's important to profile with a realistic distribution of data to make sure you're targeting the correct parity of numbers.
Good point. Have two programs - one checking every even number and returning odd if not even. And then have a program checking every odd number and returning even if not. Then, a simple program to dispatch to either program randomly, so you end up in the long term with good performance for each.
That sounds kinda stupid, it would completely destroy the original space savings. Unless the sharding makes it fit within compiler limits.
Why not run both and use the result retrieved the fastest.
Yeeessss! Microservices!
Your mention of Microservices opened up my mind to additional possibilities. How about we create a microservice for each integer, then deploy 4 billion of them. Send a request to all of them simultaneously. Only one of them will respond with the answer. We still need to decide how to deploy those microservices - one per machine, or multiple per machine?
Could try 1 per IPv4 address? There’s just barely enough!
You could save stack space by transforming it into a loop. It’s still only O(n)!
> Now, this is a time-memory tradeoff, but my time on this earth is limited so I decided to meta-program the if statements using a programmer program in a different programming language.
No comment...
Yeah... I come here to talk about that. Should have been
    for i in range(0, 2**8, 2):
        print("    if (number == "+str(i)+")")
        print("        printf(\"even\\n\");")
        print("    if (number == "+str(i + 1)+")")
        print("        printf(\"odd\\n\");")
or
    for i in range(0, 2**8, 2):
        print(f"""    if (number == {i})
            puts("even");
        if (number == {i + 1})
            puts("odd");""")
What happens when you try to compute 2**8+1 ?
If it's too large you could just subtract 2**8 and try again.
Should work fine with long long?
I think we can improve this. Just make a microservice that generates the code on the fly and streams it to the compiler. Then you also just have to create the necessary code and don't waste the SSD with unused code paths.
I’m disappointed there is no docker image for this. How will I test it out?
Microservice kinda implies usage of a container for me. How else would we google-scale it to serve all requests in parallel?
But thinking about it, we probably have to use some more microservices; we can't put all that burden on the requester. So a dedicated service for compiling and executing in sandboxes would be necessary. Also, some local load balancers to control the flow and filter out the useless answers. I'm not an expert on that devops-magic, but I guess this means ~12.5 billion pods for a fast enough result. Do Amazon or Google offer planetary scale for services?
How horridly inefficient, he should have just flipped a Boolean on each iteration.
Horridly inefficient. Just unfold the loop.
Just output odd and even for each pass and increment by two. Need to make sure you have the right starting value, and check for off-by-one errors.
It should have used a flag that is being toggled.
Claude's version:
As usual, a non-marginally superior mind to commentators.
The author missed an opportunity for a much shorter solution for the given problem statement.
Brilliant! Mr Boole would love this
I recently asked a Qwen model to write me some code to remove errant spaces ("c he es e" instead of "cheese") in OCR'd text. It proceeded to write 'if' blocks for every combo of all English words and all possible errant spaces. I did not wait for it to finish...
If the author is available for consulting I have this bag of rice I need cooked. Should be around 30,000 grains, each needs about 1mL of water and 2m on the stove. Will pay $10 (2025 dollars)
Aha, you forgot to specify the country of those "dollars"! For $10 (2025 [Cayman Islands] dollars)! Which is higher than USD10
Zimbabwe: 0.027 USD
I see this is 2023… the article refs GPT even then. Can’t believe it’s already that much time gone by, still seems like “last years big news”
I was gonna comment “this is what I really like to see on HN”. Then I saw the date and was sad that we’re having to dip into the history box to find fun/interesting articles more often of late it seems.
Anyone else interested in a temporary moratorium on all things LLM? We could have GPT-free-Wednesday, or something like that :)
> Anyone else interested in a temporary moratorium on all things LLM? We could have GPT-free-Wednesday, or something like that :)
I would be interested in a permanent moratorium, personally. There's no interesting content to be had in the various LLM articles that litter HN these days. Or failing a topic ban, at least give a way to filter it for those of us who are sick of hearing about AI hype.
Definitely not a visionary. This is how you do it in 2025: https://imgur.com/rWiP90P
> Then I saw the date and was sad that we’re having to dip into the history box to find fun/interesting articles more often of late it seems
We don’t _have_ to. You could start a blog and display the indomitable human spirit.
I know it's silly, but I just want to fix his first version with the minimum possible changes;
    /* Copyright 2023. All unauthorized distribution of this source code
       will be persecuted to the fullest extent of the law*/
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        uint8_t number = argc>1 ? argv[1][strlen(argv[1])-1]-'0' : printf("Usage: odd-or-even number\n");
        if (number == 0)
            printf("even\n");
        if (number == 1)
            printf("odd\n");
        if (number == 2)
            printf("even\n");
        if (number == 3)
            printf("odd\n");
        if (number == 4)
            printf("even\n");
        if (number == 5)
            printf("odd\n");
        if (number == 6)
            printf("even\n");
        if (number == 7)
            printf("odd\n");
        if (number == 8)
            printf("even\n");
        if (number == 9)
            printf("odd\n");
        if (number == 10)
            printf("even\n");
    }
This way it basically works. It's a shame that it doesn't call out a non-numeric argument, but that's about the only problem. It relies on a trick: printf() returns the number of characters printed, so the error message string needs to be longer than 10.
Wouldn't using elif for all comparisons after the first improve performance?
Or is the performance considered worse because it becomes O(n) (where n < MAX_UINT) vs. constant time ( O(MAX_UINT) )
It certainly would be normal to use else if (or switch) if you wanted to be picky but really such changes are inconsequential here. And I was trying to change just one line. Sadly I also had to quietly change stdlib.h to string.h as well.
If you wanted to avoid <string.h>, you could use the poor man's strlen(), snprintf(0,0,"%s",argv[1]). For full input validation without adding any more statements, the best I can get (in ISO C) is
    uint8_t number = (argc<2||sscanf(*++argv,"%*[0123456789]%n",&argc)||argc--[*argv]?printf("bad\n"):argc[*argv])-'0'; // No problems here
Though with either function you may run into issues if the OS allows arguments longer than INT_MAX. To be defensive, you could use "%32767s" or "%*32767[0123456789]%n" instead, at the cost of failing inputs longer than 32KiB.
I prefer data-driven programming, so a simple:
works for me, alongside a pretty big array.
With data-driven programming, I would have expected an SQL table containing all the precomputed results. Unless you carelessly add an index, it has the same asymptotic performance!
You might like the IBM 1620.
https://en.wikipedia.org/wiki/IBM_1620#Anecdotes
This is the way.
I have never seen anyone argue for a ‘switch’ version.
Slightly less code to generate.
You forgot the logic to strip the final digit and assign it to v.
processing the whole number is absurd
I think the idea is to fill in the ellipses with even/odd numbers, up to 4B.
You know, to save the performance cost of processing the input as a string, and chomping off all but the last character.
Converting to decimal is just as absurd.
All you need is the final binary digit, which incidentally is the most optimal codegen, `v & 1`.
Look at Mr. Rocket Scientist over here...
Any good engineer knows there is no "best" solution, only tradeoffs.
Save space.
Ditto. Sure, this overflows the stack, but you look cool doing it.
Save time. Just buy more RAM.
You can combine the second and third strategies to hit the sweet spot of time and space.
God help us if that code ever makes its way onto npm.
isEven is a performant, hand-compiled evenness checker for any 32 bit integer. A single file import that does one job and one job only!
it follows the UNIX philosophy of doing one thing and donig it well.
Donig!! LOL maybe a mistake maybe not but still funny
discussed 2 years ago,
https://news.ycombinator.com/item?id=38790597
4B If Statements (469 comments)
Meta: Yeah, this should have a "(2023)" tag in the title. Thanks.
> Visionary genius Ross van der Gussom
Thanks for making me doubt myself & googling who that guy who made python was again, because surely "van der Gussom" isn't a normal Dutch name. Well played.
Previously on HN
https://news.ycombinator.com/item?id=38790754
> As a side note, the program is amazingly performant. For small numbers the results are instantaneous and for the large number close to the 2^32 limit the result is still returned in around 10 seconds.
Amazing!
This line from the article -- I will be laughing about it for days.
Really if you are not making custom silicon for this problem, you are just wasting our time here aren't you.
I mean, this is the ultimate domain for Quantum Computers:
"Is it odd or Even?"
"YES"
I just had flash backs to a previous job where I was brought in to optimize another teams builds since they were now taking minutes instead of seconds.
I tracked it down to a folder with thousands of C++ files called things like uint_to_int.cc and inch_to_cm.cc and cm_to_m.cc. Basically the developer in charge of writing the conversion library took our typed units library and autogenerated a C++ file for every possible conversion the application might need to make.
Every time we added a new typed unit it would create another couple of dozen files to be compiled.
This reminds me of when I learned to program on my casio calculator.
There was a function to detect a key press which would return a number identifying the pressed key.
I needed to map that number to the letter printed on the key to print it on the screen.
I don't remember whether there was no hashmap data structure or I just didn't know about it, but I implemented it with a series of ifs.
The problem with that solution is that while mapping A was fast, Z was very slow because it was at the end of the list.
That is how I discovered divide and conquer / dichotomy with if branches.
i tried in ruby up to 1 million (1 billion was taking too long)
and added at the top but i'm getting SIGILL
Ooh a new bug.
kind of expected gcc to see right through the 300 gigs of code and compile it down to the tenish instructions.
They disabled optimisations:
> Lets compile the code, disabling optimizations with /Od to make sure that the pesky compiler doesn’t interfere with our algorithm
I would have wanted to see them look at the assembly from various optimization levels to see if the compiler really did. Ideally -O1 or something wouldn't see through this but would generate better code in other ways. Disabled optimizations are often really stupid about how they generate code.
> disabling optimizations with /Od
And that weird flag is because it's a windows compiler: cl.exe, not gcc.
I'm curious what GCC would do if it wasn't purposely lobotomised and fed 300 GB of this nuclear waste.
Well I created the 16 bit .c file, because I'm not that curious. gcc -O0 completed immediately and made a 1.5 MB executable. -O1 took about 10 minutes for a 1.8 MB executable. -O2 has been running for 1h15m so far... i7-14700K
I'm in too deep now, so I'll let it run while I'm at work.
Keep us updated.
GCC -O2 made a 1.8 MB executable after a bit over four hours. I'm not trying -O3 :D
I don't know enough about compilers to answer why this doesn't get optimised down to something tiny, or why it took so long. I'm not sure what we've learned tonight, but there you go.
Can't you just implement a JavaScript interpreter and import the is-even package like normal developers?
> I decided to implement this in the C programming language as it’s by far the fastest language on the planet to this day (thanks to the visionary genius Dennis Richie)
Am I lost?
Aren't the compiler/linker responsible for fast code, not the language itself?
There are language issues as well. 99% of C programs are valid C++, and if you compile with a C++ compiler instead of a C compiler, the result will be slightly faster! C++ has a stronger type system, and so once in a while (very rarely) those C programs compile but give incorrect results, since C++ allowed the optimizer to make an assumption that wasn't true. Fortran is often even faster because the language allows for even more assumptions. I don't know where Rust fits in here (Rust is hurt today because the better optimizers are designed for C++ and so don't take advantage of the extra assumptions Rust allows - it was designed to allow different assumptions from C++ and likely could do better with a ground-up optimizer, but that would take a large team a decade+ to write: expensive).
Most of the difference in speed is the optimizer/linker. Assuming a fair competition, the difference between Ada, C, C++, D, Fortran, Rust, Zig (and many others I can't think of) is very rarely even 1% in any real-world situation. Of course, it isn't hard to add pessimizations to make any language lose, sometimes accidentally, so fair competitions are very hard to find/make.
Then again, the article was clearly sarcastic.
> I decided to use the slowest language on the planet, Python (thanks to the visionary genius of Ross van der Gussom).
given the article, it's fair to assume the author was joking around
that being said, the way the language is used and its ecosystem do contribute to the executable's efficiency. yet, given C's frugality, or the proximity between its instructions and the executed ones, it's not unfair to say that "C is fast"
Both, usually. A language's semantics can limit how much a compiler can speed up the language. Python, for example, is extremely difficult to make fast due to the fact that almost everything has the semantics of a hashmap lookup. C, in comparison, has relatively little in it that can't be mapped fairly straightforwardly to assembly, and then most of it can be mapped in a more difficult way to faster assembly.
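A tiny demonstration of that claim (example is mine): even a plain attribute access goes through a hash-table lookup that the program can mutate at runtime, so the compiler can rarely pin anything down:

    class Point:
        def __init__(self):
            self.x = 1

    p = Point()
    print(p.x)                  # looks like a field read...
    print(p.__dict__["x"])      # ...but is really a dictionary lookup
    p.__dict__["y"] = "hello"   # attributes (and their types) can appear at runtime
    print(p.y)                  # so p.x can never compile down to a fixed offset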
It's a wild statement for a few reasons; your observation is one of them.
People often use "language" as a synonym for the whole ecosystem, including the compiler and linker.
> I saw from the SSD was around 800 MB/s (which doesn’t really make sense as that should give execution speeds at 40+ seconds, but computers are magical so who knows what is going on).
If anyone knows what’s actually going on, please do tell.
Presumably after the first run much or all of the program is paged into OS memory
Yes, or it was still in memory from writing.
The numbers match quite nicely. 40 GB program size minus 32 GB RAM is 8 GB, divided by 800 MB/s makes 10 seconds.
I'm not entirely sure but could it be predictive branching?
No, it needs to read the entire executable in order to be correct, it can't skip anything. Therefore the time for the IO must be a lower bound, predictive branching can't help that.
In fact, some really performant code, such as glMatrix.js, doesn't use any for loops for matrix math, just to allow the javascript engine to optimize the code as much as possible.
https://github.com/toji/gl-matrix/blob/master/dist/gl-matrix...
> How did I do this? Well I jumped online, using a mix of my early life experience coding emulators and hacking and looked into the x86(-64) architecture manuals to figure out the correct opcodes and format for each instruction. … Just kidding, that’s horrible. I asked ChatGPT
Ok but if you do want to play with writing binary code manually I recommend Casey Muratori's performance course
A much cooler approach would have been to generate the ASM from the same program, rather than generate a file from python and load that file from C++. The multi-stage build and filesystem are completely unnecessary.
The technique actually has a lot of practical applications, so it's useful to have a C++ library that helps you with generating amd64 machine code.
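As a taste of runtime code generation, a minimal sketch in Python standing in for such a library (assumes an x86-64 CPU and a Unix-like OS that allows writable+executable anonymous mappings). The bytes are the standard encodings of mov eax, edi / and eax, 1 / ret:

    import ctypes, mmap

    # System V x86-64 ABI: the first integer argument arrives in edi,
    # and the integer return value goes in eax.
    code = bytes([
        0x89, 0xF8,        # mov eax, edi
        0x83, 0xE0, 0x01,  # and eax, 1
        0xC3,              # ret
    ])

    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(code)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    is_odd = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)(addr)

    print(is_odd(7), is_odd(8))   # 1 0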
I love "stupid" stuff like this; you normally learn something small and seemingly inane. It's fun!
The interesting part isn’t the if-statement count but how quickly the compiler and branch predictor erase the differences. It’s a nice demo of why “manual cleverness” rarely beats modern toolchains.
I kept waiting for the payoff to be "The optimizer reduced the entire series of if statements to a single instruction"
Would be more maintainable if they injected the loading strategy to be used as a dependency from config instead of hardcoding it :/
If you were to convert an llm model into code, it would have like 500 billion if statements like that
This is also a nice approach for FizzBuzz in leetcode interviews.
Moreover, interviewers will be smitten by the hand-optimized assembly code.
I don't get it. Why not just look at the last binary bit?
Hm, good thought. You could just do
    printf("%d is %s\n", n, last_binary_bit(n) == 0 ? "even" : "odd");
and the rest is trivial:
    int last_binary_bit(int n) {
        if (n == 0) return 0;
        if (n == 1) return 1;
        if (n == 2) return 0;
        ...
    }
Come to think of it, with a little fancy math you could divide and conquer:
    int last_binary_bit(int n) {
        // Handle the easy cases.
        if (n == 0) return 0;
        if (n == 1) return 1;
        // Number may be large. Divide and conquer. It doesn't matter where we split it,
        // so use a randomized algorithm because those are fast.
        for (;;) {
            int r = random();
            if (r < n) {
                // Smaller numbers are easier.
                int smaller1 = r;
                int smaller2 = n - r;
                int bit1 = last_binary_bit(smaller1);
                int bit2 = last_binary_bit(smaller2);
                // Fancy math: even + even is even, even + odd is odd, etc.
                if (bit1 == 0 && bit2 == 0) return 0;
                if (bit1 == 0 && bit2 == 1) return 1;
                if (bit1 == 1 && bit2 == 0) return 1;
                if (bit1 == 1 && bit2 == 1) return 0;
            }
        }
    }
Oh, I have an idea for better leftpad implementation, let me publish that to npm real quick!
This reminds me of my personal "prime number" grabber research https://github.com/tigranbs/prime-numbers I needed to create the unique graph nodes and assign prime numbers, and to make things efficient, I thought, why not just download the list of known prime numbers instead of generating them one by one. So I did, and compiled everything into a single Go binary. Ehh, good old days with a nice faith in making "the best" crappy software out there.
Meanwhile:
Configuration over logic. At a new scale enabled by AI.
no but thank you. i will stick to using npm's is-odd and is-even packages
Next put them in a tree for faster lookups.
With a tree you'll be limited by the RAM. I advise to use a database.
Map reduce, cluster geo-failover and CDN caching for optimized coldstarts in case you have to bring it up from scratch in a new datacenter. Bio for contact info, hourly billing. I have helped many startups reach their first 100MARR. Buy my audiobook.
Don't forget to create an index on the composite key [number, is_even]!
Don't reinvent the wheel, put them in a SQLite table, and let the sql engine create the tree for you!
Damn it - my code got leaked!
Silly. Don't waste your time on problems other people have already solved! Use JS and "npm install odd_or_even".
This is good stuff
> As a side note, the program is amazingly performant. For small numbers the results are instantaneous and for the large number close to the 2^32 limit the result is still returned in around 10 seconds.
Lol
I would also like to praise the visionary genius Ross van der Gussom, without whom this wonderful achievement in software engineering would not have been possible!
Is he the one married to the singer, Adele Dazeem?
No, this one is the creator of the mighty programming language Mython.
"The executable is around 2 MB"- Every dotnet programmer: "Those are rookie numbers!"
I see why now npm's is-odd has millions of downloads
Why not optimize this? Create a lookup table, a 2^64 large array of bools, and just check the n-th element to see if it's odd or even?
Many gigabytes saved!
/s
if(n&1)
else
You can do it even faster with the if statements:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "Usage: %s <string>\n", argv[0]);
            return 1;
        }
        char *s = argv[1];
        int i;
        /* find the end of the string */
        for (i = 0; s[i] != '\0'; ++i)
            ;
        /* make sure the string wasn't empty */
        if (i == 0) {
            fprintf(stderr, "Error: empty string\n");
            return 1;
        }
        /* last character is at s[i - 1] */
        char d = s[i - 1];
        if (d == '0')
            printf("even\n");
        if (d == '1')
            printf("odd\n");
        if (d == '2')
            printf("even\n");
        if (d == '3')
            printf("odd\n");
        if (d == '4')
            printf("even\n");
        if (d == '5')
            printf("odd\n");
        if (d == '6')
            printf("even\n");
        if (d == '7')
            printf("odd\n");
        if (d == '8')
            printf("even\n");
        if (d == '9')
            printf("odd\n");
        return 0;
    }
gcc -std=c11 -Wall -Wextra -O2 -o check_digit check_digit.c
./check_digit 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
You can do it even even faster by replacing your if statements (works because the ascii values end in the digit they represent):
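In other words: '0' is 0x30 and '9' is 0x39, so the low bit of the last character already is the parity, and the ten digit comparisons collapse to one mask (shown in Python for brevity):

    def parity(s):
        # The ASCII code of a digit ends in that digit's bits, so the
        # character's low bit equals the number's low bit.
        return "odd" if ord(s[-1]) & 1 else "even"

    print(parity("4200020010101"))   # odd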
You inspired me this joyful rewrite:
probably easier in bash:
    number="$1"
    if [[ "$number" =~ ^(2|4|6|8|10|12|14|16|18|20)$ ]]; then
        echo even
    elif [[ "$number" =~ ^(1|3|5|7|9|11|13|15|17|19)$ ]]; then
        echo odd
    else
        echo NaN
    fi
A bit limited, but you can scale it up
Scaled up:
If $1 had a trailing non-digit, or was empty, that would indeed be an odd situation!
I'm disappointed, it's not in rust. :-)
Still hoping for the C++ template version. Don’t pay for what you don’t use!