The Three Byte Fix
https://breckyunits.com/ckmeans.html
Automated Summary
In 2022, users reported that map charts of a company were slow. A JavaScript port of a k-means clustering algorithm from a 2011 paper was identified as the cause of the slowdown. The function, 'ckmeans' from the Simple Statistics package, was found to run in O(n²) time instead of the claimed O(nlog(n)). A 3-character typo fix in the JavaScript port, which resolved an incorrect array length check in a for-loop, significantly improved the function's performance. This typo fix brought the JavaScript implementation in line with the original C++ version's nlog(n) performance. After submitting the fix, the Simple Statistics creator reviewed and merged it, benefitting all users of the 'ckmeans' function.
> I was amazed by how fast he found that bug in code he had never seen before. I'm not sure if he read carefully through the original paper or came up with the clever debug strategy of "since this is a port, let's compare to the original, looking for typos, with a particular focus on the loops".
The latter is exactly what I would have done. If the algorithm did not match the expected complexity, the most likely reason is a bug in the implementation, and the quickest way to find that bug is to compare it with a known-good implementation, such as the one that the code was ported from.
Does anyone have tips for how to navigate new code bases faster? I feel like I would have given up reading through the code before I found the issue discussed in the article
I didn't read through the code but maybe the library was small and this was just a case of checking the difference between c++ and js by line for 1k lines or something
There’s plenty of strategies.
In this case, if you can see two code based behaviors have diverged, just run both through a step by step debugger and see where they diverge.
Or if you run both through a profiler, you can see what segment of code is taking a crazy amount of time in comparison.
In general, narrowing down what you’re looking at is the goal. In this case, a big hint is that the code should be O(nlogn) but is actually O(n^2) which is a huge hint that we’re in a nested loop and the break condition isn’t getting triggered properly. I would have looked closely at the loops because of this.
The code itself is ~400 lines long minus a bunch of comments. This feels very approachable to skim through in an hour.
I also love creating new variables and naming them what I’m learning. Often times naming things will make bugs very obvious.
> I also love creating new variables and naming them what I’m learning. Often times naming things will make bugs very obvious.
Agreed. There should be a formalish name for this technique/style.
Neves' Law strikes again.