The Three Byte Fix

https://breckyunits.com/ckmeans.html

59 points by breck on 2024-04-29 | 5 comments

Automated Summary

In 2022, users reported that map charts of a company were slow. A JavaScript port of a k-means clustering algorithm from a 2011 paper was identified as the cause of the slowdown. The function, 'ckmeans' from the Simple Statistics package, was found to run in O(n²) time instead of the claimed O(nlog(n)). A 3-character typo fix in the JavaScript port, which resolved an incorrect array length check in a for-loop, significantly improved the function's performance. This typo fix brought the JavaScript implementation in line with the original C++ version's nlog(n) performance. After submitting the fix, the Simple Statistics creator reviewed and merged it, benefitting all users of the 'ckmeans' function.

Comments

lilyball on 2024-04-30

> I was amazed by how fast he found that bug in code he had never seen before. I'm not sure if he read carefully through the original paper or came up with the clever debug strategy of "since this is a port, let's compare to the original, looking for typos, with a particular focus on the loops".

The latter is exactly what I would have done. If the algorithm did not match the expected complexity, the most likely reason is a bug in the implementation, and the quickest way to find that bug is to compare it with a known-good implementation, such as the one that the code was ported from.

mucle6 on 2024-04-29

Does anyone have tips for how to navigate new code bases faster? I feel like I would have given up reading through the code before I found the issue discussed in the article

I didn't read through the code but maybe the library was small and this was just a case of checking the difference between c++ and js by line for 1k lines or something

markerz on 2024-04-30

There’s plenty of strategies.

In this case, if you can see two code based behaviors have diverged, just run both through a step by step debugger and see where they diverge.

Or if you run both through a profiler, you can see what segment of code is taking a crazy amount of time in comparison.

In general, narrowing down what you’re looking at is the goal. In this case, a big hint is that the code should be O(nlogn) but is actually O(n^2) which is a huge hint that we’re in a nested loop and the break condition isn’t getting triggered properly. I would have looked closely at the loops because of this.

The code itself is ~400 lines long minus a bunch of comments. This feels very approachable to skim through in an hour.

I also love creating new variables and naming them what I’m learning. Often times naming things will make bugs very obvious.

euroderf on 2024-04-30

> I also love creating new variables and naming them what I’m learning. Often times naming things will make bugs very obvious.

Agreed. There should be a formalish name for this technique/style.

jeffrallen on 2024-04-30

Neves' Law strikes again.