Learning to Refactor Other People's Code Therapeutically

StackExchange's CodeReview site provides a wealth of code samples representing the best, and sometimes the worst, of code that can be produced. It has snippets volunteered from all types of code bases in nearly every language. One could get lost for hours just reading how other people, both posters and answerers, choose to solve problems. It is fun to dive into some of the code on this site and work through refactoring it to see what comes out the other side.

One such example I happened upon recently was this little gem. To summarize, it is a function that builds URLs from parts and puts those URLs into a dictionary object. It is written in Javascript for Node.JS. It uses a recursive function to permute the parts together to use as the dictionary keys, and for building those URLs.

This is one of those pieces of code that fails the Squint Test badly. There is a lot of deep nesting, with fors inside of ifs inside of fors. Given the indentation, the bottom half of the code doesn't even look like it is where it belongs. There's not a lot of spacing, either horizontal or vertical. There's that recursive function. I have no clue what that does or how it works. Naming is… not helpful.

From the description, we know one thing for sure: it works. This is the most important thing to know about this code. Once I know that, I can start to go to town on it. (Incidentally, all code posted on CodeReview.SE is supposed to be working code.)

First, I needed to run it to see it work. I copied the code into files on my local machine. I re-indented the whole thing, which did indeed show that the original spacing was deceptive. Once I got the express.js dependency installed and fired up node, I saw that it did work, as promised. Sort of? I guess it worked like the author intended. The actual code didn't put the URL dictionary into the HTTP response, but instead dumped it to the console. Odd, but I was probably not seeing the whole app, so there may have been something else that happened with it once it was created. It was nice that it dumped its output to the console, though (as JSON to boot!). I could literally copy the entire thing and paste it into a test to guard against regressions.

For a test, I decided to just write a quick-and-dirty checker that iterated through the dictionary and compared the strings against the expected output (aka that JSON that got dumped out). Rather than having to wrangle a testing framework, this gave me immediate results without the hassle.

We all know the programmer's favorite thing is to delete code, so that's where I started. I saw a variable declared that I didn't seem to see anywhere else. A quick vim search confirmed this, so that could go. Run the test, all good. Progress already.

var patNum = 1;

Since there were unused variables, there were very likely more variable issues. A cursory glance revealed scoping issues quickly. (Even though JavaScript only has function-level scoping, conceptually limiting scope can help with comprehension.) The num variable was a looping variable declared way outside the loop it was used it. In addition, it was the loop variable for a for-in loop… on an array. In Javascript, you can do a for-in loop on an array, but it is much better to just do a traditional loop, as it avoids checking hasOwnProperty, etc. Convert the loop, bring the variable into the loop, and done. Tests pass, more progress.

Because of the way Javascript is, a for-in loop should always have an accompanying check for hasOwnProperty. But because of the structure of the for-then-if requirement, the code get spaceship-y really quickly, especially when there are multiple nested for-in loops. A way to reduce the indentation while keeping the check in place is to formulate it as a guard(-esque) clause. To do this, negate the condition and put a continue; as the condition body. Then take everything else out of that condition. One level of indentation removed. Do that for each of the nested loops. Tests pass. Less tabs.

There were several places in the code with string concatenation in the form of thing = thing + "more stuff". Javascript has the += operator for string concatenation, so I just went through and replaced all the things. Tests pass, next step. There was also a pair of sequential ifs that checked the same condition, so I could reorder those into an if-else. Tests pass, next.

I do a lot more of these little changes to make the code more readable and manageable: temporary local variables, removing an unreachable line (probably a bug), joining a declaration and definition onto the same line, replacing for-ins with fors on arrays, and discovering ordering bugs. All the while, I know I have this recursive function looming over me. I don't understand it, but I'm going to have to deal with it.

This function takes nine parameters (eight after the unreachable line was removed). The first two parameters are just for tracking the recursion. The function doesn't return anything, but instead stores its results in a global variable. It calls itself recursively twice. This thing is a little hairy.

I needed to break the problem down a bit. Looking over it, I could see that there were two parts: the base case where we put together the URL, and the recursive parts. The base case was all in the function, so I did an extract function. I took anything that involved building the URL and dumped it into a separate function, buildURL. That function still took seven of the original parameters, but that was less than nine. (After removing the dead code, that seven became six.) Once that was separate, there was another variable that was unnecessary, so that got taken out. Tests a still passing. Whew!

Now I can deal with that recursive algorithm… nope, maybe later.

So now I have one function that takes eight parameters, and that calls a function with six parameters. That's a lot of parameters. I looked at the original call to the function below. One of the parameters caught my eye: metfield. An empty string is passed in, and it's no longer mentioned in the original function, just passed down to the buildURL. Since nothing returned, this value was not bubbling up. Putting in a console.log confirmed that this code is needlessly passing around an empty string. It was really just being used as a local variable when building the URL. Take that out and make it local to buildURL, and the tests all still pass.

If there's one useless parameter, maybe there are more. For the original function, everything else appeared to be used, but for the buildURL function, I was passing in one of those parameters used to track the recursion. The function was recursing over a two-dimensional array, so the tracked parameter was testing for the end of one of those dimensions. Well, that should always be the .length of that dimension, so pass that in instead. Still green. But maybe there was more. Looking in buildURL, I could see that it was being used as the limit in a for-loop concatenating an array into a string. That array is the same length as the original dimension, so the whole parameter was redundant, and therefore unnecessary. Away it went. Concatenating an array into a string already has a name, .join, so away with the loop entirely. Still green, still good. And buildURL was now down to four parameters.

I didn't think I could get it down to less than four parameters, but something still bothered me. Well, a lot of things still bothered me, but two things in particular bothered me the most at this point. First, buildURL was still mucking around in that global variable. Second, I was taking an array parameter, joining it, then using it as the key for that global. I felt like with a couple of steps, I could clean both up a bit. Taking the array parameter and joining it inside of the buildURL function felt like a tell-don't-ask violation: give me an array so I can ask for all its parts, then put them together in a string. What I really wanted was just the string. The recursive function above was building this array, so it was the perfect place to put this responsibility. I could join the string there then pass it in instead of the array. Done, green, great.

Now that the outer function had the key name, it could also handle the assignment to the global variable. It still had to be passed in to buildURL since it was also part of the data, but my buildURL function could now return its result! Now that I had a grasp of that global, it was time to take it out. It was only being written in the recursive function, and only being returned from the whole module. We can just pass it into the recursive function. Now, instead of dealing with the global, I can deal with a parameter. Because it is a reference, I can pass it down, and changes bubble back up. I just need to return it at the end. Done, still green. Now, I can remove the global, and create a local to the module function, and return that instead. Done. And we're still green!

Now time to actually deal with that recursive algorithm. I still don't entirely grok what it does. OK. Time to look at that output I copy-pasted earlier. (Yes, this is actually the first time I looked at its details.) I said earlier that this function does a permutation, but I didn't know that until I looked at the output. Once I saw that it was taking the values from some arrays and permuting them together, I realized that there is a much simpler algorithm for this. I've actually already written this algorithm as a code exercise for myself. I didn't need everything I did there, but just enough to handle the current case. I wrote a simple permutation function, permute, first. Then I replaced the code in the original recursive function. Green!

My permute didn't require nearly as many parameters as the original recursive function, and so was much simpler. It also meant that I could get rid of three parameters of the original function that were tracking state down the stack. Now that function only took five parameters, including our optional parameter that replaced the global variable.

I did some more clean up on the code to improve readability and address some bugs. I also branched from the final version to make a more functional version. You can see that all in my code review repo for this code, as well as all the steps I took to get here. I also posted an official answer to the original post.

Comments

Popular posts from this blog

The Timeline of Errors

Magic Numbers, Semantics, and Compiler Errors

Assuming Debugging