Exposing Dependencies: Getting Worse Before Getting Better
As software developers, our job is managing dependencies. We have to carefully construct balanced graphs of classes, objects, libraries, services, and more. Code needs to be encapsulated, reusable, and extensible, all while creating end-user value. We use techniques like composition, inheritance, and dependency injection to separate this from that, knowing that this or that could change in unknown ways at any moment.
We don't always get to create the perfect system from the ground up though. Sometimes, we work on software developed long ago that could not anticipate today's needs. It hurts a little more when we see our name on all those commit messages. Regardless, we need to make the necessary adjustments for our system to meet the needs of today and tomorrow. To make code more flexible and adaptable, we often have to break it up into more singular and discrete responsibilities. We have to pull out and encapsulate the pieces that are at different levels of abstraction. We have to control those dependencies, so they don't end up controlling us.
Say we have the following code from an application that does analysis on text:
There's a lot going on in this one method. We should know scary things are happening when we have a method that takes no parameters and returns nothing. We are directly interacting with both a GUI and a database. We are grabbing those instances from global state. We're embedding an SQL query in the middle of this code. We are reaching down through multiple levels to get the date range. In just a few lines, we have quite the bird's nest.
To improve this code, we need to expose the implicit dependencies and make them explicit. The series of refactoring steps results in a series of progressively more explicit methods:
This refactoring is not complete, but just from this, we can see important changes to the code. In the first step, we take the implicit dependencies of the GUI and the database and inject them as parameters. A method that used to have no parameters now has two parameters. In a way, this is objectively worse because we have more parameters now, not less. Calling code will now need to get those dependencies separately and pass them into the new method. The new parameters do not represent new dependencies though. The dependencies were already there, and we are just making them explicit.
In step 2, we address the dependency of the output. Instead of outputting to the
Step 3 goes back to the input dependencies. Since we no longer output to the
The resulting improvement is greater than the sum of the improvements of the intermediate steps. Each step added burdens with very little upside, while the original code abstracted getting and showing the word counts. This abstraction was illusory: it merely provided convenience, not encapsulation, reusability, testability, or safety. But what if we had stopped at step 2 or step 1? Calling code would be more verbose, and would need to do more work gathering all these dependencies.
In a large, complex restructuring of code, we may make a dozen changes like these, but we can only move one step at a time. We will have a hard time justifying putting a dozen step 1's in the code, potentially forever if we can't make it back. We must be sure to keep the end goal front-and-center. It must be visible, not just to us but to our whole team. Without seeing or understanding the goal, they may use those step 2's in contorted ways that make it more difficult to get to step 3. By keeping focus, things can get better after they get worse.
We don't always get to create the perfect system from the ground up though. Sometimes, we work on software developed long ago that could not anticipate today's needs. It hurts a little more when we see our name on all those commit messages. Regardless, we need to make the necessary adjustments for our system to meet the needs of today and tomorrow. To make code more flexible and adaptable, we often have to break it up into more singular and discrete responsibilities. We have to pull out and encapsulate the pieces that are at different levels of abstraction. We have to control those dependencies, so they don't end up controlling us.
Say we have the following code from an application that does analysis on text:
/**
* Show words and their counts imported between the dates specified.
*/
public void showWordCountsInRange() {
JWordViewer wordViewer = App.getGUI().getActiveWordViewer();
Database db = App.getDB();
ResultSet rs = db.query("SELECT word, COUNT(word) AS word_count FROM Words WHERE " +
"import_date >= " + DBUtils.toSQLString(wordViewer.getStartField().getDate()) + " AND " +
"import_date < " + DBUtils.toSQLString(wordViewer.getEndField().getDate()) + " " +
"GROUP BY word");
Map words = new HashMap<>();
for(Result result : resultSet) {
words.add(result.getString("word"), result.getLong("word_count"));
}
wordViewer.setWords(words);
}
There's a lot going on in this one method. We should know scary things are happening when we have a method that takes no parameters and returns nothing. We are directly interacting with both a GUI and a database. We are grabbing those instances from global state. We're embedding an SQL query in the middle of this code. We are reaching down through multiple levels to get the date range. In just a few lines, we have quite the bird's nest.
To improve this code, we need to expose the implicit dependencies and make them explicit. The series of refactoring steps results in a series of progressively more explicit methods:
/**
* Show words and their counts imported between the dates specified.
*
* Step 0: The original method signature, left deprecated to allow
* client code to still compile
* @deprecated Use {@link #showWordCountsInRange(JWordViewer, Database)}
*/
@Deprecated
public void showWordCountsInRange() {
showWordCountsInRange(App.getGUI().getActiveWordViewer(), App.getDB());
}
/**
* Show words and their counts imported between the dates specified.
*
* Step 1: Directly inject the global dependencies as method parameters
* @see #getWordCountsInRange(JWordViewer, Database)
*/
public void showWordCountsInRange(JWordViewer wordViewer, Database db) {
wordViewer.setWords(getWordCountsInRange(wordViewer, db));
}
/**
* Get words and their counts imported between the dates specified.
*
* Step 2: Return a value instead of setting on a parameter
* @deprecated Use getWordCountsInRange(Date, Date, Database)
*/
@Deprecated
public Map<String, Long> getWordCountsInRange(JWordViewer wordViewer, Database db) {
return getWordCountsInRange(
wordViewer.getStartField().getDate(), wordViewer.getEndField().getDate(),
db);
}
/**
* Get words and their counts imported between the dates specified.
*
* Step 3: Pass the dates rather than the GUI element holding them
*/
public Map<String, Long> getWordCountsInRange(Date startDate, Date endDate, Database db) {
ResultSet rs = db.query("SELECT word, COUNT(word) AS word_count FROM Words WHERE " +
"import_date >= " + DBUtils.toSQLString(startDate) + " AND " +
"import_date < " + DBUtils.toSQLString(endDate) + " " +
"GROUP BY word");
Map words = new HashMap<>();
for(Result result : resultSet) {
words.add(result.getString("word"), result.getLong("word_count"));
}
return words;
}
This refactoring is not complete, but just from this, we can see important changes to the code. In the first step, we take the implicit dependencies of the GUI and the database and inject them as parameters. A method that used to have no parameters now has two parameters. In a way, this is objectively worse because we have more parameters now, not less. Calling code will now need to get those dependencies separately and pass them into the new method. The new parameters do not represent new dependencies though. The dependencies were already there, and we are just making them explicit.
In step 2, we address the dependency of the output. Instead of outputting to the
JWordViewer
parameter, we return the mapping. For our calling code, this is also objectively worse. At that point we need to add the word mapping to the GUI element separately. That step is explicit and no longer magical.But we can also reuse this method to output to a text interface, or to a file, or to anything else. By returning a word-to-count mapping, we allow for that reuse, which also simplifies testing.
Step 3 goes back to the input dependencies. Since we no longer output to the
JWordViewer
, we don't need to take it as a parameter either. We can just take the two dates and go from there. We are free from any GUI. We can get dates from anywhere and return data to anywhere. We've just made the dependency graph simpler without sacrificing any behavior. This change is objectively better.The resulting improvement is greater than the sum of the improvements of the intermediate steps. Each step added burdens with very little upside, while the original code abstracted getting and showing the word counts. This abstraction was illusory: it merely provided convenience, not encapsulation, reusability, testability, or safety. But what if we had stopped at step 2 or step 1? Calling code would be more verbose, and would need to do more work gathering all these dependencies.
In a large, complex restructuring of code, we may make a dozen changes like these, but we can only move one step at a time. We will have a hard time justifying putting a dozen step 1's in the code, potentially forever if we can't make it back. We must be sure to keep the end goal front-and-center. It must be visible, not just to us but to our whole team. Without seeing or understanding the goal, they may use those step 2's in contorted ways that make it more difficult to get to step 3. By keeping focus, things can get better after they get worse.
Comments
Post a Comment