Untangling Nested Code
There is code out there that is ugly. I mean really ugly. I know, can you believe it? But it's true. One such example of really ugly code is code that has lots of deeply nested calls. The tests are trying to test a dozen different methods in every case. The methods look so simple with just a few lines and a few calls. And then those methods called have a few lines and a few calls. And then again, a dozen or more different calls deep.
We want to use private helper methods to make our code clearer and cleaner. Calling methods is also largely what programming is about. But sometimes we overdo it. Deep call hierarchies are difficult to follow and they can mask what the code is really doing. Each call means that we have to look all over a class to see what actually happens. This makes debugging hard. This makes testing even harder.
Say we have the following code:
class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
for (State state : nation.states()) {
if (!state.isParticipating(year)) {
continue;
}
loadStateData(state, year, reader);
}
}
private void loadStateData(State state, int year, DataReader reader) {
for (County county : state.counties()) {
if (county.hasDataFor(year)) {
loadCountyData(county, year, reader);
}
}
}
private void loadCountyData(County county, int year, DataReader reader) {
for (DataBlock dataBlock : county.dataBlocksFor(year)) {
if (dataBlock.dataPoints().size() >= 100) {
for (DataPoint dataPoint : dataBlock.dataPoints()) {
reader.read(dataPoint);
}
}
}
}
}
We have three methods that are called nested from top to bottom. When we use
loadNationalData
, we must use loadStateData
, and in turn we must use loadCountyData
. This make sense when we want to call loadNationalData
from code, but we don't necessarily want to do that from tests. We would like to test these methods separately to make sure each works individually without testing everything together. How can we un-nest these method calls?Let's first look at what the
loadCountyData
method is doing: looping over data blocks for a given year in a given county, filtering out data blocks that do not have a minimum number of data points, then feeding those data points into the data reader. This method is doing more than one thing. We can split up some of the things into separate blocks of code:
private void loadCountyData(County county, int year, DataReader reader) {
Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
List<DataPoint> dataPoints = new ArrayList<>();
for (DataBlock dataBlock : dataBlocks) {
if (dataBlock.dataPoints().size() >= 100) {
dataPoints.addAll(dataBlock.dataPoints());
}
}
for (DataPoint dataPoint : dataPoints) {
reader.read(dataPoint);
}
}
Now we have a top chunk that is filtering out all the points from data blocks that do not meet the minimum. We've also extracted the code for getting data blocks from the county to a local variable as part of that top chunk. We have a bottom chunk that is dispatching those data points to the data reader. We can extract two methods along those lines. This temporarily makes the nesting worse in this class:
private void loadCountyData(County county, int year, DataReader reader) {
Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);
writeDataPoints(dataPoints, reader);
}
private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
List<DataPoint> dataPoints = new ArrayList<>();
for (DataBlock dataBlock : dataBlocks) {
if (dataBlock.dataPoints().size() >= 100) {
dataPoints.addAll(dataBlock.dataPoints());
}
}
return dataPoints;
}
private void writeDataPoints(Iterable<DataPoint> dataPoints, DataReader reader) {
for (DataPoint dataPoint : dataPoints) {
reader.read(dataPoint);
}
}
We can now look at the methods we extracted. The
writeDataPoints
is all about sending data points to the data reader. The DataReader
class already has a method for reading an individual data point. We could move the writeDataPoints
method to the DataReader
class and rename it to read
to match the other read
method. We eliminate one of the methods we just created this way. We still have filterDataBlocks
. Right now, we can't be sure how that might evolve until we see a little more of this refactoring.
private void loadCountyData(County county, int year, DataReader reader) {
Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);
reader.read(dataPoints);
}
private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
List<DataPoint> dataPoints = new ArrayList<>();
for (DataBlock dataBlock : dataBlocks) {
if (dataBlock.dataPoints().size() >= 100) {
dataPoints.addAll(dataBlock.dataPoints());
}
}
return dataPoints;
}
Now we can do something similar at the next level up: the
loadStateData
method. It has a similar filter-and-call structure to loadCountyData
. We can divide the two responsibilities:
private void loadStateData(State state, int year, DataReader reader) {
Iterable<County> counties = state.counties();
List<County> countiesWithData = new ArrayList<>();
for (County county : counties) {
if (county.hasDataFor(year)) {
countiesWithData.add(county);
}
}
for (County county : countiesWithData) {
loadCountyData(county, year, reader);
}
}
We can give a name to the filtering part and extract it:
private void loadStateData(State state, int year, DataReader reader) {
Iterable<County> counties = state.counties();
Iterable<County> countiesWithData = filterCountiesWithData(counties, year);
for (County county : countiesWithData) {
loadCountyData(county, year, reader);
}
}
private Iterable<County> filterCountiesWithData(Iterable<County> counties, int year) {
List<County> countiesWithData = new ArrayList<>();
for (County county : counties) {
if (county.hasDataFor(year)) {
countiesWithData.add(county);
}
}
return countiesWithData;
}
There seems to be a pattern developing here. We need to finish these extractions first to see it fully before we jump into creating abstractions. We can also notice that
loadStateData
is very similar to loadCountyData
, but at one level of granularity higher. Nested calls will often be like this because each level of nesting is meant to handle one level smaller of the problem.Finally, we look at
loadNationalData
, and it again has a similar filter-and-call structure. We separate the two responsibilities again, and extract out the filtering (done in one step here for brevity):
void loadNationalData(Nation nation, int year, DataReader reader) {
Iterable<State> states = nation.states();
Iterable<State> participatingStates = filterParticipatingStates(states, year);
for (State state : participatingStates) {
loadStateData(state, year, reader);
}
}
private Iterable<State> filterParticipatingStates(Iterable<State> states, int year) {
List<State> participatingStates = new ArrayList<>();
for (State state : states) {
if (state.isParticipating(year)) {
participatingStates.add(state);
}
}
return participatingStates;
}
We now have a class with even more nested methods to call. We've separated the two responsibilities out of the original methods into separate methods, but that has made the original problem worse. The whole class looks like this now:
class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
Iterable<State> states = nation.states();
Iterable<State> participatingStates = filterParticipatingStates(states, year);
for (State state : participatingStates) {
loadStateData(state, year, reader);
}
}
private Iterable<State> filterParticipatingStates(Iterable<State> states, int year) {
List<State> participatingStates = new ArrayList<>();
for (State state : states) {
if (state.isParticipating(year)) {
participatingStates.add(state);
}
}
return participatingStates;
}
private void loadStateData(State state, int year, DataReader reader) {
Iterable<County> counties = state.counties();
Iterable<County> countiesWithData = filterCountiesWithData(counties, year);
for (County county : countiesWithData) {
loadCountyData(county, year, reader);
}
}
private Iterable<County> filterCountiesWithData(Iterable<County> counties, int year) {
List<County> countiesWithData = new ArrayList<>();
for (County county : counties) {
if (county.hasDataFor(year)) {
countiesWithData.add(county);
}
}
return countiesWithData;
}
private void loadCountyData(County county, int year, DataReader reader) {
Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);
reader.read(dataPoints);
}
private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
List<DataPoint> dataPoints = new ArrayList<>();
for (DataBlock dataBlock : dataBlocks) {
if (dataBlock.dataPoints().size() >= 100) {
dataPoints.addAll(dataBlock.dataPoints());
}
}
return dataPoints;
}
}
Looking at this code in total, we can notice a few oddities that did not stick out as much before. We are passing parameters down through nested calls that don't need them except to make other nested calls, namely the
DataReader reader
and int year
parameters. We also notice that the filter methods are actually still doing two things. They are filtering the values in a given collection, but they are also (flat) mapping them into a new collection.Since we still have methods with multiple responsibilities, we should separate those responsibilities out and name them. We can pull out the filtering predicates into their own methods. The result would be:
class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
Iterable<State> states = nation.states();
Iterable<State> participatingStates = filterParticipatingStates(states);
for (State state : participatingStates) {
loadStateData(state, year, reader);
}
}
private Iterable<State> filterParticipatingStates(Iterable<State> states, int year) {
List<State> participatingStates = new ArrayList<>();
for (State state : states) {
if (isStateAcceptable(state, year)) {
participatingStates.add(state);
}
}
return participatingStates;
}
private boolean isStateAcceptable(State state, int year) {
return state.isParticipating(year);
}
private void loadStateData(State state, int year, DataReader reader) {
Iterable<County> counties = state.counties();
Iterable<County> countiesWithData = filterCountiesWithData(counties, year);
for (County county : countiesWithData) {
loadCountyData(county, year, reader);
}
}
private Iterable<County> filterCountiesWithData(Iterable<County> counties, int year) {
List<County> countiesWithData = new ArrayList<>();
for (County county : counties) {
if (isCountyAcceptable(county, year)) {
countiesWithData.add(county);
}
}
return countiesWithData;
}
private boolean isCountyAcceptable(County county, int year) {
return county.hasDataFor(year);
}
private void loadCountyData(County county, int year, DataReader reader) {
Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);
reader.read(dataPoints);
}
private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
List<DataPoint> dataPoints = new ArrayList<>();
for (DataBlock dataBlock : dataBlocks) {
if (isDataBlockAcceptable(dataBlock)) {
dataPoints.addAll(dataBlock.dataPoints());
}
}
return dataPoints;
}
private boolean isDataBlockAcceptable(DataBlock dataBlock) {
return dataBlock.dataPoints().size() >= 100;
}
}
We've made the problem even worse than before. We have methods calling methods calling methods three deep in every direction. We've separated out all of the responsibilities, and now we can make the changes that will turn the tide.
We have three sets of three methods that each have a repeating prefix (load, filter) or repeating suffix (acceptable). When we see this happen, we know that there is a missing abstraction or two that we can create. We can start with the predicate methods (is*Acceptable). These methods might make more sense as a separate object.
class DataFilter {
private static final int MINIMUM_DATA_POINTS = 100;
private final int year;
DataFilter(int year) {
this.year = year;
}
int year() {
return year;
}
boolean isStateAcceptable(State state) {
return state.isParticipating(year);
}
boolean isCountyAcceptable(County county) {
return county.hasDataFor(year);
}
boolean isDataBlockAcceptable(DataBlock dataBlock) {
return dataBlock.dataPoints().size() >= MINIMUM_DATA_POINTS;
}
}
We've copied out the relevant predicate methods into a separate object. The only additional changes were to extract the minimum number of data points to a constant, and to make the year a property passed in at construction. We moved the year for two reasons. First, it must be consistent between calls. By storing it, we guarantee that. Second, we reduced the number of parameters to two of the methods to just the objects they are really interested in. This object is now isolated and separately testable. Before, these methods would need to be tested through the
loadNationalData
method. Now they can be called independently with a much simpler and more direct test setup. We weave this object through our code:
class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
DataFilter dataFilter = new DataFilter(year);
Iterable<State> states = nation.states();
Iterable<State> participatingStates = filterParticipatingStates(states, dataFilter);
for (State state : participatingStates) {
loadStateData(state, dataFilter, reader);
}
}
private Iterable<State> filterParticipatingStates(Iterable<State> states, DataFilter dataFilter) {
List<State> participatingStates = new ArrayList<>();
for (State state : states) {
if (dataFilter.isStateAcceptable(state)) {
participatingStates.add(state);
}
}
return participatingStates;
}
private void loadStateData(State state, DataFilter dataFilter, DataReader reader) {
Iterable<County> counties = state.counties();
Iterable<County> countiesWithData = filterCountiesWithData(counties, dataFilter);
for (County county : countiesWithData) {
loadCountyData(county, dataFilter, reader);
}
}
private Iterable<County> filterCountiesWithData(Iterable<County> counties, DataFilter dataFilter) {
List<County> countiesWithData = new ArrayList<>();
for (County county : counties) {
if (dataFilter.isCountyAcceptable(county)) {
countiesWithData.add(county);
}
}
return countiesWithData;
}
private void loadCountyData(County county, DataFilter dataFilter, DataReader reader) {
Iterable<DataBlock> dataBlocks = county.dataBlocksFor(dataFilter.year());
Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks, dataFilter);
reader.read(dataPoints);
}
private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks, DataFilter dataFilter) {
List<DataPoint> dataPoints = new ArrayList<>();
for (DataBlock dataBlock : dataBlocks) {
if (dataFilter.isDataBlockAcceptable(dataBlock)) {
dataPoints.addAll(dataBlock.dataPoints());
}
}
return dataPoints;
}
}
Since we've encapsulated much of what varies, we are left with duplication. The filter-and-mapping done in the
filter
methods can be more explicit by converting them to use standard Stream
filters and maps from Java 8.class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
final DataFilter dataFilter = new DataFilter(year);
final Stream states = streamOf(nation.states());
final Stream participatingStates = filterParticipatingStates(states, dataFilter);
participatingStates.forEachOrdered((state) -> loadStateData(state, dataFilter, reader));
}
private Stream filterParticipatingStates(final Stream states, DataFilter dataFilter) {
return states.filter(dataFilter::isStateAcceptable);
}
private void loadStateData(State state, DataFilter dataFilter, DataReader reader) {
final Stream counties = streamOf(state.counties());
final Stream countiesWithData = filterCountiesWithData(counties, dataFilter);
countiesWithData.forEachOrdered((county) -> loadCountyData(county, dataFilter, reader));
}
private Stream filterCountiesWithData(final Stream counties, DataFilter dataFilter) {
return counties.filter(dataFilter::isCountyAcceptable);
}
private void loadCountyData(County county, DataFilter dataFilter, DataReader reader) {
final Stream dataBlocks = streamOf(county.dataBlocksFor(dataFilter.year()));
final Stream dataPoints = filterDataBlocks(dataBlocks, dataFilter);
reader.read(dataPoints);
}
private Stream filterDataBlocks(final Stream dataBlocks, DataFilter dataFilter) {
return dataBlocks
.filter(dataFilter::isDataBlockAcceptable)
.flatMap((dataBlock) -> streamOf(dataBlock.dataPoints()));
}
private static Stream streamOf(final Iterable iterable) {
return StreamSupport.stream(iterable.spliterator(), false);
}
}
Almost everywhere we had an
Iterable
, we've replaced it with a Stream
. We've added a private static helper method to convert the Iterable
s we do have into streams. We also added a new method to the DataReader
to take a Stream
: public void read(final Stream dataPoints) {
dataPoints.forEachOrdered(this::read);
}
Since the
filter
methods are much simpler, we don't need them. We can inline the filtering code into the calling methods:class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
final DataFilter dataFilter = new DataFilter(year);
final Stream states = streamOf(nation.states());
states
.filter(dataFilter::isStateAcceptable)
.forEachOrdered((state) -> loadStateData(state, dataFilter, reader));
}
private void loadStateData(State state, DataFilter dataFilter, DataReader reader) {
final Stream counties = streamOf(state.counties());
counties
.filter(dataFilter::isCountyAcceptable)
.forEachOrdered((county) -> loadCountyData(county, dataFilter, reader));
}
private void loadCountyData(County county, DataFilter dataFilter, DataReader reader) {
final Stream dataBlocks = streamOf(county.dataBlocksFor(dataFilter.year()));
reader.read(dataBlocks
.filter(dataFilter::isDataBlockAcceptable)
.flatMap((dataBlock) -> streamOf(dataBlock.dataPoints())));
}
private static Stream streamOf(final Iterable iterable) {
return StreamSupport.stream(iterable.spliterator(), false);
}
}
We are finally getting to much shorter code that is much easier to reason about. The number of nested calls is about where it was when we started. Much of the logic, though, has been moved out to more testable places. We can still tidy this up more, though. Each
load
method is now very simple, just flat mapping and filtering before calling the next method in line. We can make these steps more explicit via Stream
methods as well:class DataLoader {
void loadNationalData(Nation nation, int year, DataReader reader) {
final DataFilter dataFilter = new DataFilter(year);
final Stream states = streamOf(nation.states());
reader.read(states
.filter(dataFilter::isStateAcceptable)
.flatMap((state) -> streamOf(state.counties()))
.filter(dataFilter::isCountyAcceptable)
.flatMap((county) -> streamOf(county.dataBlocksFor(dataFilter.year())))
.filter(dataFilter::isDataBlockAcceptable)
.flatMap((dataBlock) -> streamOf(dataBlock.dataPoints())));
}
private static Stream streamOf(final Iterable iterable) {
return StreamSupport.stream(iterable.spliterator(), false);
}
}
We've finally eliminated all of the nested calls. This method is cleaner, and much simpler to understand than the original code. We can debug and test it more directly.
You can see the full progression of this code at https://bitbucket.org/cbojar-blog/nested-tangle
This post is based on this StackExchange question.
Comments
Post a Comment