Untangling Nested Code



There is code out there that is ugly. I mean really ugly. I know, can you believe it? But it's true. One such example of really ugly code is code that has lots of deeply nested calls. The tests are trying to test a dozen different methods in every case. The methods look so simple with just a few lines and a few calls. And then those methods called have a few lines and a few calls. And then again, a dozen or more different calls deep.

We want to use private helper methods to make our code clearer and cleaner. Calling methods is also largely what programming is about. But sometimes we overdo it. Deep call hierarchies are difficult to follow and they can mask what the code is really doing. Each call means that we have to look all over a class to see what actually happens. This makes debugging hard. This makes testing even harder.

Say we have the following code:


class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        for (State state : nation.states()) {
            if (!state.isParticipating(year)) {
                continue;
            }

            loadStateData(state, year, reader);
        }
    }

    private void loadStateData(State state, int year, DataReader reader) {
        for (County county : state.counties()) {
            if (county.hasDataFor(year)) {
                loadCountyData(county, year, reader);
            }
        }
    }

    private void loadCountyData(County county, int year, DataReader reader) {
        for (DataBlock dataBlock : county.dataBlocksFor(year)) {
            if (dataBlock.dataPoints().size() >= 100) {
                for (DataPoint dataPoint : dataBlock.dataPoints()) {
                    reader.read(dataPoint);
                }
            }
        }
    }
}

We have three methods that are called nested from top to bottom. When we use loadNationalData, we must use loadStateData, and in turn we must use loadCountyData. This make sense when we want to call loadNationalData from code, but we don't necessarily want to do that from tests. We would like to test these methods separately to make sure each works individually without testing everything together. How can we un-nest these method calls?

Let's first look at what the loadCountyData method is doing: looping over data blocks for a given year in a given county, filtering out data blocks that do not have a minimum number of data points, then feeding those data points into the data reader. This method is doing more than one thing. We can split up some of the things into separate blocks of code:


    private void loadCountyData(County county, int year, DataReader reader) {
        Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
        List<DataPoint> dataPoints = new ArrayList<>();

        for (DataBlock dataBlock : dataBlocks) {
            if (dataBlock.dataPoints().size() >= 100) {
                dataPoints.addAll(dataBlock.dataPoints());
            }
        }

        for (DataPoint dataPoint : dataPoints) {
            reader.read(dataPoint);
        }
    }

Now we have a top chunk that is filtering out all the points from data blocks that do not meet the minimum. We've also extracted the code for getting data blocks from the county to a local variable as part of that top chunk. We have a bottom chunk that is dispatching those data points to the data reader. We can extract two methods along those lines. This temporarily makes the nesting worse in this class:


    private void loadCountyData(County county, int year, DataReader reader) {
        Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
        Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);

        writeDataPoints(dataPoints, reader);
    }

    private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
        List<DataPoint> dataPoints = new ArrayList<>();

        for (DataBlock dataBlock : dataBlocks) {
            if (dataBlock.dataPoints().size() >= 100) {
                dataPoints.addAll(dataBlock.dataPoints());
            }
        }

        return dataPoints;
    }

    private void writeDataPoints(Iterable<DataPoint> dataPoints, DataReader reader) {
        for (DataPoint dataPoint : dataPoints) {
            reader.read(dataPoint);
        }
    }

We can now look at the methods we extracted. The writeDataPoints is all about sending data points to the data reader. The DataReader class already has a method for reading an individual data point. We could move the writeDataPoints method to the DataReader class and rename it to read to match the other read method. We eliminate one of the methods we just created this way. We still have filterDataBlocks. Right now, we can't be sure how that might evolve until we see a little more of this refactoring.



    private void loadCountyData(County county, int year, DataReader reader) {
        Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
        Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);

        reader.read(dataPoints);
    }

    private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
        List<DataPoint> dataPoints = new ArrayList<>();

        for (DataBlock dataBlock : dataBlocks) {
            if (dataBlock.dataPoints().size() >= 100) {
                dataPoints.addAll(dataBlock.dataPoints());
            }
        }

        return dataPoints;
    }

Now we can do something similar at the next level up: the loadStateData method. It has a similar filter-and-call structure to loadCountyData. We can divide the two responsibilities:


    private void loadStateData(State state, int year, DataReader reader) {
        Iterable<County> counties = state.counties();
        List<County> countiesWithData = new ArrayList<>();

        for (County county : counties) {
            if (county.hasDataFor(year)) {
                countiesWithData.add(county);
            }
        }

        for (County county : countiesWithData) {
            loadCountyData(county, year, reader);
        }
    }

We can give a name to the filtering part and extract it:


    private void loadStateData(State state, int year, DataReader reader) {
        Iterable<County> counties = state.counties();
        Iterable<County> countiesWithData = filterCountiesWithData(counties, year);

        for (County county : countiesWithData) {
            loadCountyData(county, year, reader);
        }
    }

    private Iterable<County> filterCountiesWithData(Iterable<County> counties, int year) {
        List<County> countiesWithData = new ArrayList<>();

        for (County county : counties) {
            if (county.hasDataFor(year)) {
                countiesWithData.add(county);
            }
        }

        return countiesWithData;
    }

There seems to be a pattern developing here. We need to finish these extractions first to see it fully before we jump into creating abstractions. We can also notice that loadStateData is very similar to loadCountyData, but at one level of granularity higher. Nested calls will often be like this because each level of nesting is meant to handle one level smaller of the problem.

Finally, we look at loadNationalData, and it again has a similar filter-and-call structure. We separate the two responsibilities again, and extract out the filtering (done in one step here for brevity):


    void loadNationalData(Nation nation, int year, DataReader reader) {
        Iterable<State> states = nation.states();
        Iterable<State> participatingStates = filterParticipatingStates(states, year);

        for (State state : participatingStates) {
            loadStateData(state, year, reader);
        }
    }

    private Iterable<State> filterParticipatingStates(Iterable<State> states, int year) {
        List<State> participatingStates = new ArrayList<>();

        for (State state : states) {
            if (state.isParticipating(year)) {
                participatingStates.add(state);
            }
        }

        return participatingStates;
    }

We now have a class with even more nested methods to call. We've separated the two responsibilities out of the original methods into separate methods, but that has made the original problem worse. The whole class looks like this now:


class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        Iterable<State> states = nation.states();
        Iterable<State> participatingStates = filterParticipatingStates(states, year);

        for (State state : participatingStates) {
            loadStateData(state, year, reader);
        }
    }

    private Iterable<State> filterParticipatingStates(Iterable<State> states, int year) {
        List<State> participatingStates = new ArrayList<>();

        for (State state : states) {
            if (state.isParticipating(year)) {
                participatingStates.add(state);
            }
        }

        return participatingStates;
    }

    private void loadStateData(State state, int year, DataReader reader) {
        Iterable<County> counties = state.counties();
        Iterable<County> countiesWithData = filterCountiesWithData(counties, year);

        for (County county : countiesWithData) {
            loadCountyData(county, year, reader);
        }
    }

    private Iterable<County> filterCountiesWithData(Iterable<County> counties, int year) {
        List<County> countiesWithData = new ArrayList<>();

        for (County county : counties) {
            if (county.hasDataFor(year)) {
                countiesWithData.add(county);
            }
        }

        return countiesWithData;
    }

    private void loadCountyData(County county, int year, DataReader reader) {
        Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
        Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);

        reader.read(dataPoints);
    }

    private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
        List<DataPoint> dataPoints = new ArrayList<>();

        for (DataBlock dataBlock : dataBlocks) {
            if (dataBlock.dataPoints().size() >= 100) {
                dataPoints.addAll(dataBlock.dataPoints());
            }
        }

        return dataPoints;
    }
}

Looking at this code in total, we can notice a few oddities that did not stick out as much before. We are passing parameters down through nested calls that don't need them except to make other nested calls, namely the DataReader reader and int year parameters. We also notice that the filter methods are actually still doing two things. They are filtering the values in a given collection, but they are also (flat) mapping them into a new collection.

Since we still have methods with multiple responsibilities, we should separate those responsibilities out and name them. We can pull out the filtering predicates into their own methods. The result would be:


class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        Iterable<State> states = nation.states();
        Iterable<State> participatingStates = filterParticipatingStates(states);

        for (State state : participatingStates) {
            loadStateData(state, year, reader);
        }
    }

    private Iterable<State> filterParticipatingStates(Iterable<State> states, int year) {
        List<State> participatingStates = new ArrayList<>();

        for (State state : states) {
            if (isStateAcceptable(state, year)) {
                participatingStates.add(state);
            }
        }

        return participatingStates;
    }

    private boolean isStateAcceptable(State state, int year) {
        return state.isParticipating(year);
    }

    private void loadStateData(State state, int year, DataReader reader) {
        Iterable<County> counties = state.counties();
        Iterable<County> countiesWithData = filterCountiesWithData(counties, year);

        for (County county : countiesWithData) {
            loadCountyData(county, year, reader);
        }
    }

    private Iterable<County> filterCountiesWithData(Iterable<County> counties, int year) {
        List<County> countiesWithData = new ArrayList<>();

        for (County county : counties) {
            if (isCountyAcceptable(county, year)) {
                countiesWithData.add(county);
            }
        }

        return countiesWithData;
    }

    private boolean isCountyAcceptable(County county, int year) {
        return county.hasDataFor(year);
    }

    private void loadCountyData(County county, int year, DataReader reader) {
        Iterable<DataBlock> dataBlocks = county.dataBlocksFor(year);
        Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks);

        reader.read(dataPoints);
    }

    private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks) {
        List<DataPoint> dataPoints = new ArrayList<>();

        for (DataBlock dataBlock : dataBlocks) {
            if (isDataBlockAcceptable(dataBlock)) {
                dataPoints.addAll(dataBlock.dataPoints());
            }
        }

        return dataPoints;
    }

    private boolean isDataBlockAcceptable(DataBlock dataBlock) {
        return dataBlock.dataPoints().size() >= 100;
    }
}

We've made the problem even worse than before. We have methods calling methods calling methods three deep in every direction. We've separated out all of the responsibilities, and now we can make the changes that will turn the tide.

We have three sets of three methods that each have a repeating prefix (load, filter) or repeating suffix (acceptable). When we see this happen, we know that there is a missing abstraction or two that we can create. We can start with the predicate methods (is*Acceptable). These methods might make more sense as a separate object.


class DataFilter {
    private static final int MINIMUM_DATA_POINTS = 100;

    private final int year;

    DataFilter(int year) {
        this.year = year;
    }

    int year() {
        return year;
    }

    boolean isStateAcceptable(State state) {
        return state.isParticipating(year);
    }

    boolean isCountyAcceptable(County county) {
        return county.hasDataFor(year);
    }

    boolean isDataBlockAcceptable(DataBlock dataBlock) {
        return dataBlock.dataPoints().size() >= MINIMUM_DATA_POINTS;
    }
}

We've copied out the relevant predicate methods into a separate object. The only additional changes were to extract the minimum number of data points to a constant, and to make the year a property passed in at construction. We moved the year for two reasons. First, it must be consistent between calls. By storing it, we guarantee that. Second, we reduced the number of parameters to two of the methods to just the objects they are really interested in. This object is now isolated and separately testable. Before, these methods would need to be tested through the loadNationalData method. Now they can be called independently with a much simpler and more direct test setup. We weave this object through our code:


class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        DataFilter dataFilter = new DataFilter(year);
        Iterable<State> states = nation.states();
        Iterable<State> participatingStates = filterParticipatingStates(states, dataFilter);

        for (State state : participatingStates) {
            loadStateData(state, dataFilter, reader);
        }
    }

    private Iterable<State> filterParticipatingStates(Iterable<State> states, DataFilter dataFilter) {
        List<State> participatingStates = new ArrayList<>();

        for (State state : states) {
            if (dataFilter.isStateAcceptable(state)) {
                participatingStates.add(state);
            }
        }

        return participatingStates;
    }

    private void loadStateData(State state, DataFilter dataFilter, DataReader reader) {
        Iterable<County> counties = state.counties();
        Iterable<County> countiesWithData = filterCountiesWithData(counties, dataFilter);

        for (County county : countiesWithData) {
            loadCountyData(county, dataFilter, reader);
        }
    }

    private Iterable<County> filterCountiesWithData(Iterable<County> counties, DataFilter dataFilter) {
        List<County> countiesWithData = new ArrayList<>();

        for (County county : counties) {
            if (dataFilter.isCountyAcceptable(county)) {
                countiesWithData.add(county);
            }
        }

        return countiesWithData;
    }

    private void loadCountyData(County county, DataFilter dataFilter, DataReader reader) {
        Iterable<DataBlock> dataBlocks = county.dataBlocksFor(dataFilter.year());
        Iterable<DataPoint> dataPoints = filterDataBlocks(dataBlocks, dataFilter);

        reader.read(dataPoints);
    }

    private Iterable<DataPoint> filterDataBlocks(Iterable<DataBlock> dataBlocks, DataFilter dataFilter) {
        List<DataPoint> dataPoints = new ArrayList<>();

        for (DataBlock dataBlock : dataBlocks) {
            if (dataFilter.isDataBlockAcceptable(dataBlock)) {
                dataPoints.addAll(dataBlock.dataPoints());
            }
        }

        return dataPoints;
    }
}

Since we've encapsulated much of what varies, we are left with duplication. The filter-and-mapping done in the filter methods can be more explicit by converting them to use standard Stream filters and maps from Java 8.

class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        final DataFilter dataFilter = new DataFilter(year);
        final Stream states = streamOf(nation.states());
        final Stream participatingStates = filterParticipatingStates(states, dataFilter);

        participatingStates.forEachOrdered((state) -> loadStateData(state, dataFilter, reader));
    }

    private Stream filterParticipatingStates(final Stream states, DataFilter dataFilter) {
        return states.filter(dataFilter::isStateAcceptable);
    }

    private void loadStateData(State state, DataFilter dataFilter, DataReader reader) {
        final Stream counties = streamOf(state.counties());
        final Stream countiesWithData = filterCountiesWithData(counties, dataFilter);

        countiesWithData.forEachOrdered((county) -> loadCountyData(county, dataFilter, reader));
    }

    private Stream filterCountiesWithData(final Stream counties, DataFilter dataFilter) {
        return counties.filter(dataFilter::isCountyAcceptable);
    }

    private void loadCountyData(County county, DataFilter dataFilter, DataReader reader) {
        final Stream dataBlocks = streamOf(county.dataBlocksFor(dataFilter.year()));
        final Stream dataPoints = filterDataBlocks(dataBlocks, dataFilter);

        reader.read(dataPoints);
    }

    private Stream filterDataBlocks(final Stream dataBlocks, DataFilter dataFilter) {
        return dataBlocks
            .filter(dataFilter::isDataBlockAcceptable)
            .flatMap((dataBlock) -> streamOf(dataBlock.dataPoints()));
    }

    private static  Stream streamOf(final Iterable iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }
}


Almost everywhere we had an Iterable, we've replaced it with a Stream. We've added a private static helper method to convert the Iterables we do have into streams. We also added a new method to the DataReader to take a Stream:

 public void read(final Stream dataPoints) {
  dataPoints.forEachOrdered(this::read);
 }


Since the filter methods are much simpler, we don't need them. We can inline the filtering code into the calling methods:

class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        final DataFilter dataFilter = new DataFilter(year);
        final Stream states = streamOf(nation.states());

        states
            .filter(dataFilter::isStateAcceptable)
            .forEachOrdered((state) -> loadStateData(state, dataFilter, reader));
    }

    private void loadStateData(State state, DataFilter dataFilter, DataReader reader) {
        final Stream counties = streamOf(state.counties());

        counties
            .filter(dataFilter::isCountyAcceptable)
            .forEachOrdered((county) -> loadCountyData(county, dataFilter, reader));
    }

    private void loadCountyData(County county, DataFilter dataFilter, DataReader reader) {
        final Stream dataBlocks = streamOf(county.dataBlocksFor(dataFilter.year()));

        reader.read(dataBlocks
            .filter(dataFilter::isDataBlockAcceptable)
            .flatMap((dataBlock) -> streamOf(dataBlock.dataPoints())));
    }

    private static  Stream streamOf(final Iterable iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }
}


We are finally getting to much shorter code that is much easier to reason about. The number of nested calls is about where it was when we started. Much of the logic, though, has been moved out to more testable places. We can still tidy this up more, though. Each load method is now very simple, just flat mapping and filtering before calling the next method in line. We can make these steps more explicit via Stream methods as well:

class DataLoader {
    void loadNationalData(Nation nation, int year, DataReader reader) {
        final DataFilter dataFilter = new DataFilter(year);
        final Stream states = streamOf(nation.states());

        reader.read(states
            .filter(dataFilter::isStateAcceptable)
            .flatMap((state) -> streamOf(state.counties()))
            .filter(dataFilter::isCountyAcceptable)
            .flatMap((county) -> streamOf(county.dataBlocksFor(dataFilter.year())))
            .filter(dataFilter::isDataBlockAcceptable)
            .flatMap((dataBlock) -> streamOf(dataBlock.dataPoints())));
    }

    private static  Stream streamOf(final Iterable iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }
}


We've finally eliminated all of the nested calls. This method is cleaner, and much simpler to understand than the original code. We can debug and test it more directly.

You can see the full progression of this code at https://bitbucket.org/cbojar-blog/nested-tangle


This post is based on this StackExchange question.

Comments

Popular posts from this blog

The Timeline of Errors

Magic Numbers, Semantics, and Compiler Errors

Assuming Debugging