Interfaces and the Need to Know Implementations

Many languages provide the ability to create explicit abstractions in the form of interfaces and abstract classes. Other languages don't provide this explicitly, but rely on duck typing for the same purpose. Either way, we use abstractions to hide the underlying complexity of implementations. Yet even while we use these abstractions, all too often we don't trust them.

Often I hear or see developers talking about how they can't use an abstraction because they need to know exactly what type of object they are dealing with. This is especially true when the concrete type is determined only at runtime rather than at compile time, since we cannot know in advance what it might be. It might be some second- or third-party type that we've never seen, so we cannot know how it will behave. How could we possibly trust unknown code that has simply been plugged into ours?

But do we really need to know the runtime type? The biggest reason I've seen for knowing it is to simplify debugging. If I need to chase a bug, I need to know where it is. If it's in some unknown subtype, how will I be able to track it down? This is a legitimate concern: we have to know where the bugs are to fix them. The other reason to know the runtime type is to manage performance. The difference between two implementations, where one has an O(n) look-up and the other an O(1) look-up, could be significant depending on how it is used. We would have to know which one we are using to determine or guarantee a performance expectation.

So why can't we just trust the abstractions? When we are working with our own code, we feel we have to know the real runtime type. When it comes to the standard library, though, we extend an implicit trust. When we use Arrays.asList() in Java, we only care that it returns a List. We might suspect that it is a java.util.ArrayList, but that's not the type we get back; it is actually a private class nested inside Arrays. In principle, it could be any type of List that fulfills the specification. Similarly, we can write a method that accepts a Set as a parameter. We could get a HashSet, but we might also get a TreeSet. Ideally, we don't have to care which we get. There may be differences in performance, but choosing the implementation is the concern of the caller passing in the Set, as long as we document how performance might be affected. When we ask a Map for a value and get a null, we don't start rooting around in the source of HashMap to find out why. We assume we're doing something wrong in the calling code. Why should we treat our own abstractions differently?
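To make this concrete, here is a small sketch using only standard Java collections. The class name `TrustTheInterface` and the method `countKnown` are illustrative, not from any library. The method is written against `Set` alone, so it works unchanged with a `HashSet` (expected O(1) `contains`) or a `TreeSet` (O(log n) `contains`); which one to pass is left to the caller. The sketch also prints the runtime class that actually sits behind `Arrays.asList()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TrustTheInterface {
    // Illustrative method written only against the Set interface.
    // contains() is expected O(1) for a HashSet and O(log n) for a TreeSet;
    // picking the implementation is the caller's decision.
    static int countKnown(Set<String> known, List<String> words) {
        int n = 0;
        for (String w : words) {
            if (known.contains(w)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "kiwi", "pear", "apple");

        // The runtime type is a private nested class, not java.util.ArrayList.
        System.out.println(words.getClass().getName());          // java.util.Arrays$ArrayList
        System.out.println(words.getClass() == ArrayList.class); // false

        // Same method, two different Set implementations, same answer.
        Set<String> hashed = new HashSet<>(Arrays.asList("apple", "pear"));
        Set<String> sorted = new TreeSet<>(Arrays.asList("apple", "pear"));
        System.out.println(countKnown(hashed, words)); // 3
        System.out.println(countKnown(sorted, words)); // 3
    }
}
```

Nothing in `countKnown` depends on which `Set` it receives, which is exactly the kind of trust we already give the standard library.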

If we trust the abstractions of the standard library so implicitly, we should be able to trust our own abstractions with the same confidence. How, then, can we develop the same level of trust in our own code? We assume that the standard library is reliable because it is well tested. That includes both real-world use and specified tests such as unit tests and integration tests. While real-world testing only accumulates through experience, we can build the specified tests ourselves. Writing tests against the interface, and using those tests to verify functionality, gives us confidence that our code works and does what we expect. These tests can be the wall at our back when we assert that our code lives up to our abstraction.
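As a minimal sketch of writing tests against the interface, assume a hypothetical `Counter` abstraction with two implementations. The contract check is written only against `Counter` and receives each implementation through a `Supplier`, so the same assertions run against every runtime type. `Counter`, `HashCounter`, `TreeCounter`, and `CounterContract` are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical abstraction used only for illustration.
interface Counter {
    void increment(String key);
    int count(String key); // contract: returns 0 for a key never incremented
}

// Hypothetical implementation backed by a HashMap.
class HashCounter implements Counter {
    private final Map<String, Integer> counts = new HashMap<>();
    @Override public void increment(String key) { counts.merge(key, 1, Integer::sum); }
    @Override public int count(String key) { return counts.getOrDefault(key, 0); }
}

// Hypothetical implementation backed by a TreeMap.
class TreeCounter implements Counter {
    private final Map<String, Integer> counts = new TreeMap<>();
    @Override public void increment(String key) { counts.merge(key, 1, Integer::sum); }
    @Override public int count(String key) { return counts.getOrDefault(key, 0); }
}

public class CounterContract {
    // The contract check sees only Counter; the Supplier hides the runtime type.
    // Returns a list of failure descriptions (empty means the contract holds).
    static List<String> check(Supplier<Counter> factory) {
        List<String> failures = new ArrayList<>();
        Counter c = factory.get();
        if (c.count("a") != 0) failures.add("unseen key should count 0");
        c.increment("a");
        c.increment("a");
        c.increment("b");
        if (c.count("a") != 2) failures.add("count(\"a\") should be 2, was " + c.count("a"));
        if (c.count("b") != 1) failures.add("count(\"b\") should be 1, was " + c.count("b"));
        return failures;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Counter>> impls = new LinkedHashMap<>();
        impls.put("HashCounter", HashCounter::new);
        impls.put("TreeCounter", TreeCounter::new);
        // Prints "HashCounter: []" and "TreeCounter: []" when both honor the contract.
        impls.forEach((name, factory) -> System.out.println(name + ": " + check(factory)));
    }
}
```

Adding a new implementation means adding one line to the map; the contract itself does not change. In a real project the same idea is often expressed as an abstract JUnit test class that each implementation's test class extends.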

Tests are a great start, but they do not guarantee bug-free code. There are times when we do have to debug, and two additional conditions make that effort much easier. The first condition is that we craft our exceptions to point directly to the problem. If there is an abnormal condition we can detect in code, such as getting an unexpected null, we should not just let a generic NullPointerException escape (or worse, swallow the error). Instead, we should raise an error with a specific message, if not a specific exception type, that states exactly what we expected and did not get. Managing exceptions at this level requires diligence, but it pays dividends.
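A minimal sketch of the first condition, assuming a hypothetical configuration lookup backed by a `Map`: rather than letting a `null` from `Map.get` travel onward until it causes a bare `NullPointerException` somewhere else, the lookup fails immediately with a message that names the missing key. `MissingConfigException`, `FailFast`, and the key names are hypothetical. `Objects.requireNonNull(obj, message)` is the standard-library option when a specific message is enough and a new exception type is not warranted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical exception type whose name and message describe the abnormal condition.
class MissingConfigException extends RuntimeException {
    MissingConfigException(String key) {
        super("Required configuration key '" + key + "' has no value");
    }
}

public class FailFast {
    // Map.get returns null for a missing key (or a key mapped to null); without
    // this check the null would surface later as a bare NullPointerException,
    // far from its cause.
    static String require(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null) {
            throw new MissingConfigException(key);
        }
        return value;
    }

    // Lighter alternative: keep the standard exception type, attach a specific message.
    static String requireWithMessage(Map<String, String> config, String key) {
        return Objects.requireNonNull(config.get(key), "configuration key '" + key + "' is missing");
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("port", "8080");

        System.out.println(require(config, "port")); // 8080
        try {
            require(config, "host");
        } catch (MissingConfigException e) {
            System.out.println(e.getMessage()); // Required configuration key 'host' has no value
        }
    }
}
```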

The second condition for simplifying debugging is to write units that we can interrogate in isolation. With the testing regime described earlier, we should already be doing this to support simpler unit tests. Interrogating units in isolation allows us to localize problems: we can debug an individual implementation without pulling in the entire application. But how can we do this without knowing the runtime type? Because we are programming to the interface, all implementations should behave the same way. That means we can write a test for the bug condition and run it against every implementation. Any implementation whose behavior differs from the others, or from the expectations set forth in the abstraction, tells us where to look. This way the testing framework does the work of finding the bugs for us, and the abstraction gives us both confidence and direction for the implementation.
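The following sketch shows one way to run a single bug-condition scenario against several implementations and flag the ones that diverge. To keep it self-contained it uses the JDK's `Map` implementations in place of an in-house interface; the class and method names (`DifferentialCheck`, `outcome`, `runAll`) are hypothetical. The scenario stores and reads back a `null` key: `HashMap` and `LinkedHashMap` accept it, while `TreeMap` with natural ordering throws a `NullPointerException`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Supplier;

public class DifferentialCheck {
    // Runs the scenario on a fresh instance and records either the returned value
    // or the exception type, so outcomes can be compared as plain strings.
    static <T> String outcome(Supplier<T> factory, Function<T, Object> scenario) {
        try {
            return "returned " + scenario.apply(factory.get());
        } catch (RuntimeException e) {
            return "threw " + e.getClass().getSimpleName();
        }
    }

    // Runs the same scenario against every implementation, preserving insertion order.
    static <T> Map<String, String> runAll(Map<String, Supplier<T>> impls,
                                          Function<T, Object> scenario) {
        Map<String, String> results = new LinkedHashMap<>();
        impls.forEach((name, factory) -> results.put(name, outcome(factory, scenario)));
        return results;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Map<String, Integer>>> impls = new LinkedHashMap<>();
        impls.put("HashMap", HashMap::new);
        impls.put("LinkedHashMap", LinkedHashMap::new);
        impls.put("TreeMap", TreeMap::new);

        // Bug condition under investigation: storing and reading back a null key.
        Function<Map<String, Integer>, Object> nullKeyRoundTrip = m -> {
            m.put(null, 1);
            return m.get(null);
        };

        String expected = "returned 1";
        runAll(impls, nullKeyRoundTrip).forEach((name, result) ->
                System.out.println(name + ": " + result
                        + (result.equals(expected) ? "" : "  <-- differs")));
        // HashMap: returned 1
        // LinkedHashMap: returned 1
        // TreeMap: threw NullPointerException  <-- differs
    }
}
```

Here the harness points straight at `TreeMap`, but the `Map` contract explicitly allows implementations to reject `null` keys, so the divergence says the expectation in the calling code needs revisiting, not `TreeMap`. With an in-house interface, the same harness would point at the implementation, or the documented contract, that needs attention.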
