Making Wrong Code Be Wrong

I was recently perusing when I came across a link to a Joel Spolsky blog post. In it, he describes how a form of Hungarian notation can be used to make wrong code look wrong. In the process, he describes the interesting history of how Hungarian notation came to be, and how the common use of it today was not the way it was originally intended.

He describes how Hungarian notation was created to convey application-level information about a variable. It could be used to distinguish between two different types of dimensions that might both be typed as ints. He termed this Apps Hungarian. While the compiler could not tell them apart, the human eye would be able to with a cursory glance. Later, the notation was instead conflated with the type that the compiler saw. An int iWidth carried redundant information that could actually get in the way if the type ever needed to change. He termed this Systems Hungarian.

He gives several examples in the post. Notably, he describes how Apps Hungarian can be used to distinguish between unsafe strings from user input, and safe strings that have been sanitized or strings that are in the code. So String usName = Request("usName") would represent an unsafe string provided by a user through a form. And String sName = Encode(usName) would represent a string that has been sanitized. These names are Hungarian because they use one- or two-letter prefixes to provide additional information, but we could change them slightly. The names unsafeName and safeName use full, descriptive words in place of the prefixes. We could say they just have longer prefixes, but more generally I think we'd agree that these variables are just named well.

The problem with Apps Hungarian is that it still requires a human eye. Whatever their names, safeName and unsafeName are still just strings to the compiler. We can still make subtle mistakes, even with the conventions in mind. Can we do better than relying on human vigilance? The issue with Systems Hungarian is that it duplicates type information. Perhaps Apps Hungarian is doing the same in a way. What if it is still encoding type information, but for types we haven't created yet?

Using the HTTP request example (and modifying it to make it more correct Java), we would have the following with Apps Hungarian:


// I am making up a web framework, so this isn't exactly any particular API.
public void doPost(HTTPRequest request, HTTPResponse response) {
    String usName = request.getData("usName");

    // Do some stuff...

    names.save(usName);

    // Do some more stuff...

    String sName = HTML.encode(usName);

    // Do even more stuff...

    response.write(usName);
}

Now the quick eye will note that I made a mistake in the last line. I am printing out the unsafe string rather than the safe one. Once this gets to code review, someone should spot this pretty quickly. Getting to code review is a comparatively long way away from the moment when this code was written, though. Can we get the code to be wrong sooner? Why don't we start by creating a small type for unsafe strings:


public class UnsafeString {
    private final String unsafeString;

    public UnsafeString(final String unsafeString) {
        this.unsafeString = unsafeString;
    }

    public String toString() {
        return unsafeString;
    }

    public String toSafeString() {
        return HTML.encode(unsafeString);
    }
}

Now the above code looks like:


public void doPost(HTTPRequest request, HTTPResponse response) {
    UnsafeString usName = new UnsafeString(request.getData("usName"));

    // Do some stuff...

    names.save(usName.toString());

    // Do some more stuff...

    String sName = usName.toSafeString();

    // Do even more stuff...

    response.write(usName); // Compiler error!!
    response.write(sName); // But this is fine
}

Instead of having to wait until code review to see that I used the wrong variable, the compiler immediately complains because the type of the parameter is incorrect. The response.write method is expecting a String, but is getting an UnsafeString. We are using the power of the compiler and the type system to find our bugs for us automatically. Now, though, we could forget to wrap every user input in an UnsafeString, leading us back down the wrong path. What if we have the request.getData method always return us an UnsafeString?


public class SafeHTTPRequest {
    private final HTTPRequest request;

    public SafeHTTPRequest(final HTTPRequest request) {
        this.request = request;
    }

    public UnsafeString getData(final String name) {
        return new UnsafeData(request.getData(name));
    }

    // Other delegate methods to emulate an HTTPRequest
}

And our code changes to:


public void doPost(SafeHTTPRequest request, HTTPResponse response) {
    UnsafeString usName = request.getData("usName");

    // Do some stuff...

    names.save(usName.toString());

    // Do some more stuff...

    String sName = usName.toSafeString();

    // Do even more stuff...

    response.write(usName); // Compiler error!!
    response.write(sName); // But this is fine
}

Now the code makes it much more difficult for us to do the wrong thing. We must handle the unsafeness of the string. We are still making some assumptions, though. We are correctly assuming that an UnsafeString is unsafe, but we are implicitly assuming that a plain String is always safe. This may be a good assumption, or it may not be. Rather than assume the best, let's make things explicit. We will add a counterpart SafeString type:


public class SafeString {
    private final String safeString;

    public SafeString(final String safeString) {
        this.safeString = safeString;
    }

    public String toString() {
        return safeString;
    }
}

We change UnsafeString to convert to a SafeString instead of a String:


public class UnsafeString {
    private final String unsafeString;

    public UnsafeString(final String unsafeString) {
        this.unsafeString = unsafeString;
    }

    public String toString() {
        return unsafeString;
    }

    public SafeString toSafeString() {
        return new SafeString(HTML.encode(unsafeString));
    }
}

And our top-level code looks like:


public void doPost(SafeHTTPRequest request, HTTPResponse response) {
    UnsafeString usName = request.getData("usName");

    // Do some stuff...

    names.save(usName.toString());

    // Do some more stuff...

    SafeString sName = usName.toSafeString();

    // Do even more stuff...

    response.write(usName); // Compiler error!!
    response.write(sName); // Also a compiler error!!
}

Now we have a problem that we can no longer write anything out without converting it using toString, and doing that would defeat the compiler checks we purposely put in place. Since we've already customized the request to give us UnsafeStrings, we can customize the response to take only SafeStrings:


public class SafeHTTPResponse {
    private final HTTPResponse response;

    public SafeHTTPResonse(final HTTPResponse response) {
        this.response = response;
    }

    public void write(final SafeString value) {
        response.write(value.toString());
    }

    // Other delegate methods to emulate an HTTPResponse
}

Our top-level code now takes a SafeHTTPResponse:


public void doPost(SafeHTTPRequest request, SafeHTTPResponse response) {
    UnsafeString usName = request.getData("usName");

    // Do some stuff...

    names.save(usName.toString());

    // Do some more stuff...

    SafeString sName = usName.toSafeString();

    // Do even more stuff...

    response.write(usName); // Compiler error!!
    response.write(sName); // This is now assuredly fine
}

We now have the compiler doing all of the error-finding for us. This is the ideal position to be in. It will detect our errors, and the wrong code will be wrong, immediately. These small objects save the day for us by making explicit what once was implicit in a variable name. Now, the prefixes have become redundant with the type, transforming from Apps Hungarian to Systems Hungarian. We can continue to use the type system to clean this up more. We could add an overload write(UnsafeString) to SafeHTTPResponse that would convert the parameter to a SafeString before writing it. Then we'd only ever need the single value and we could pass it without any error everywhere, while still having strong assurances that it is doing the right thing.

Comments

Popular posts from this blog

The Timeline of Errors

Magic Numbers, Semantics, and Compiler Errors

Assuming Debugging