Back to the Future...

This post was from November 2006. I had just started playing with Scala and was trying to figure out an ORM... the one that ultimately became Lift's Mapper.

Keeping the meaning with the bytes

One of my criteria for a good web framework is having security and access control built in. As I was driving friends and relatives to and from Thanksgiving dinner, I was thinking to myself, "It's nice to have goals, but how do you implement them?"

One of the bad things about computers is that they're just a collection of bytes without meaning. Most programs and programmers want to forget about the meaning of a collection of bytes and just input them or output them. Sure, most classes (and related database tables) are meaningful as a collection, but the columns in the database and the fields on the object are Strings or doubles or ints. Once they get cast to these types, they have lost their semantic meaning. Sure, the method is String getSSN(), but once the SSN String is returned, it's just a string that can be passed around to anything.

Some languages, like JavaScript, have the notion of tainted objects. Tainted objects are objects that can't be sent over the wire or otherwise communicated to an untrusted party. Tainting is a binary thing. It denotes literally one bit of semantic data (the "this could be sensitive" bit).

My Thanksgiving 2006 revelation is that the semantic meaning of fields on objects should be retained throughout the field's life, and conversion to a semantic-free format (String, double, etc.) should be done as late in the field's life cycle as possible (e.g., a String returned from 'render as HTML').

So, let me drill down a little.

There are a ton of different ways for security vulnerabilities to creep into networked applications. There are vulnerabilities related to trusting input from the wire (e.g., buffer overflows). There are vulnerabilities related to parsing input (e.g., SQL or command injection). Bit by bit, Java and .NET frameworks are making these kinds of vulnerabilities a thing of the past.

However, it takes a lot of work on the part of the developer to remember to make sure that a social security number is displayed only in certain cases, that a credit card number is never fully displayed but is still available to the subsystem that performs the transaction, etc. It means that every time the developer encounters a piece of sensitive data, they have to do a lot of "if-then-else" testing to see how to deal with it. It means that security audits are a complete review of the entire code base. This is costly, but necessary, because the SSN for a Person maps as a string from the database and is retained in memory as a string.

I suggest that every field of every object be semantically meaningful. That means that getSSN() returns an object of SSN class that derives from Taxpayer class that derives from SensitiveIdentifyingInformation class... It means that the asHtml() method on the SSN class returns '***-**-****' unless the context in which the SSN is being accessed grants permission to see more. This means that the SSN object (and the enclosing Person) knows the context in which it was created. However, if that context is known and the rules for accessing or changing none, part, or all of the object are defined in a central place, then the security audit consists of trusting the code that exposes part of the object and reviewing the access control rules.

It also means that a developer can safely write: for (Person p : peopleList) { out.println(p.getName().asHtml() + " " + p.getSSN().asHtml()); }. This code will execute correctly for all cases and the developer need not think about access control.
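Here is roughly what I have in mind, as a minimal Java sketch rather than a finished design. Everything in it is an assumption made for illustration: the AccessContext class with its single maySeeFullSsn flag, the SemanticField interface, and the render helper are all hypothetical, and the intermediate Taxpayer level of the hierarchy is left out to keep it short.

```java
import java.util.List;

// A sketch only: every type and method name below is hypothetical.
class SemanticFieldSketch {

    /** Hypothetical request context; a real one would carry the user, roles, etc. */
    static final class AccessContext {
        final boolean maySeeFullSsn;
        AccessContext(boolean maySeeFullSsn) { this.maySeeFullSsn = maySeeFullSsn; }
    }

    /** Every field renders itself; callers never see the raw String. */
    interface SemanticField {
        String asHtml();
    }

    /** Base class for sensitive fields; it remembers the context it was created in. */
    static abstract class SensitiveIdentifyingInformation implements SemanticField {
        protected final AccessContext context;
        SensitiveIdentifyingInformation(AccessContext context) { this.context = context; }
    }

    static final class SSN extends SensitiveIdentifyingInformation {
        private final String digits; // nine digits, no dashes

        SSN(String digits, AccessContext context) {
            super(context);
            if (!digits.matches("\\d{9}")) throw new IllegalArgumentException("expected nine digits");
            this.digits = digits;
        }

        @Override public String asHtml() {
            // The single central rule: mask unless the context grants access.
            if (!context.maySeeFullSsn) return "***-**-****";
            return digits.substring(0, 3) + "-" + digits.substring(3, 5) + "-" + digits.substring(5);
        }

        // Mask here too, so accidental logging or concatenation does not leak the value.
        @Override public String toString() { return "***-**-****"; }
    }

    static final class Name implements SemanticField {
        private final String value;
        Name(String value) { this.value = value; }

        @Override public String asHtml() {
            // Minimal HTML escaping; '&' must be replaced first.
            return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    static final class Person {
        private final Name name;
        private final SSN ssn;
        Person(Name name, SSN ssn) { this.name = name; this.ssn = ssn; }
        Name getName() { return name; }
        SSN getSSN() { return ssn; }
    }

    /** The loop from the text: no access-control checks at the call site. */
    static String render(List<Person> people) {
        StringBuilder out = new StringBuilder();
        for (Person p : people) {
            out.append(p.getName().asHtml()).append(' ').append(p.getSSN().asHtml()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        AccessContext clerk = new AccessContext(false);
        AccessContext auditor = new AccessContext(true);
        System.out.print(render(List.of(
            new Person(new Name("Ann <Lee>"), new SSN("123456789", clerk)),
            new Person(new Name("Bob"), new SSN("987654321", auditor)))));
    }
}
```

The only place that decides whether an SSN is visible is SSN.asHtml(), so that method plus the rules behind AccessContext are what an audit would need to read. The loop in render never asks who is looking.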

This also implies that the semantic meaning of a database column is defined in the Object-Relational mapper such that when the O-R mapping system gets a column to add to an Object that it is building, it instantiates the correct type of field object rather than just a String, number, etc.
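One way the mapper side might look, again as a hedged Java sketch and not a description of any real O-R mapper. The FieldRegistry class, its declare and build methods, and the column names are all made up for the example, and the access context is left out here so the SSN field simply always masks itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A sketch only: FieldRegistry and the field classes are hypothetical names.
class ColumnMappingSketch {

    interface SemanticField {
        String asHtml();
    }

    /** A field with no special sensitivity; it only needs HTML escaping. */
    static final class PlainText implements SemanticField {
        private final String value;
        PlainText(String value) { this.value = value; }
        @Override public String asHtml() {
            return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    /** Simplified SSN field: always masked, because this sketch has no access context. */
    static final class MaskedSsn implements SemanticField {
        private final String digits;
        MaskedSsn(String digits) { this.digits = digits; }
        @Override public String asHtml() { return "***-**-****"; }
    }

    /** The mapping declares, per column, which field class carries its meaning. */
    static final class FieldRegistry {
        private final Map<String, Function<String, SemanticField>> factories = new HashMap<>();

        void declare(String column, Function<String, SemanticField> factory) {
            factories.put(column, factory);
        }

        /** Called by the mapper for each column value while it builds an object. */
        SemanticField build(String column, String rawValue) {
            Function<String, SemanticField> factory = factories.get(column);
            if (factory == null) {
                // Fail closed: a column with no declared meaning never becomes a bare String.
                throw new IllegalStateException("no semantic type declared for column " + column);
            }
            return factory.apply(rawValue);
        }
    }

    public static void main(String[] args) {
        FieldRegistry registry = new FieldRegistry();
        registry.declare("name", PlainText::new);
        registry.declare("ssn", MaskedSsn::new);

        System.out.println(registry.build("name", "Ann").asHtml());
        System.out.println(registry.build("ssn", "123456789").asHtml());
    }
}
```

Throwing on an undeclared column is a choice I made for the sketch. It keeps a newly added database column from quietly showing up as a plain String that nobody has assigned a meaning to.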