User Interfaces are not data structures

I have had the same discussion on a number of fronts over the last few weeks, so I figured I'd write a blog post about it.

Basically, the gist is that how we allow users to enter data into our application, how we store that data in memory, and how we persist that data are all different things. Requiring that the UI have a one-to-one mapping with the internal data structures is bad for the users and could be bad for performance.

User Stories

Doing Agile (in fact doing most software development) means capturing User Stories that describe what the user wants to do with the system.

User Stories are powerful because they let the folks building the system listen to the folks using it, so that the builders can build what users want.

That's not to say we all can't listen to a lot of users and extrapolate across their user stories into a system that is better than the sum of the user stories, but as a starting point, we all want to focus on what the user wants to do.

The User Interface

The systems that we build have an interface that users use. The users perform actions via the user interface that allow the users to use the system and perform the activities outlined in the user stories.

Note the number of times I said "user" in the paragraph above. We are building systems for users and that's our focus.

The User Stories and the User Interface are the "whats" of the system. They are "what the users want to do."

How it's done

Let's think about a word processor. Should there be a one-to-one correspondence between a letter in the word processor's UI and an object in memory? Should that object contain all the attributes of the letter on the screen (X-Y location in the window, font, color, etc)? Or should the representation in memory be rich enough to recreate a WYSIWYG representation, a printed representation, and perhaps a "fast mode" representation?

However the data is represented in memory, the representation should be reasonable for the task.
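To make the word processor question concrete, here is a minimal sketch of one possible answer. It assumes a made-up model in which the document is a list of styled runs of text rather than one object per letter. The names Run, render_fast, and render_html are mine, purely for illustration, and are not how any particular word processor works.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Run:
    """A stretch of text sharing one style (hypothetical model)."""
    text: str
    bold: bool = False
    color: str = "black"


def render_fast(runs):
    """'Fast mode' view: ignore styling entirely and emit plain text."""
    return "".join(run.text for run in runs)


def render_html(runs):
    """A richer view built from the same runs; screen positions are never stored."""
    parts = []
    for run in runs:
        body = escape(run.text)
        if run.bold:
            body = f"<b>{body}</b>"
        parts.append(f'<span style="color:{run.color}">{body}</span>')
    return "".join(parts)


document = [Run("Hello, "), Run("world", bold=True, color="red"), Run("!")]
print(render_fast(document))
print(render_html(document))
```

Nothing in the model knows where a letter sits on the screen. Each view works that out for itself, which is what lets one in-memory representation feed several presentations.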

This means that often we will have multiple representations. We may have a representation that is optimal for user editing (for example, converting a database schema into a representation for editing a record in the database a la Lift's Wizard) and another representation that's optimized for on-going computation (for example, caching database records in local address space rather than making a query each time we need a record.)
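A rough sketch of those two representations side by side, assuming a toy schema and a dictionary standing in for the database. SCHEMA, DATABASE, query_record, and the other names here are hypothetical and are not Lift's Wizard API.

```python
# Hypothetical schema and "database"; a real system would query an RDBMS here.
SCHEMA = {"name": str, "age": int}
DATABASE = {1: {"name": "Ada", "age": 36}}
query_count = 0


def query_record(record_id):
    """Stand-in for a database round trip; counts how often it is called."""
    global query_count
    query_count += 1
    return dict(DATABASE[record_id])


def editing_form(schema, record):
    """Editing representation: one labelled text field per column."""
    return [{"label": col.capitalize(), "value": str(record[col])} for col in schema]


def parse_form(schema, fields):
    """Translate the edited text fields back into a typed record."""
    return {col: typ(field["value"]) for (col, typ), field in zip(schema.items(), fields)}


_cache = {}


def cached_record(record_id):
    """Computation representation: keep records in local memory after the first query."""
    if record_id not in _cache:
        _cache[record_id] = query_record(record_id)
    return _cache[record_id]


form = editing_form(SCHEMA, cached_record(1))
form[1]["value"] = "37"            # the user edits the "Age" field as text
updated = parse_form(SCHEMA, form)
cached_record(1)                   # served from the cache, no second query
print(form, updated, query_count)
```

The form holds strings because that is what a user types. The cached record holds typed values because that is what the computation needs. Neither shape is forced on the other.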

But the way that the data is represented in memory and the way that the data is presented to the user are two entirely different things.

How it's stored

How we persist data is different from how we represent it in memory. Why? Because we have different needs for each.

For example, we may choose to persist data in JSON format so that the data can be read by humans and exchanged with a wide variety of systems.

We may choose to persist data in a compressed binary format for very fast transmission to other systems.

We may choose to persist data in an RDBMS so that we have transactional integrity.

How we persist the data is independent of how we represent the data in memory. While it's useful to have a unified representation because writing translation layers is bulky and time-consuming, there is nothing that requires the persisted data to be in the same format as it is when it's in memory.
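As a small illustration, here is one in-memory document written out in two of the formats mentioned above. This is a sketch under my own assumptions: the document shape is invented, and the "compressed binary format" is simply zlib applied to compact JSON, chosen because both modules ship with Python.

```python
import json
import zlib

# Hypothetical in-memory document: plain Python data.
document = {"title": "Notes", "paragraphs": ["User first.", "Storage second."] * 50}


def to_json(doc):
    """Human-readable format for exchange with other systems."""
    return json.dumps(doc, indent=2)


def from_json(text):
    """Rebuild the in-memory form from the readable format."""
    return json.loads(text)


def to_compact(doc):
    """Compressed binary format for fast transmission (zlib over compact JSON)."""
    return zlib.compress(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def from_compact(blob):
    """Rebuild the in-memory form from the compressed format."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


readable = to_json(document)
compact = to_compact(document)
print(len(readable.encode("utf-8")), len(compact))
```

The four small functions are the translation layer. They cost something to write, but they are what let the in-memory shape and the stored shapes change independently.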

It may be that we keep the formatting in the same file as our word processor's content. Or we may choose to have HTML for the content and CSS for the formatting and store each in a separate file… or store both in a single file, but in different buckets.

Good design

A good design starts with the user: it focuses on the user stories and on how the user is going to use the system.

The in-memory representation of the data should be optimized based on balancing developer effort (initial and maintenance over time), system performance, and system limitations.

The persisted representation of the data should be optimized using similar criteria.

What is most important in any of the representations, however, is not forcing one representation to guide the features of the system. In fact, there may be multiple in-memory representations (e.g., a representation for fast display and a representation for fast search). There may be multiple persistent representations (save as Word, save as HTML, save as Markdown).
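Here is a hedged sketch of that idea: a second in-memory representation built for search, plus two save formats, all derived from the same list of paragraphs. The function names are made up for this post, and the HTML and Markdown output is deliberately simplistic.

```python
from collections import defaultdict
from html import escape

# Hypothetical source of truth: a list of paragraphs, in display order.
paragraphs = ["Users come first", "Storage comes second", "Users edit text"]


def build_search_index(paras):
    """Representation for fast search: word -> set of paragraph numbers."""
    index = defaultdict(set)
    for number, para in enumerate(paras):
        for word in para.lower().split():
            index[word].add(number)
    return index


def save_as_html(paras):
    """One persistent representation: a <p> element per paragraph."""
    return "\n".join(f"<p>{escape(p)}</p>" for p in paras)


def save_as_markdown(paras):
    """Another persistent representation: paragraphs separated by blank lines."""
    return "\n\n".join(paras)


index = build_search_index(paragraphs)
print(sorted(index["users"]))
print(save_as_html(paragraphs))
print(save_as_markdown(paragraphs))
```

None of these representations dictates what the user can do. The search index exists because users want to find things, and the save formats exist because users want to share their work.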

As we build our systems, we should focus on the user first and build representations of our data to suit the user's needs.