Transcending the Limitations of the Human Mind

No, this post is not about mind-altering substances; rather, it is about how to deal with complexity in general, and in software development in particular.

All but the simplest problems we face in software development exceed the capacity of our minds. Our short-term memory can barely hold 7-8 items on a good day, and our raw computing capacity is negligible compared to that of computers.

How is it, then, that we can design and implement software monoliths of millions of lines of code despite our apparent limitations? How can we learn and understand such large and complex systems? How can we maintain them? How can we get better at dealing with large systems or, perhaps more importantly, deal with smaller systems more easily?

Cognitive Capacity

There is some scientific evidence that Cognitive Capacity is a real and measurable thing. In short, the theory says that everything we have to keep in mind or think about while solving a particular problem creates Cognitive Load. That is, it uses up part of our total Cognitive Capacity, which is very much finite.

Everything means literally everything: not just the business concepts, their rules and relationships, but also all the technical details we have to track, like remembering the special meanings of certain values, whether one method must be called after another, whether an object needs to be explicitly initialized before use, what effect certain values would have elsewhere, and so on.

There are two conclusions to be drawn from this theory if we want to maximize the amount of problem complexity we are able to handle (or minimize the difficulty of handling a given amount of complexity):

  1. We have to avoid using our Cognitive Capacity for technical details, because those don’t get us closer to solving the “business” problems.
  2. We have to limit the complexity of the “business” logic we have to think about at any given time to be below our Capacity.

Whose Cognitive Capacity are we talking about?

Every line of code is written exactly once (if we consider a modification to be a completely new line), but it is potentially read many times during the life of the software.

In other words, the effort to write code is less important than the effort to read and understand it. It seems reasonable, therefore, to suggest that those writing (or modifying) code should do everything in their power to minimize the Cognitive Load of its readers.

This article argues that the convenience of the writer is irrelevant. Indeed, a writer should follow the rules outlined in the following sections rigorously and thoroughly, even if doing so takes significant additional effort.

Decomposition

Developers of large systems do not (on average) have more brain capacity than other developers, and they aren’t magicians either. What they do have, however, is a very simple trick for coping with increased complexity.

This trick is called decomposition. If you know this trick and you know how to do basic things like the “Hello World” program, reading/writing files, calculating things, the syntax of your programming language, etc., then you can write arbitrarily complex software.

Decomposition is a method to break a big problem into a number of smaller problems, with the assumption that the smaller problems taken together will solve the big problem.

Figure: Decomposition of the big problem into smaller problems on two levels.

The above diagram shows a problem that cannot be solved directly because of its size, so decomposition is applied. The resulting two smaller problems are still uncomfortably large, so they are decomposed further. On each level of decomposition the total problem size remains approximately equal to that of the original problem, but the individual tasks become smaller each time, until they are directly solvable.

Consider the following task:

4 * 5 + 3 * 2 = ?

Most people will not solve this task in one go, but will decompose it into smaller ones: for example, first solve “4 * 5”, then solve “3 * 2”, and finally add the two results together. Instead of one big task, three smaller ones are executed, with the same end result.
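The same decomposition can be written down as code. This is only an illustrative sketch: the class name `Decomposition` and the methods `multiply`, `add` and `solve` are hypothetical, chosen to mirror the three smaller tasks one to one.

```java
public final class Decomposition {
   // Small tasks 1 and 2: each multiplication is solved on its own.
   static int multiply(int a, int b) {
      return a * b;
   }

   // Small task 3: combine the two intermediate results.
   static int add(int a, int b) {
      return a + b;
   }

   // The big task is nothing more than the composition of the small ones.
   static int solve() {
      int first = multiply(4, 5);   // 20
      int second = multiply(3, 2);  // 6
      return add(first, second);    // 26
   }

   public static void main(String[] args) {
      System.out.println(solve()); // prints 26
   }
}
```

Each method is small enough to be understood at a glance, and `solve` reads like the decomposition itself.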

Abstraction

Unfortunately, decomposing a problem is not as easy as it sounds. In the previous section we assumed that each of the resulting small tasks is independent, and therefore solvable without considering the others. This is, however, very rarely possible in software development. There are always dependencies between parts of a piece of software: between objects, modules, etc.

Even in the simple calculation example above, the three tasks are not completely independent: the last task obviously depends on the two multiplication tasks before it.

So why is this a problem? Because if we have to think about things coming from other tasks, we are spending our Cognitive Capacity on things not directly relevant to the task at hand. Dependencies transfer Cognitive Load “up”, from the “dependee” to the “depender”, ultimately defeating decomposition if left unchecked.

Figure: Dependency between parts transfers Cognitive Load from the bottom to the top.

So how can we limit this backflow of Load up the dependency tree? The answer is Abstraction. Abstraction splits each of the individual bubbles into two distinct parts. One part, called the interface, is responsible for containing the knowledge (the “Load”) that the dependent absolutely needs in order to work; the other, called the implementation, contains everything else. We cannot eliminate the backflow entirely this way, but we can limit it quite well.

This method is called abstraction because the goal is to hide details and, in turn, to offer the layers above a higher-level (more abstract) view of everything below.

Figure: Separating interfaces from implementations limits the backflow of Cognitive Load.

This concept was already used above to solve this equation:

4 * 5 + 3 * 2 = ?

For the last step, adding the results of the two multiplications together, we don’t actually have to know how to multiply, we just have to know how to add. Why is the knowledge of multiplication no longer required at this last step although it does depend on the multiplication steps?

It is because the multiplication tasks abstracted the details of the multiplication away. The interface of those tasks was just the resulting number and only the implementation contained the actual details of how to multiply.
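One way to picture this in Java is to give each multiplication task an interface that exposes only its result. In this hypothetical sketch, `Quantity` plays the role of the interface and `Product` the implementation; the addition step in `Abstraction.sum` depends on `Quantity` alone and never learns how the numbers were produced. All names are illustrative assumptions.

```java
// Interface: the only knowledge a dependent task needs.
interface Quantity {
   int value();
}

// Implementation: the details of multiplication stay hidden in here.
final class Product implements Quantity {
   private final int left;
   private final int right;

   Product(int left, int right) {
      this.left = left;
      this.right = right;
   }

   @Override
   public int value() {
      return left * right;
   }
}

public final class Abstraction {
   // The last step only knows how to add Quantities, not how to multiply.
   static int sum(Quantity a, Quantity b) {
      return a.value() + b.value();
   }

   public static void main(String[] args) {
      System.out.println(sum(new Product(4, 5), new Product(3, 2))); // prints 26
   }
}
```

Because `sum` sees only the interface, any other `Quantity` implementation (even a lambda such as `() -> 7`) could be passed in without `sum` changing at all.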

Leaky Abstractions

Abstractions that do not perfectly hide the details of the layers beneath them are said to leak. They leak unwanted knowledge up the dependency hierarchy and thereby force the layers above to use up more of the developers’ Cognitive Capacity.

Figure: Unwanted leakage increases the Cognitive Load of the task above.

A leaky abstraction, even if it leaks just a little, can have a very big impact on its surroundings because of amplification through dependencies. Each dependency carries the same amount of additional leaked knowledge, so the effect of one leak is multiplied by the number of dependencies on the abstraction. Therefore, the more an abstraction is used, the more rigorously it has to be designed to eliminate, or at least minimize, leakage.

Figure: Leak amplification through multiple dependencies.
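A small, hypothetical illustration of amplification: suppose a lookup signals “not found” with the special value -1, the kind of technical detail mentioned earlier under Cognitive Capacity. Every dependent has to know and repeat that convention, so one leak is copied into each caller. Returning `java.util.OptionalInt` instead states the possibility of absence in the return type. The class `Prices` and its methods are assumptions made up for this sketch (Java 9+ for `Map.of`).

```java
import java.util.Map;
import java.util.OptionalInt;

public final class Prices {
   // Hypothetical price list, in whole currency units.
   private final Map<String, Integer> prices = Map.of("apple", 3, "pear", 4);

   // Leaky: every caller must remember that -1 means "no such item".
   public int findLeaky(String item) {
      return prices.getOrDefault(item, -1);
   }

   // Less leaky: possible absence is stated by the return type itself.
   public OptionalInt find(String item) {
      Integer price = prices.get(item);
      return price == null ? OptionalInt.empty() : OptionalInt.of(price);
   }

   // Each dependent of findLeaky must repeat the hidden -1 rule;
   // one that forgets it silently computes a wrong total.
   public int totalLeaky(String a, String b) {
      int pa = findLeaky(a);
      int pb = findLeaky(b);
      if (pa == -1 || pb == -1) {
         throw new IllegalArgumentException("unknown item");
      }
      return pa + pb;
   }

   // A dependent of find cannot use the result as an int without
   // first deciding what absence means.
   public int total(String a, String b) {
      return find(a).orElseThrow(() -> new IllegalArgumentException("unknown item"))
            + find(b).orElseThrow(() -> new IllegalArgumentException("unknown item"));
   }
}
```

With ten callers of `findLeaky`, the -1 rule has to be known in ten places; with `find`, it is part of the interface each caller already reads.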

Note that whether an abstraction leaks cannot be decided objectively without context; it can only be decided by also considering the business meaning and responsibilities. In other words, a leak can only exist with respect to the requirements. Code alone, judged without knowing the exact requirements, cannot leak.

Examples of Leaky Abstractions

Consider the following class:

import java.math.BigDecimal;
import java.util.Currency;

public final class Amount {
   private final Currency currency;
   private final BigDecimal value;

   public Amount(Currency currency, BigDecimal value) {
      this.currency = currency;
      this.value = value;
   }

   public Currency getCurrency() {
      return currency;
   }

   public BigDecimal getValue() {
      return value;
   }
}

This is sometimes referred to as an immutable value object. It is “immutable” because it cannot be modified (mutated) at all, and it is a “value object” because it represents a value and has no identity (two objects with the same values are interchangeable).

Let’s further assume that the business requirements of the system this class belongs to demand that we can add two such Amounts together. In this case the above class leaks heavily. There are several pieces of knowledge that could easily be hidden, but aren’t:

  • The fact that an Amount is composed of a value and currency.
  • The type of the value, and with it the complete knowledge about values.
  • The type of the currency, and with it the complete knowledge of currencies.

A proper abstraction (for the above requirements) would look like this:

public final class Amount {
   private final Currency currency;
   private final BigDecimal value;

   public Amount(Currency currency, BigDecimal value) {
      this.currency = currency;
      this.value = value;
   }

   public Amount add(Amount other) {
      ...
   }
}

This fulfills the requirements just as the previous implementation does, but demands much less knowledge, and therefore causes much less Cognitive Load for the developer using the class.
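To make the abstraction concrete, here is one possible complete version of the class. It is a minimal sketch, assuming (the article does not say) that adding Amounts in different currencies should be rejected with an `IllegalArgumentException` rather than converted. `equals` and `hashCode` are added so that two Amounts with the same value are interchangeable, as a value object should be; they compare values with `BigDecimal.compareTo`, so 4 and 4.00 count as equal.

```java
import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

public final class Amount {
   private final Currency currency;
   private final BigDecimal value;

   public Amount(Currency currency, BigDecimal value) {
      this.currency = Objects.requireNonNull(currency);
      this.value = Objects.requireNonNull(value);
   }

   // Callers never see the parts; they can only combine Amounts.
   public Amount add(Amount other) {
      // Assumption: mixing currencies is an error, not a conversion.
      if (!currency.equals(other.currency)) {
         throw new IllegalArgumentException(
               "Currency mismatch: " + currency + " vs " + other.currency);
      }
      return new Amount(currency, value.add(other.value));
   }

   // Value semantics: equal currency and numerically equal value.
   @Override
   public boolean equals(Object o) {
      if (this == o) {
         return true;
      }
      if (!(o instanceof Amount)) {
         return false;
      }
      Amount that = (Amount) o;
      return currency.equals(that.currency) && value.compareTo(that.value) == 0;
   }

   // Stripping trailing zeros keeps hashCode consistent with equals.
   @Override
   public int hashCode() {
      return Objects.hash(currency, value.stripTrailingZeros());
   }
}
```

A caller only ever writes something like `price.add(shipping)`; neither the `BigDecimal` nor the `Currency` escapes the class.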

Another example from a banking application:

public final class CashTransferResponse {
   private Long transferId;
   private BigDecimal value;
   private LocalAccount sourceAccount;
   private RemoteAccount remoteAccount;
   ...;

   ...setters, getters for all fields...
}

It turned out that this class was used for answering remote calls from other systems based on XML messages. It can therefore be considered an abstraction of a proper response message according to the required protocol. In this case, however, there is significant leakage, because using objects of this class should not require the caller to know all of its fields. The class should look more like this:

public final class CashTransferResponse {
   ...private fields...

   public Document toXML() {
      ...construction of a valid response message...
   }
}
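Filling in the elided parts is necessarily guesswork, since the real protocol is not shown. The following is a sketch under stated assumptions: only two of the fields are kept (the account types are not given), and the element names `cashTransferResponse`, `transferId` and `value` are hypothetical. It uses the standard `javax.xml.parsers` and `org.w3c.dom` APIs to build the message.

```java
import java.math.BigDecimal;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public final class CashTransferResponse {
   // Fields stay private: no getters, no setters.
   private final long transferId;
   private final BigDecimal value;

   // All required data must be supplied up front.
   public CashTransferResponse(long transferId, BigDecimal value) {
      this.transferId = transferId;
      this.value = Objects.requireNonNull(value);
   }

   // The only thing callers can do: obtain the finished message.
   public Document toXML() {
      try {
         Document doc = DocumentBuilderFactory.newInstance()
               .newDocumentBuilder()
               .newDocument();
         Element root = doc.createElement("cashTransferResponse"); // hypothetical name
         doc.appendChild(root);
         appendText(doc, root, "transferId", Long.toString(transferId));
         appendText(doc, root, "value", value.toPlainString());
         return doc;
      } catch (ParserConfigurationException e) {
         throw new IllegalStateException("XML parser not available", e);
      }
   }

   // Adds <name>text</name> under the given parent element.
   private static void appendText(Document doc, Element parent, String name, String text) {
      Element child = doc.createElement(name);
      child.setTextContent(text);
      parent.appendChild(child);
   }
}
```

Because the constructor demands every required value, a half-initialized response cannot be created in the first place.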

Technical Leaks

You might have heard the saying “If it compiles, it works” in relation to using some class or library, usually in the context of a functional programming language. It refers to a perfect or near-perfect abstraction, where misusing the interface is virtually impossible, so that all syntactically valid code constructs are likely to be semantically correct too.

Technical leaks are pieces of information a developer has to learn (in addition to the language and its prevailing idioms) in order to use something (a class or a module), even though they have nothing to do with the business case at hand. In other words, technical leaks occur if there is a syntactically valid sequence of method calls that has no real business meaning (it compiles, but it does not work).

Not all languages are powerful and expressive enough to arrive at a near-perfect abstraction all the time. Still, within the confines of a given language, technical leaks should always be avoided, especially in core concepts, where amplification through multiple dependencies can make problems much worse.

Examples of Technical Leaks

We’ve already seen an example of a technical leak with the CashTransferResponse class. Its original “bean” implementation looked like this:

public final class CashTransferResponse {
   private Long transferId;
   private BigDecimal value;
   private LocalAccount sourceAccount;
   private RemoteAccount remoteAccount;
   ...;

   ...setters, getters for all fields...
}

It was therefore possible to write this code somewhere else:

CashTransferResponse response = new CashTransferResponse();
response.setValue(...);
send(response); // This will fail, because there was
                // no "transferId" set

This type of leak is sometimes called temporal coupling. It requires the developer to call certain methods in a predetermined order (in this case, all the required setters) before sending. The above code is syntactically correct, but will always fail, because sending does not make sense without the “transferId”.

Another form of this abstraction error is requiring methods to be called to initialize or close something. Consider this DatabaseTransaction, for example:

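import java.sql.Connection;

public interface DatabaseTransaction extends AutoCloseable {
   Connection getConnection();

   // Narrows AutoCloseable.close() so callers need not handle Exception.
   @Override
   void close();
}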

Obviously, this interface requires the developer to make sure the transaction is closed properly. The developer is probably expected to use Java’s try-with-resources construct. It is easy to see, however, that the compiler cannot check whether the transaction is always properly closed, so there is technical leakage.

This is one way the technical leak in the DatabaseTransaction could be avoided:

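import java.sql.Connection;
import java.util.function.Consumer;

public interface DatabaseTransaction {
   // Runs the supplied logic; the transaction closes itself afterwards.
   void execute(Consumer<Connection> logic);
}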

In this case, the whole logic is supplied into the transaction itself, which can therefore close itself after execution, freeing up precious Cognitive Capacity on the developer side.
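This is often called the “execute around” idiom. Below is a minimal runnable sketch of an implementation. To avoid needing a database, it swaps `java.sql.Connection` for a hypothetical `FakeConnection` stand-in, and `LoggingTransaction` simply records what happens. The begin/commit/rollback steps are an assumption beyond what the article describes; the essential point is the `finally` block, which closes on every path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for java.sql.Connection, so the sketch runs without a database.
final class FakeConnection {
   private final List<String> log;

   FakeConnection(List<String> log) {
      this.log = log;
   }

   void run(String statement) {
      log.add("run " + statement);
   }
}

// Same shape as the interface above, but over the stand-in connection.
interface DatabaseTransaction {
   void execute(Consumer<FakeConnection> logic);
}

public final class LoggingTransaction implements DatabaseTransaction {
   // Records what happened; package-private only so the example can be checked.
   final List<String> log = new ArrayList<>();

   @Override
   public void execute(Consumer<FakeConnection> logic) {
      log.add("begin");
      try {
         logic.accept(new FakeConnection(log));
         log.add("commit");
      } catch (RuntimeException e) {
         // Assumption: a failing block is rolled back before the error propagates.
         log.add("rollback");
         throw e;
      } finally {
         // Runs on every path, so no caller can forget to close.
         log.add("close");
      }
   }
}
```

A caller writes only `tx.execute(connection -> connection.run("..."))`; there is no sequence of calls that compiles but leaves the transaction open.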

Distractions

We developers, being naturally attracted to algorithms and design patterns, sometimes concentrate too much on the technical aspects of our code: thinking about how to apply certain patterns, how to cleanly separate certain parts of the code, and so on.

This tendency is in itself mostly advantageous, except when it starts to dominate our code. In this case it becomes a distraction. It distracts the reader from the actual content (the domain) and encourages thinking about irrelevant details.

So what is a distraction? A distraction is anything that does not directly solve at least some part of the problem domain, or that cannot easily be identified as solving some part of it.

Java Beans (see CashTransferResponse above), for example, are distractions in this sense. They do not solve any part of the problem; they are merely a grouping of data that may or may not belong together in some context. They don’t hide anything, so their existence does not make the problem domain smaller in any way or free up any Cognitive Capacity.

Sometimes distractions can be recognized just by looking at their names: Service, Manager, Util, UseCase, Interaction, Strategy, etc. objects are all distractions. These are usually objects created only for the convenience of the writer, containing some grouping of procedures that operate on other objects. Their names clearly betray that they are not part of the “business”.

Again, most (if not all) programming languages require at least some distractions to exist, for example to bootstrap the first objects or to bind technical aspects into the application. Still, distractions should be rigorously avoided as far as possible.
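A small contrast may help to show why a name like “Util” is a warning sign. In this hypothetical sketch (all class names are invented for illustration), `DistanceUtil` is a bag of procedures living outside the concept it works on, and it only functions because `LeakyDistance` hands out its internals. In `Distance`, the operations belong to the concept itself and nothing is handed out.

```java
// Distraction: a bag of procedures living outside the concept it works on...
final class DistanceUtil {
   private DistanceUtil() {
   }

   static LeakyDistance sum(LeakyDistance a, LeakyDistance b) {
      return new LeakyDistance(a.getMeters() + b.getMeters());
   }
}

// ...which forces the data holder to expose its internals.
final class LeakyDistance {
   private final long meters;

   LeakyDistance(long meters) {
      this.meters = meters;
   }

   long getMeters() {
      return meters;
   }
}

// Domain version: the operations belong to the concept; nothing is exposed.
public final class Distance {
   private final long meters;

   public Distance(long meters) {
      this.meters = meters;
   }

   public Distance plus(Distance other) {
      return new Distance(meters + other.meters);
   }

   public boolean isLongerThan(Distance other) {
      return meters > other.meters;
   }
}
```

Both versions compute the same thing, but only the first makes a reader learn about an extra class and about the unit stored inside.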

Conclusion

Since the human mind has a very limited capacity to deal with complexity, we have to make sure that we use this capacity very wisely. This involves the following steps:

  1. Decomposing the problem domain, potentially through multiple levels, until it is split into chunks small enough that each one can fit into a human mind.
  2. Creating abstractions (interface and implementation) to prevent knowledge from escaping the chunks. A chunk must fit a human mind including all the interfaces it needs!

During these activities it is extremely important to:

  1. Refrain from distractions. Do not create objects or classes that are artificial, because they use up brain capacity without contributing directly to the solution.
  2. Avoid technical leaks as much as possible. Syntactically correct code should be semantically correct too.
  3. Avoid leaky abstractions. Make an extra effort to demand as little business knowledge from the user of your code as possible.

Most importantly, these rules should be followed rigorously whenever code is written, even if doing so takes significantly more time! It will always pay for itself in the end!

 

 


2 thoughts on “Transcending the Limitations of the Human Mind”

  1. Great article. Ideally, it should be possible to reconcile layered/onion architecture with these principles, but I’m not sure how. Specifically, the separation between presentation layer (more specifically the web API layer for web applications) and domain layer.

    Let’s say we need to have both REST and SOAP APIs for our application, so we have two distinct implementations of presentation layer and want to decouple them from domain layer. They would have DTOs for requests and responses (like CashTransferResponse in your example). We also want to hide internal structures of both presentation layer DTOs and domain layer entities.

    To hide the internal structure we need to eliminate getters and setters. So we can use nice frameworks like Spring with Jackson to serialize/deserialize the DTOs based on fields or a constructor. We can also have methods for converting those DTOs to domain objects, e.g. CashTransferRequest.toEntity(), because presentation layer can depend on domain layer. But we don’t want domain layer to depend on presentation layer, so we can’t have a conversion method on the domain object, e.g. CashTransfer.toRestJson(). We don’t want to expose the internal structure of CashTransfer, so we also can’t have getters for some DTO assembler object to access it. Looks like we need to either compromise separation between layers, or compromise hiding internal object structures.

    How would you solve this problem (or is it even a problem for you)?


    1. Good questions! Coming from Java Enterprise, I struggled with this problem myself for a long time.

      The first “breakthrough” I had was when I realized that the Layered Design (3-tier or n-tier architecture) is actually not a good design to have by default. It is only appropriate if there are real architectural constraints forcing the presentation or persistence to be separated. Even then, I would probably look for other solutions first, depending on the constraints. This separation incurs a cost. Doing this separation by default (which many projects do) is just paying continuously for something you don’t actually need.

      I would not decouple any presentation from the “business” objects at all, be it HTML, JSON or XML. I would of course decouple the details as far as practical, but the key is that nobody should be able to “get” the data from an existing abstraction. That would defeat the purpose of the abstraction.

      This applies to frameworks too, regardless of tricks. Using reflection is still getting data out, therefore should be avoided. It is just an indicator of poor abstraction.

      So for me, I would never ever compromise on object integrity and would therefore not separate technical layers at all, unless very explicitly forced to by some constraint.

