Data Boundaries are the root cause of Maintenance Problems

Many designs and patterns, old and new, such as the Layered Architecture, the Clean Architecture, the Hexagonal Architecture, DCI and others, introduce data-oriented boundaries inside the application.

Data-oriented boundaries are interfaces between architectural parts that primarily consist of data in the form of “properties” that can be freely accessed either directly, through getter methods, through reflection or some other technical means.

Regardless of how the data is actually accessed, or which side defines the interface and which one consumes it, these kinds of boundaries create serious maintenance problems for the software.

Let’s take a look into why this happens and what alternatives exist to avoid these problems.

Data-Oriented Boundaries

This is a classical 3-tier design with Data-Transfer Objects as boundaries:

This design became prevalent in the late 90s, when a lot of developers (myself included) transitioned from traditional procedural languages like C, Pascal and Basic to Java. The DTOs were sometimes called Value Objects at the time, but regardless of what they were called, they were the familiar data structures we had grown accustomed to. It was a popular approach and easy to understand, because it didn’t really require a change of mindset.

This was also the time when a lot of applications were still “rich clients”. That is, these applications had to be installed on the user’s computer, provided a GUI and connected to some backend server. The Web (HTML-based frontends) was still new and we couldn’t yet tell which of the two would work or survive, so it kind of made sense to separate the frontend a bit from the application. The layered design shown above helped us somewhat in case we needed to switch technologies, which actually happened often, mostly in one direction: from rich client to web.

Neither of those reasons exists today, but this design and the underlying theme of “making switching technologies possible” somehow stuck. That is a problem, because it comes at a great cost.

The Maintenance Problems Start

Let’s take a look at this example:

public final class Amount {
   private final BigDecimal value;
   private final Currency currency;

   public Amount(BigDecimal value, Currency currency) {
      this.value = value;
      this.currency = currency;
   }

   public BigDecimal getValue() {
      return value;
   }

   public Currency getCurrency() {
      return currency;
   }
}

When this class is returned from some method, both the caller and the callee would have to know all about the Amount, including the attributes value and currency and also what they mean. Both sides have to know all the rules regarding handling these Amount objects, like not adding the value if the currencies differ, not comparing them etc.
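
To make this concrete, here is a rough sketch of what a caller of the data-oriented Amount might look like. The class InvoiceTotals is hypothetical and not from any real project, and java.util.Currency is assumed as a stand-in for the Currency type used above.

```java
import java.math.BigDecimal;
import java.util.Currency;
import java.util.List;

// The data-oriented Amount from above, reduced to its getters.
final class Amount {
    private final BigDecimal value;
    private final Currency currency;

    Amount(BigDecimal value, Currency currency) {
        this.value = value;
        this.currency = currency;
    }

    BigDecimal getValue() { return value; }
    Currency getCurrency() { return currency; }
}

// Hypothetical caller somewhere else in the application.
final class InvoiceTotals {
    static Amount total(List<Amount> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("no items to total");
        }
        Currency currency = items.get(0).getCurrency();
        BigDecimal sum = BigDecimal.ZERO;
        for (Amount item : items) {
            // The "same currency" rule is re-implemented outside of Amount.
            if (!item.getCurrency().equals(currency)) {
                throw new IllegalArgumentException("mixed currencies");
            }
            // This line depends on the value being a BigDecimal.
            sum = sum.add(item.getValue());
        }
        return new Amount(sum, currency);
    }
}
```

Every caller written this way carries its own copy of the rules, and nothing forces those copies to agree with each other.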

What would happen if we were to change this Amount? Let’s assume it turns out that arithmetic with BigDecimals is too slow and we need billions of operations per second. We would like to change the “internals” of the Amount from BigDecimal to Long (which would be in cents, for example). Obviously we could keep the BigDecimal getter to keep the API stable and convert the internal representation to Long, but that would defeat the purpose of our change, because the arithmetic would still happen on BigDecimals. Instead, we now need to track down each usage of that attribute and see how it would be impacted by our change. This is exactly what unmaintainability looks like: you have to manually track down the consequences of a change.

Let’s take another example. We want to introduce a field which indicates whether the value is in units or 1/100 units (for some currencies this makes some sense). While in the previous example we still at least get some help from our compiler, showing us usages where the type change introduces problems, this change will cause no compilation issues whatsoever. We have to track down usages again, but this time without the explicit help of our development environment, with the additional task to understand how those sites use the Amount to be able to change them accordingly. This is an even worse situation than before.
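
A minimal sketch of this second change, under some assumptions: the flag name inHundredths and the class LegacyReport are made up for illustration, and the currency is left out to keep it short.

```java
import java.math.BigDecimal;

// Data-oriented Amount after adding the hypothetical flag.
final class Amount {
    private final BigDecimal value;
    private final boolean inHundredths; // new: value is in 1/100 units when true

    Amount(BigDecimal value, boolean inHundredths) {
        this.value = value;
        this.inHundredths = inHundredths;
    }

    BigDecimal getValue() { return value; }
    boolean isInHundredths() { return inHundredths; }
}

// Hypothetical code written before the flag existed. It still compiles unchanged.
final class LegacyReport {
    static BigDecimal sum(Amount a, Amount b) {
        // Ignores isInHundredths(), so it may add units to hundredths.
        return a.getValue().add(b.getValue());
    }
}
```

Summing an amount of 5 units with an amount of 250 hundredths through LegacyReport yields 255 instead of 7.50, and neither the compiler nor the IDE points at the line that is now wrong.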

Please note that this is just one very simple and incomplete example and it already starts to get out of hand.

Why is this happening?

The biggest problem with data-oriented interfaces is that they share meaning implicitly. This is not the good kind of sharing either. It means that because the communication is reduced to data, both sides must have the appropriate interpretation for that data, which might include anything from simple things like what values it could have (can it be null?) to complex interrelations between different parts (like the 1/100th flag above).

If both sides must possess this knowledge then it follows that both sides have to change when this interpretation changes. To make it worse, the interpretation is implicitly shared, because there is no way to detect if suddenly other rules or interactions apply to the data, so there will be very limited language and IDE support for implementing the change. The only way to prevent this outcome is to keep this knowledge localized and hidden as much as possible. The above example should look like this:

public final class Amount {
   private final BigDecimal value;
   private final Currency currency;

   public Amount(BigDecimal value, Currency currency) {
      this.value = value;
      this.currency = currency;
   }

   public Amount add(Amount other) {
      if (currency.equals(other.currency)) {
         return new Amount(value.add(other.value), currency);
      }
      throw new IllegalArgumentException("...");
   }

   public boolean lessThan(Amount other) {
      ...
   }
   ...
}

We no longer publish the internal state of the Amount. Instead, we provide business-relevant methods that users of this class can use to manipulate Amount objects according to all rules and regulations, controlled completely by the Amount. Note that the knowledge of how amounts are added up or compared is described exclusively here, and it is now impossible to violate these rules even if the caller doesn’t know any of this.

It is also easy to see that the two example changes proposed above, changing BigDecimal to Long or introducing the 1/100th flag attribute would be possible in this design by changing only the Amount class. This is the crucial point of a maintainable design! It is now possible to change the Amount, either change the internals or even introduce new features, without changing the semantics (the meaning) of what the Amount is and what it does.
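
As a sketch of the first change, this is roughly what the behavior-oriented Amount could look like with a long in minor units inside. It assumes java.util.Currency as a stand-in for Currency, values with at most two decimal places, and that comparing amounts of different currencies is rejected just like adding them. None of these details are prescribed by the design; they only keep the example small.

```java
import java.math.BigDecimal;
import java.util.Currency;

final class Amount {
    // Internal representation changed to minor units (for example cents).
    private final long minorUnits;
    private final Currency currency;

    // Same constructor signature as before, so existing callers still compile.
    // Assumption: at most two decimal places, otherwise longValueExact() throws.
    Amount(BigDecimal value, Currency currency) {
        this(value.movePointRight(2).longValueExact(), currency);
    }

    private Amount(long minorUnits, Currency currency) {
        this.minorUnits = minorUnits;
        this.currency = currency;
    }

    Amount add(Amount other) {
        requireSameCurrency(other);
        // addExact fails loudly on overflow instead of wrapping around.
        return new Amount(Math.addExact(minorUnits, other.minorUnits), currency);
    }

    boolean lessThan(Amount other) {
        requireSameCurrency(other);
        return minorUnits < other.minorUnits;
    }

    private void requireSameCurrency(Amount other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currencies differ");
        }
    }
}
```

Callers that only use add() and lessThan() do not have to change, because they never saw the BigDecimal in the first place.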

Don’t forget the UI!

At some point some things will be shown to the user. For example the Amount object could be the balance of an Account, which has to be shown on a Web Interface.

There is nothing special about the UI and the same rules apply here as well. Instead of having a data-oriented interface from the “business layer” to the “presentation”, as would be the case in a layered design, the Amount should offer the relevant functionality itself. Let’s consider what happens if it doesn’t do that. What knowledge would have to be passed to the “presentation” layer?

  • The construction of the Amount object, with all its parameters
  • All the attributes the Amount publishes
  • How these attributes relate or influence each other
  • The exact type and way to show each attribute (what kind of display it needs, what length, what precision)
  • It has to be able to evaluate certain conditions, like whether the amount is negative or positive
  • How to ask the user for an amount. Again, what kind of display to provide, for example how many input boxes or selection widgets to display, etc.

It is fair to say that the UI has to know everything there is to know about the Amount to be able to present it and to get it as input from the user. Therefore any change in the Amount will result in changes in the UI, which means things that change together aren’t together, a hallmark of an unmaintainable design.

The solution to this problem is the same as before. The Amount keeps this knowledge and instead offers the two relevant methods the UI needs to operate (using an imaginary web framework here):

public final class Amount {
   private final BigDecimal value;
   private final Currency currency;

   public Amount(BigDecimal value, Currency currency) {
      this.value = value;
      this.currency = currency;
   }
   ...
   public Component display() {
      return new TextView(
         new NumberView(value), currency.display());
   }

   public InputComponent<Amount> displayEditable() {
      return new InputGroup<>(
         new NumberInput(), currency.displayEditable(),
         Amount::new);
   }
}

It is easy to see that in this design none of the knowledge above has to leak to the UI, while still keeping details (like colors and font-sizes, etc.) of the UI away from the Amount.
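
To show how little the UI side needs to know, here is a self-contained sketch of the display() half. Everything framework-like in it is an assumption: Component, TextView and NumberView are minimal stand-ins for the imaginary framework that render plain HTML strings, BalancePage is a hypothetical piece of UI code, and java.util.Currency replaces the Currency type with its own display() method.

```java
import java.math.BigDecimal;
import java.util.Currency;

// Stand-in for the imaginary framework: a component renders itself to HTML.
interface Component {
    String render();
}

final class TextView implements Component {
    private final Component[] parts;

    TextView(Component... parts) { this.parts = parts; }

    public String render() {
        StringBuilder html = new StringBuilder("<span>");
        for (Component part : parts) {
            html.append(part.render());
        }
        return html.append("</span>").toString();
    }
}

final class NumberView implements Component {
    private final BigDecimal number;

    NumberView(BigDecimal number) { this.number = number; }

    public String render() { return number.toPlainString(); }
}

final class Amount {
    private final BigDecimal value;
    private final Currency currency;

    Amount(BigDecimal value, Currency currency) {
        this.value = value;
        this.currency = currency;
    }

    // The Amount decides what is shown; no getter is needed for that.
    Component display() {
        return new TextView(new NumberView(value),
                () -> " " + currency.getCurrencyCode());
    }
}

// UI-side code: knows about layout and styling, nothing about amounts.
final class BalancePage {
    static String render(Component balance) {
        return "<div class=\"balance\">" + balance.render() + "</div>";
    }
}
```

BalancePage only receives a Component, so changing how an Amount is represented or shown does not touch it, while the styling of the page stays out of the Amount.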

Real-life example: Weld/CDI Project

There are probably a hundred examples of sharing knowledge just in this one project, some simple and visible, some more subtle and complicated. Here are just two simple and easy to explain ones.

In this example the class WeldFilter defines itself in terms of the data “name” and “pattern”. It turns out there are certain rules for how these can interact, but this was not built into WeldFilter. Instead, some code in a completely unrelated part of the project must do this:

if ((weldFilter.getName() != null
      && weldFilter.getPattern() != null)
   || (weldFilter.getName() == null &&
      weldFilter.getPattern() == null)) {
   throw new IllegalStateException("...");
}
if (weldFilter.getPattern() != null) {
   this.matcher = new PatternMatcher(...);
} else {
   this.matcher = new AntSelectorMatcher(...);
}

Imagine someone tried to change the WeldFilter class, add a parameter/attribute, a boolean flag or some new options. That person would have to search for all usages of the class and figure out how that would impact this unrelated part of the application. Even worse, the WeldFilter class is in another repository, so this search would likely not turn up the above code, leaving it broken without any indication that it is in fact broken.
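
One possible direction, sketched with a hypothetical class that is not the real WeldFilter: the filter enforces the “exactly one of name or pattern” rule itself and hands out a ready-made matcher. The matching logic here (exact name comparison and a regular expression) is a simplification chosen for the example and does not reproduce what PatternMatcher or AntSelectorMatcher do.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical stand-in for WeldFilter; not the real Weld class.
final class NameOrPatternFilter {
    private final String name;
    private final String pattern;

    NameOrPatternFilter(String name, String pattern) {
        // The rule that used to live in unrelated code is enforced here.
        if ((name == null) == (pattern == null)) {
            throw new IllegalArgumentException(
                    "exactly one of name or pattern is required");
        }
        this.name = name;
        this.pattern = pattern;
    }

    // Callers ask for a matcher and never look at name or pattern.
    Predicate<String> matcher() {
        if (pattern != null) {
            Pattern compiled = Pattern.compile(pattern);
            return candidate -> compiled.matcher(candidate).matches();
        }
        return candidate -> candidate.equals(name);
    }
}
```

With this shape, adding a new option to the filter means changing the filter, and callers of matcher() stay as they are.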

Another common example is to push null handling to the user code. The class AbstractMemberProducer offers a getDisposalMethod() getter:

public DisposalMethod<?, ?> getDisposalMethod() {
    return disposalMethod;
}

The caller however has to know that this method might return null. So this spreads null checks all over the code. Just because of this one method, there are at least 3 classes in completely different places that have the same exact code to check first if the disposal method is there or not:

// From Validator
if (producer.getDisposalMethod() != null) {
   for (InjectionPoint ip : producer.getDisposalMethod().getInjectionPoints()) {
      ...
   } 
}

// From AbstractProcessProducerBean
if (producer.getDisposalMethod() != null) {
   return ...
}

...

Note that the null problem itself can of course be solved technically, by using Optional or perhaps a different language. However, null is just the symptom here. The problem is that a raw object is shared with the user code, and with that the user code has to know the semantics of that piece of data.
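
A small sketch of what keeping that knowledge local could look like. MemberProducer and DisposalMethod below are hypothetical stand-ins, not the real Weld types, injection points are reduced to strings, and the method name forEachDisposalInjectionPoint is invented for the example. What the callers actually do with the injection points is not known here, so a real design would probably offer something more specific than a generic loop.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins; these are not the real Weld classes.
final class DisposalMethod {
    private final List<String> injectionPoints;

    DisposalMethod(List<String> injectionPoints) {
        this.injectionPoints = injectionPoints;
    }

    void forEachInjectionPoint(Consumer<String> action) {
        injectionPoints.forEach(action);
    }
}

final class MemberProducer {
    // May be null, but only this class needs to know that.
    private final DisposalMethod disposalMethod;

    MemberProducer(DisposalMethod disposalMethod) {
        this.disposalMethod = disposalMethod;
    }

    // Behavior instead of a getter: the "is it there?" check stays inside.
    void forEachDisposalInjectionPoint(Consumer<String> action) {
        if (disposalMethod != null) {
            disposalMethod.forEachInjectionPoint(action);
        }
    }
}
```

The callers shown above would then contain no null check at all, and a producer without a disposal method simply does nothing.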

How does this relate to other Designs?

Although this article refers multiple times to the layered design, it is by far not the only architectural pattern that focuses on technical separation to the detriment of cohesive functionality and localized changes.

The most recent one is the Clean Architecture. This architecture builds on the notion that its boundaries exist completely for technical purposes (therefore contain mostly data without behavior), optimizing for changing technologies instead of changing business functions. Here is an analysis of the Clean Architecture code showing just how many changes are required for very simple features.

DCI (Data, Context, Interaction) is a little bit older, but is also built around the idea that data and function should be separated. This is indeed so important for this approach that it is the name itself. Unlike the Clean Architecture it doesn’t do this for technical purposes, instead it asserts that the data in objects is stable and changes rarely, while the actual algorithms in the objects change often, therefore justifying their separation. This approach results in a lot of data-oriented interfaces.

The Hexagonal Architecture (Ports and Adapters), just like the Clean Architecture, wants a “pure” core logic without any technology and separates technology aspects from the core with the help of “ports”. It introduces artificial (non-business) boundaries inside the application, because it assumes (like the other approaches above) that most modifications to the software modify only the “pure” business logic and rarely if ever touch the data, API, UI or database. The “ports” are usually implemented in a data-heavy or data-only way.

Summary

The most current and popular architecture patterns still build heavily on data-oriented interfaces inside the application, using data-only objects, beans, DTOs or other similar means. These data-heavy constructs however, because they contain raw data without their correct behavior, transfer the responsibility and knowledge to handle them properly to the caller, so both sides have to know the same things.

This sharing of knowledge will be the cause of maintenance problems later in the software’s life, for the simple reason that any modifications on either side will likely need a manual re-evaluation of what happens on the other side with the data. This can be a difficult job, since it can mostly be done only by reading the code that uses the data, which escalates really quickly if there are potentially multiple places to check.

The alternative is to not work with the data “somewhere else”. Keep all the “working” parts inside the object, and to be sure, just don’t publish the data.

15 thoughts on “Data Boundaries are the root cause of Maintenance Problems”

  1. So if I understand correctly, you are saying that a data object should not leak its data “type” all over the place, rather offer up its behaviour to the user? That’s the Composite pattern, right? ( Although that only speaks of collection of objects being handled as a single object, the sentiment is the same as in providing convenient methods to work with the data object underneath. (( but don’t quote me on this 😛 )) ).

    Considering the hidden nature of the data, what kind of architectural pattern / boundaries do you propose should exist? Because you clearly need them in case of a huge software. By what would you elect the boundaries? What would you separate them by? Meaning? Focus on the business logic? Group them by terms of responsibility or functionality?

    I thought the whole thing of data objects and separate is not by the data object but by facilities such as Repositories and Services that are there to handle the data objects. DTOs shouldn’t be cross cutting.

    However, I see and understand how a Data Object representing a complex structure should also provide the means to deal with said structure. The implications sadly can be very subtle. It’s not always as clear as an Amount where you can provide an Add method. It happens that you only realise that you have a complex data object and its knowledge scattered amongst your code when it’s too late.

    1. Very true, it is not always as easy as with the “Amount” object. It is not always easy to find a suitable business function, and it gets even worse when the data gets repackaged many times. We are talking here about refactoring “legacy” code. In these cases you have to unify an object through multiple “layers” of code, which is just a lot of work. I’ll find a suitable example and do an article or a video on it.

      Regarding your question about what boundaries should exist: I’m speaking here only about “internal” boundaries for now. All boundaries, which includes the API of objects as well as public classes in packages, should be business-specific. That is, directly derived from requirements. Amount should have “add()”, “lessThan()”, etc. (assuming the requirements specify that I need to add amounts). An Account can contain “freeCreditCards()”, etc.

      When you do this, there is very little to no reason to have Repositories and there is no reason at all to have Services. These are all technical things as you might have already noticed, they have no business-relevant meaning.

      1. Gotcha. I might have gone in the wrong direction as in I thought you are also talking about different boundaries, not just internal ones. Though this sentiment still holds true in that regard as well.

        Yes, indeed. But long live senseless DDD! 🙂

    1. Yes, I would and I have.

      I sense some doubt about this approach in your question, so let me clarify. Architecture is always a trade-off, right? So what are the two sides here?

      If you make your “business objects” completely UI agnostic, you may replace/change the UI technology without touching the “important” code. Which is in itself a good thing no doubt. The downside is data-oriented interfaces, as argued in the article. The UI will have to know basically everything about the business objects. So you can replace the UI without touching the business code, but you can’t reasonably change the business code without changing the UI.

      If objects can present themselves however, it is the other way around. The UI code becomes business agnostic, and the business code has to know (some aspects of) the UI. So in this case you can not easily change/replace the UI, but you can easily implement changes to business code, because all those changes will be localized.

      So the question is, which one do you want? Do you want to optimize for changing the UI technology stack, or do you want to optimize for changing business code? Which one do you do more often?

      1. Thank you for your answer, that’s a really interesting point, especially because the separation between model and view is always and everywhere considered “best practice” and most of the people would not even think about doing something like that.

  2. I’ve been thinking about this for a long time. I agree with you in principle, but there are situations where I find problems implementing it.

    There seems to be a class of objects such that they can either hide their data or implement polymorphic behaviour, but not both. Your Amount is an example of such object. To implement different kinds of amounts with slightly different behaviour, Amount has to be an interface. But in that case the add(…) method would have to accept an Amount interface and it wouldn’t have access to its fields. So if we needed polymorphic amounts (not likely, but I just use it as an example here – there could be other more sophisticated objects with this problem), we would have to expose value and currency in the interface. Do you encounter such problems, and if so, how do you usually deal with them?

    Also, single-page applications (SPA) seem to be getting more popular, especially when rich, interactive web UI is needed. They separate frontend from backend, forcing them to share data via HTTP calls. What do you think about them? This is horizontal (or technical) separation, which smells to me, and also goes against your article as far as I can tell.

    1. Excellent points, as always 🙂

      I do sometimes have problems with objects that may need polymorphic behavior, but at the same time would only work with the concrete implementation at hand, because of data hiding. Like the Amount as you said. To be honest I don’t have a generic solution to this, but I always managed to side-step the issue using basically 2 approaches that I can remember: 1. extracting the polymorphic behavior into some meaningful other polymorphic object, and leave the original a single implementation. 2. just re-designing the api of the object to carry the “type” information, like a fluent api (so you don’t have the polymorphic thing as parameter anywhere). These won’t always be possible, and I don’t know whether that’s an inherent problem in the design or whether that is just a language problem (i.e. what would happen with Scala-like self-types or typeclasses).

      As to SPAs. They are not the problem, at least not in this context. The problem is that most javascript based frontend technologies want pure data from the server. That would be a data-oriented interface and it comes with all the drawbacks described in the article. However, the Server (and the objects within) could easily generate the html+javascript themselves (using the React components, HTML5 custom elements, etc.), not just the data, in which case all changes could still be localized to the objects. This is analogous to supplying the behavior in Java.

      1. Referring to the second paragraph (SPA): if my app communicates with another service using data only (REST controllers + JSON format), am I forced to use DTO objects or is there a better solution?

        1. There is. Although I would not implement a data-only interface for SPAs, I would perhaps not do an SPA in the first place, but there are other designs that need a data-only interface. Right now I’m working on an application for a client where messaging is involved with JSON messages.

          These messages are not generated from DTOs however. Either objects present themselves as JSON, or there is an actual specific class for a message (since there is no behavior for that particular message), but it is generated by an object directly.

          There are other, better ways to separate protocol details from objects that don’t need to know those. Normal refactoring, pulling and moving stuff around, factories, etc.

  3. When I was a student, systems were defined as “Inputs -> Processes -> Outputs” (note the plural), so if you think of a system in these terms, it makes sense to think in data structures instead of “complete” business objects. In fact, systems nowadays communicate more than ever before (APIs, microservices, databases, queues, all of them use just data).
    I understand that OOP is a great way of encapsulating, but encapsulation brings its own problems as well: when you need this object to be used in several places, it will grow a lot to fulfill everyone’s needs.
    In your first example for Amount, changing the data type will be a headache, no matter what approach you use.
    In your second example, just having an “AmountMultiplier” object will solve the situation.
    So, as you see, you can choose to add methods in the same class, or in others. But adding them in others leaves the data clean, to continue being used by the other components that are already using it.
    Now, having two objects is a problem? Put them in their own namespace 🙂

    1. I get where you are coming from, there are a lot of the same thoughts floating around the net.

      Well, “clean” data also means data without behavior, data without semantics, data without meaning. The consequence of which is that the “meaning” will be smeared all over the application and won’t be local to the class.

      For example you can implement “AmountMultiplier” (let’s ignore OOP for the moment) if you want, but because you also have the data available to everybody, you can’t actually be sure that nobody implemented that somewhere else even indirectly by using the data in some way. Sure, you can use that class if you want yourself, but you can not make others use it consistently. So if there are some rules on how to multiply an Amount that you want to apply consistently, you have to search for all usages of the data. And then you still can’t guarantee that someone will not implement it later somewhere else incorrectly.

      Once you published everything you are no longer in control and you can’t put the genie back in the bottle. Remember, you are not alone on the project, there will be all sorts of people writing code, including some that are not as knowledgeable as you are, don’t know the system, come in later on the project, or just plain forgot the particular rules of the class they use.
