Is data modeling (entity-relationship) part of software architecture?

David Garlan recently asked me what I thought about data modeling and its relationship to software architecture. I’m including the answer here because it was effective at articulating some of my philosophical assumptions about software architecture that are not shared by all researchers and practitioners.

  1. The currently defined field of software architecture has a bias to address quality attributes to the exclusion of functionality. I think this may have originated because researchers wanted to differentiate their work from earlier work on high-level design, which primarily addressed functionality.
  2. It is impossible to meaningfully discuss many architecturally-relevant topics without referencing domain-specific types, e.g., cars, accounts, sensor readings. You need to reference them when defining ports/roles and behaviors. Modeling these types has historically been covered under high-level design, not software architecture.
  3. To zoom out effectively from implementation complexities (and premature commitments), software architecture should seek out surrogate representations of low-level commitments. Properties in Acme are a good example of this because they do not commit to a data representation.
  4. In MAp (and CAM, Catalysis, …) we commit to types of information, but not their representations. This allows us to refer to essential types (cars, accounts, sensor readings) while maintaining a zoomed-out view of a large system.

Since data(base) modeling commits to a data representation, it hurts your ability to zoom out. It also introduces discussions of N-th normal forms and efficient use of varchar vs. string(20). What MAp calls “information modeling” commits you to the existence of types and the relationships between them, but not to their concrete data representations, which allows it to uncover design defects related to the problem domain.
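As a rough illustration of that distinction, here is a minimal Python sketch of an information model that names types and their relationships without saying how any of them is stored. The class names (`InfoType`, `Relation`), the "pays for" relationship, and the `unrelated_types` check are all hypothetical and invented for this example; they are not MAp notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoType:
    """A domain type that is named but has no storage representation."""
    name: str


@dataclass(frozen=True)
class Relation:
    """A labeled relationship between two domain types."""
    source: InfoType
    label: str
    target: InfoType
    multiplicity: str  # e.g. "1", "0..1", "*" (illustrative convention)


# The essential types mentioned above, with no columns, widths, or encodings.
car = InfoType("Car")
account = InfoType("Account")
sensor_reading = InfoType("SensorReading")

# Hypothetical relationship, assumed only for the example.
relations = [Relation(account, "pays for", car, "*")]


def unrelated_types(types, relations):
    """Return the names of types that take part in no relationship."""
    linked = {r.source for r in relations} | {r.target for r in relations}
    return [t.name for t in types if t not in linked]


# A domain-level question we can ask without any schema in hand.
orphans = unrelated_types([car, account, sensor_reading], relations)
print(orphans)
```

Here the check flags `SensorReading` as unconnected to anything else, which is the kind of problem-domain question this level of model can raise before anyone has discussed tables or column types.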

Data modeling is a large and specific concern at companies, because the data and its schema may span applications and outlive all of them. It may have a place in architecture descriptions simply because of its significance to companies, but I am unconvinced that it is an architectural style or that it sits at the same level of abstraction as the rest of the software architecture ideas.

Comments

Without Data Models There Can be No Architecture

Without an architect’s plans there would never be a building – well no building worthy of the title.

Similarly, without first drawing a data model you can never build a database worthy of that name.

The terms ‘conceptual’, ‘logical’ and ‘physical’ should also be dropped. A data model is none of these; it is what it is!

It is a definition of all of the data (and its structure) needed by a business in order to meet its objectives and continue in existence – so a pretty vital element in building the business architecture!

If you decide to build a computer system then you would look at the data model and see what parts of it were going to be included in the database of the proposed system and draw a Database Schema. This is a plan showing what the physical database will look like. Finally you will build the database itself.

So we get three separate things. A data model, a plan of the database and then the database. If you plan to build many databases you will have many schemas but just one data model.

If you want to use the terms ‘logical’ and ‘physical’ you could say that the database schema is a ‘logical’ representation of the ‘physical’ database.

Missing from all of the above are the Business Functions – the core activities of the business. All data elements in the Data Model are those, and only those, required by the Business Functions.

This is where Process Centric approaches to business modeling are essentially flawed. Process Modeling is a secondary level modeling technique, like Information Flow Modeling, Procedure Modeling, etc.

The two Primary Modeling Techniques are Business Function Modeling and Data Structure Modeling. Both of these are inextricably linked and all other models and systems can be derived from them.

Read more at: http://www.integrated-modeling-method.com

John Owens

Terminology differences

Hi John,

Thanks for dropping by with your comments. I’ve read over your website and I think I understand where you are coming from. In many ways it’s not useful to debate if X is in category Y — what is important is to follow a process to build systems that solve problems.

The point I’m making in the blog posting is that it is often beneficial, especially in large and complex systems, to zoom out or stand off from the lowest level commitments. It may be vital to store a customer’s name, but the system will likely succeed whether it is stored as a char(80) or a varchar or something else. Often you can zoom out even farther and simply say you need to be able to store / retrieve the customer’s name. The level of specificity depends on the project. If we had to send that record from Earth to Mars then we’d probably care about every bit in the packets, but on most IT projects we would not. Our brains are only so big, so if we pack our models full of unimportant details then they lose their abstraction value. After all, abstraction is what a model is for.

It’s not clear from your post if you believe “char(80)” or “customer name” is necessary for architecture. When I say schema, I mean the char(80) version. I’d use a term like “type model” for the latter. My second numbered point from the blog was that architecture models need some kind of type model. If this is what you mean by “without data models there can be no architecture” then we’re close to agreement. Our architecture model should usually avoid making “char(80)” kinds of schema commitments.
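To make the “customer name” versus “char(80)” distinction concrete, here is a small Python sketch. The architecture-level statement is only the `CustomerNames` protocol (store and retrieve a name); the two classes below it are interchangeable representation choices. All names here are hypothetical, and the SQLite table is just one assumed way of making the schema-level commitment.

```python
import sqlite3
from typing import Dict, Protocol


class CustomerNames(Protocol):
    """Type-model level: we can store and retrieve a customer's name."""

    def store(self, customer_id: int, name: str) -> None: ...
    def retrieve(self, customer_id: int) -> str: ...


class InMemoryNames:
    """One representation: a plain dictionary, with no schema at all."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}

    def store(self, customer_id: int, name: str) -> None:
        self._names[customer_id] = name

    def retrieve(self, customer_id: int) -> str:
        return self._names[customer_id]


class SqlNames:
    """Another representation: a table that commits to VARCHAR(80)."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name VARCHAR(80))"
        )

    def store(self, customer_id: int, name: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
            (customer_id, name),
        )

    def retrieve(self, customer_id: int) -> str:
        row = self._db.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            raise KeyError(customer_id)
        return row[0]


def round_trip(names: CustomerNames) -> str:
    """Code written against the type model, unaware of the representation."""
    names.store(1, "Ada Lovelace")
    return names.retrieve(1)


results = [round_trip(InMemoryNames()), round_trip(SqlNames())]
print(results)
```

Both implementations satisfy the same zoomed-out requirement, which is the sense in which the architecture model can defer the schema decision.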

Your last point is an interesting one. The basic division of models into nouns and verbs, state and behavior, or functions and data is a common one with deep roots. Like many folks from the SEI and CMU, I tend to view Quality Attributes (performance, security, modifiability, etc.) as different from ordinary functionality, though it’s not impossible to put them in the same category. There’s also the issue of deployment or allocation, which is difficult to derive from functions and data: if I have a model of functions and data it does not tell me if the server is replicated, or if offsite backups exist, etc.

We have to get PEOPLE to build this stuff

I started to respond here, but when I got beyond 3 paragraphs, I decided to move it over to my blog.

Larry Maccherone

Evaluate styles of data modeling separately

This blog post assumes a specific narrow interpretation of data modeling. While there is no consensus, many practitioners consider there to be at least three variants of data modeling:
1) conceptual
2) logical
3) physical

Normal forms are a concern for (2) and (3) but not (1).
Varchar vs. string(20) efficiency (along with indexing etc.) is a concern for (3) but not (2) or (1).
Conceptual is similar to the “information modeling” mentioned in the post.
I think separately evaluating the architectural pertinence of each of (1)-(3) would be beneficial.

Architecture and conceptual, logical, and physical data modeling

Again, I am not a database expert. The Wikipedia page on data modeling describes the three levels above, so I’m proceeding from their definitions, which are roughly:

  1. Conceptual model: Things in the domain and their relationships.
  2. Logical model: A database schema, where we would commit to tables and data formats.
  3. Physical model: The embodiment of a logical database, with commitments to things like row ordering and geographical location of the data (e.g., redundancy).

I’m not sure I completely understand the set of commitments made in the logical model vs. physical model, because the parent post says that commitments to varchar vs. string(20) would not be made in the logical model. Or perhaps the definition of logical model varies.

We build models for a purpose. It is not fair to build a model for one purpose and judge it by its suitability to another purpose. If I build an architecture model to design a secure, fast system then the model might stink for helping me build a modifiable system or a testable system.

In my experience, I have always used a conceptual model when building an architecture model. Depending on the specific quality attribute needs of the architecture, either or both of the logical or physical models could be helpful. On most IT projects I would suggest that you avoid making specific data representation commitments, like the varchar vs. char(20) commitments. However, if this were an app for an embedded device with limited storage, such commitments would be pretty important, so I would include them.
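To see why the representation choice can matter on a storage-constrained device, here is a rough Python sketch that compares two byte layouts for the same names. The encodings are simplified stand-ins that I am assuming for illustration (a NUL-padded 20-byte field for the char(20) style, and a 1-byte length prefix for the varchar style); real databases and embedded formats lay out data differently. The sample names are made up.

```python
import struct


def fixed_width(name: str, width: int = 20) -> bytes:
    """Encode as a fixed-width field padded with NUL bytes (char(20)-like)."""
    raw = name.encode("utf-8")
    if len(raw) > width:
        raise ValueError("name does not fit in the fixed-width field")
    return struct.pack(f"{width}s", raw)


def length_prefixed(name: str) -> bytes:
    """Encode as a 1-byte length followed by the bytes (varchar-like)."""
    raw = name.encode("utf-8")
    if len(raw) > 255:
        raise ValueError("name too long for a 1-byte length prefix")
    return struct.pack("B", len(raw)) + raw


names = ["Ann", "Bob", "Christopher"]

fixed_total = sum(len(fixed_width(n)) for n in names)
variable_total = sum(len(length_prefixed(n)) for n in names)

print(fixed_total, variable_total)
```

For these three names the fixed-width layout takes 60 bytes and the length-prefixed layout takes 20. That difference is noise on a typical IT project, but it could decide whether the data fits on a small device.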

In a way, I think our desire to have a single, easy to understand definition for “architecture” is leading us in the wrong direction. If we asked a related question like “What models would be helpful for designing a system to do X?” then I think we would realize that the answer is “It depends on what X is”. There is increasing consensus that architecture is more about quality attributes (latency, security, usability, deployability, …) than it is about functionality. It’s not that functionality isn’t important, but that I could build a 3-tier system or a monolithic system or a peer-to-peer system or a SOA system that all achieved the same functions, but had quite different quality attributes. If you tell me what quality attributes the system should have, it’s easier to decide what models to build, i.e., what parts of data modeling to include in its architecture.

Regards,

-George

Overlapping ideas, I think

Thanks for the clarification on data modeling. I have worked with database experts, but do not consider myself one. From your description, I think that “conceptual” and “logical” modeling from the data modeling field overlap with what the architecture folks might call “domain” and “type specification” modeling. Unfortunately everyone who writes a book uses a slightly different term. It’s not surprising, though, that people with different specialities converged on the same basic ideas.

Personally, I think it’s better to think of architecture as a sub-field of design, rather than an independent and completely different beast. There is a big distinction between talking about the world without your system (conceptual modeling or domain modeling) versus the world with your system. The truth in conceptual/domain modeling should be enduring, like “doors can be opened” or “cars have wheels”, and will not change whether or not your system exists. It is useful to talk about architecture, because architectural decisions are the largest-scale design decisions you make and there are techniques that work there that do not work in detailed design, but architecture is still just a part of design.

From that perspective, the question of whether or not (physical) data modeling is architectural is somewhat academic, because it is a design activity, and you can call that activity architecture, design, data modeling, or Henry. (It may be relevant to process definers, because they will want to say that Person X will do Activity Y during Phase Z.) Perhaps the blog posting could have been clearer: We model architecture because it helps us deal with complexity and scale, so we necessarily reach for models that help us condense details rather than models that ask us to make detailed representation decisions.