Extract and supplement

Even with so much modern technology at our disposal, the majority of the software development industry is still manually drawing software architecture diagrams using general-purpose diagramming tools such as Microsoft Visio. Furthermore, these diagrams often don't reflect the implementation in code, and vice versa. Ideally, we would auto-generate as much of the software architecture model as possible from the code, but this isn't currently realistic because most codebases don't include enough information about the software architecture to do this effectively. This is true both at the "big picture" level (context and containers) and the lower level (components). One solution to this problem is to enrich the information that we can get from the code with the information that we can't. In other words, extract as much software architecture information from the code as possible, and supplement it where necessary.


Since the code of a software system is subject to the most volatility, it makes sense to automatically extract the components from the codebase so the resulting diagrams don't quickly become out of date. This raises a couple of questions. If you're using an object-oriented programming language, what's the difference between a class and a component? And how do we find, and therefore extract, components from a codebase? In summary, there are a number of strategies for identifying components, some implicit and some explicit:

  • Naming conventions: Perhaps you've unconsciously used a naming convention whereby all of your controllers, services and repositories (for example) are named *Controller, *Service and *Repository respectively.
  • Packaging and namespacing conventions: Or perhaps you've grouped everything related to a single component into a single package or namespace.
  • Machine-readable metadata: Alternatively, perhaps you've included metadata in the code to mark parts of the codebase as being significant in some way. This could be done using Java annotations, C# attributes, etc., whether they are your own or provided to you by a framework.
  • Module systems: Adopting a module system (e.g. OSGi, Java 9 modules, etc.) might mean there's a very easy way to extract components from your codebase.

Once you work out how to identify components in your codebase, you can write some code to do this.
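
As an illustration of the naming convention strategy, here is a minimal sketch in Java. It assumes the codebase follows the *Controller, *Service and *Repository convention mentioned above, and that a list of fully qualified class names has already been obtained somehow (for example, from a classpath scan, which is not shown). The ComponentFinder class, its method names and the com.example class names are hypothetical and exist only for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ComponentFinder {

    // Assumed convention: adjust these suffixes to whatever your codebase actually uses.
    public static final List<String> SUFFIXES = List.of("Controller", "Service", "Repository");

    /** Returns the matching suffix for a simple class name, or null if none matches. */
    public static String componentType(String simpleClassName) {
        for (String suffix : SUFFIXES) {
            // Require at least one character before the suffix, so a class
            // named just "Service" is not treated as a component.
            if (simpleClassName.endsWith(suffix) && simpleClassName.length() > suffix.length()) {
                return suffix;
            }
        }
        return null;
    }

    /** Maps each fully qualified class name that matches the convention to its component type. */
    public static Map<String, String> findComponents(List<String> fullyQualifiedClassNames) {
        Map<String, String> components = new LinkedHashMap<>();
        for (String fqcn : fullyQualifiedClassNames) {
            String simpleName = fqcn.substring(fqcn.lastIndexOf('.') + 1);
            String type = componentType(simpleName);
            if (type != null) {
                components.put(fqcn, type);
            }
        }
        return components;
    }

    public static void main(String[] args) {
        // Hypothetical class names standing in for the result of a codebase scan.
        List<String> classes = List.of(
                "com.example.web.OwnerController",
                "com.example.service.ClinicService",
                "com.example.repository.OwnerRepository",
                "com.example.model.Owner");
        findComponents(classes).forEach((name, type) -> System.out.println(type + ": " + name));
    }
}
```

The same shape should carry over to the other strategies: only the test inside componentType changes, for example to check a package prefix or the presence of an annotation instead of a name suffix.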


The higher levels of the software architecture model (people, software systems and containers) are a little harder to extract from a codebase, so often it's easier to simply specify them manually. The Spring PetClinic example shows this technique in action.
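
To make the "extract and supplement" idea concrete, the following Java sketch combines a hand-written description of the higher levels with a list of components that is assumed to have been extracted from the code. It does not use any particular tooling's API: the ArchitectureModel and Element classes, and every element name in it, are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ArchitectureModel {

    /** A named element at any level of the model (software system, container or component). */
    public static class Element {
        public final String kind;
        public final String name;
        public final List<Element> children = new ArrayList<>();

        public Element(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        /** Creates a child element, attaches it to this element and returns it. */
        public Element add(String kind, String name) {
            Element child = new Element(kind, name);
            children.add(child);
            return child;
        }
    }

    /** Builds the model: higher levels are written by hand, components come from extraction. */
    public static Element build(List<String> extractedComponentNames) {
        // Specified manually: these names are illustrative assumptions.
        Element system = new Element("Software System", "Example System");
        Element webApp = system.add("Container", "Web Application");
        system.add("Container", "Database");

        // Extracted from the code: in practice this list would come from a scan of the codebase.
        for (String name : extractedComponentNames) {
            webApp.add("Component", name);
        }
        return system;
    }

    /** Prints the element and its descendants as an indented tree. */
    public static void print(Element element, String indent) {
        System.out.println(indent + "[" + element.kind + "] " + element.name);
        for (Element child : element.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        print(build(List.of("OwnerController", "ClinicService", "OwnerRepository")), "");
    }
}
```

The point of the split is that only the manually specified part needs to be maintained by hand. The component list can be regenerated whenever the code changes.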