July 17, 2023

DocBook or DITA for technical writing – What is the difference in 2023?

Sometimes customers or prospects ask questions that need long answers to provide the right level of clarity. One such inquiry led me to write this article, which looks at what I would call a misconception about the role that the XML content model, whether DocBook or DITA, plays in structured authoring and XML-based technical writing solutions.

Both DITA and DocBook are robust XML content models. We could have built Paligo using either DITA or DocBook as the base. We chose DocBook even though we had more experience with DITA. We were and remain convinced that the system should do most of the heavy lifting, with a clean and robust XML model as the source. We wanted to add the advanced functionality in the system itself, where it should be, rather than in the content model, where it introduces all sorts of complexity.

A little bit of information about my background helps set the context for Paligo’s development and our choice of DocBook as our content model. Before Paligo, starting in 1999, I worked as an XML and XSLT consultant, initially with custom-developed XML content models loosely based on ideas from Information Mapping. I specialized in DITA beginning in 2006, consulting on DITA software implementations, customizations, information architecture, and system evaluation until 2014. I was very much riding the hype of this new (at the time) content model.

So let’s dive in a little further to explore the characteristics of DocBook and DITA, and share why we chose DocBook to build Paligo.

The content model should not determine documentation system capabilities

First, to clarify, it doesn’t really make sense to compare Paligo and DITA (even though we’re often asked to, and that’s what prompted this article). 

DITA is an XML content model, and Paligo is a full Component Content Management System (CCMS) and a technical documentation solution. Similarly, Paligo is not DocBook. It only uses a (slightly) customized version of DocBook as the content model in the backend.

And consequently, it doesn’t really make sense to ask which of DITA or DocBook is better.

It’s meaningless because it assumes you can compare XML content models in isolation, as if a content model alone could provide the features you need for an efficient technical writing solution. 

Don’t get me wrong, the content model certainly needs to fulfill certain requirements. To build and support robust technical documentation solutions, it should be a genuinely XML-based content model, not HTML, XHTML, Markdown, or a similar format. Those are not powerful enough to handle complex content. A robust and rich semantic XML model is definitely the best format for handling the complexities of content development, single sourcing, and content reuse.

Both DITA and DocBook fulfill those requirements, in slightly different ways. 

Indirect linking, relationship tables, and other features

In the early days of Paligo, we naturally considered using DITA as the content model, since we had extensive experience with DITA software as consultants. 

However, we soon found DocBook to be a cleaner model. It had a much more mature framework for processing. And it didn’t “get in the way” as DITA did. Because in our view, DITA had a problem—it wanted to be too many things at once. 

DITA has things built into the content model that just don’t belong there. Things like “indirect linking” (in short, the ability to link/cross-reference between documents without specific file paths). Indirect linking is a superb capability in itself that increases the possibilities of content reuse, but in order for it to be usable, it doesn’t belong in the content model. 

The reason DITA has too much built into the content model itself, I think, is that it was built around the idea that anyone should be able to just download the package (the DTD/schema and the open source DITA Open Toolkit), and work with files on a local file system, without the need for any programmatic processing or the use of a database. It is designed to be available not only to companies who could buy a CCMS, but also to the lone writer. (That is, if you have the technical skills and stamina to stick with it.)

If you need to work directly with DITA in a barebones environment on a file system, you have to be able to handle problems such as moving or renaming files, which would otherwise break all links (and all reused elements). So “features” for indirect linking (conkeyref, keyref, relationship tables, etc.) were added to the content model. This didn’t solve anything by itself, but at least the lone writer now had a way to (laboriously) set up a system of indirect linking by mapping topics to IDs instead of direct file paths.
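To make the idea of indirect linking concrete, here is a minimal Python sketch. It is not how DITA or the DITA Open Toolkit implements keys. The names `key_map` and `resolve` and the file paths are hypothetical, chosen only to show why a reference that stores a key survives a file rename while a reference that stores a path does not.

```python
# Hypothetical illustration only: all names and paths are invented for this sketch.

# Direct linking: each reference stores a file path.
direct_refs = ["topics/install.dita", "topics/install.dita"]

# Indirect linking: each reference stores a key; one map resolves keys to paths.
key_map = {"install": "topics/install.dita"}
indirect_refs = ["install", "install"]


def resolve(ref_key, keys):
    """Look up the current file path for a key, failing loudly if it is undefined."""
    if ref_key not in keys:
        raise KeyError(f"no key definition for {ref_key!r}")
    return keys[ref_key]


# The file is renamed. With direct links, every stored path is now stale.
renamed = "topics/installation.dita"
stale = [path for path in direct_refs if path != renamed]

# With keys, a single map entry is updated and every reference follows.
key_map["install"] = renamed
resolved = [resolve(key, key_map) for key in indirect_refs]
```

In a file-based setup the writer maintains the equivalent of `key_map` by hand. In a CCMS the same bookkeeping can live in the system's database, where the writer never has to see it.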

Typically the lone writer would download the kit to work on a local computer, and over time one might say that this use case ended up being the one prioritized. In catering to it, DITA’s content model built in a lot of complexity that is superfluous when used in a content management system. That complexity is just unnecessary baggage, and makes it a less clean model for a system to work on.

So it is true that DITA has some properties that DocBook does not, but only because there really is no reason to have that functionality in a content model. The content model should only be a schema (“recipe” or “template” if you will) for the valid content of the document structure.

In the DocBook model, such functionality is left to the processing system (the CCMS in this case). So no, DocBook doesn’t include some features that DITA has. But Paligo does. And it’s easier to use these features as they are fully developed software features, where the user doesn’t need to bother with the complexity that the system handles in the background.

So, for example, Paligo doesn’t require you to laboriously create mapping files to specify relationships between topics. Instead, the system feature lets you simply drag and drop tags (taxonomies) onto the topics to easily create relationships. And there is no need to manually update them, as Paligo will automatically detect whether the relations are present and create the relationship links, no matter how many contexts you reuse the topics in.
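As a rough illustration of deriving relationships from tags instead of from hand-written mapping files, consider the following Python sketch. It is not Paligo’s implementation. The topic IDs, the tags, the function `related_topics`, and the rule that two topics are related when they share at least one tag are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical topic IDs and taxonomy tags; not Paligo's actual data model.
topic_tags = {
    "install-agent": {"installation", "agent"},
    "configure-agent": {"configuration", "agent"},
    "install-server": {"installation", "server"},
    "release-notes": set(),
}


def related_topics(tags_by_topic):
    """Return, per topic, the sorted list of other topics sharing at least one tag."""
    # Invert the mapping: for each tag, collect the topics that carry it.
    topics_by_tag = defaultdict(set)
    for topic, tags in tags_by_topic.items():
        for tag in tags:
            topics_by_tag[tag].add(topic)

    # A topic's related topics are all topics found under any of its tags,
    # excluding the topic itself.
    related = {}
    for topic, tags in tags_by_topic.items():
        others = set()
        for tag in tags:
            others |= topics_by_tag[tag]
        others.discard(topic)
        related[topic] = sorted(others)
    return related


links = related_topics(topic_tags)
```

Because the links are computed from the tags, adding or removing a tag changes the result on the next run, and no separate mapping file needs to be edited.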

Modularity is not an issue

One of the core characteristics often attributed to DITA is modularity, also referred to as “topic-based authoring”. But DITA is not the only content model that supports the modularity of topics. Norman Walsh, the main founder of DocBook, stated as early as 2005, “there’s nothing that prevents you from writing modern, topic-oriented, highly modular documentation in DocBook”. And that is just looking at the content model by itself, not yet including the possibilities of a CCMS.

DocBook is a perfect setup to use with a CCMS, because of its clean model, making it very easy for us at Paligo to map this to the topic-based paradigm we wanted. We could then add lots of functionality to make it easy for the end user to build hierarchical topic structures, without the content model getting in the way, but rather just serving as a solid foundation.

Moreover, we weren’t forced to rely on the DITA Open Toolkit, as most DITA implementations are. The DITA OT is a Java-based open source application, and DITA solutions often need to adapt to its (understandably rather slow) development. For our purposes this would only have limited what we could do in terms of performance and added features.

Topic typing and specialization in a documentation system

Another area often mentioned when discussing the characteristics of DITA is its specialization model. DITA does have this built in, making it theoretically possible to create other content models out of it. 

In fact, the main content model itself relies on specialization: the topic types “task”, “concept”, and “reference” are specializations of a generic topic. This topic typing specialization is at the core of DITA.

But the question is, is such a strict topic typing really a good thing?

In my experience, using these topic types often makes technical writers feel like they are trying to fit a square peg into a round hole.

See for example Mark Baker’s article “The Tyranny of the Terrible Troika: Rethinking Concept, Task, and Reference”, which talks about the limitations in information design that authors may not realize are there when they try to fit content into neat boxes.

Some DITA proponents will argue this is where specialization comes in—if you don’t like the basic topic types, you can always specialize your own. 

There are just a couple of problems with this. First, in all my years of DITA consulting, I didn’t once come across a company that actually wanted to specialize topic types. Second, if you do specialize the topic types, you may just dig a deeper hole for yourself, trying to come up with a completely new information architecture, and most likely one that is even more restrictive (otherwise, what would even be the point of topic typing if you specialized into more generic types?). Trying to figure out what model fits real content in an organization is a heavy burden, especially on the writers trying to work out what types to use.

I remember one consulting project I was involved in, where a discussion came up around creating one new topic type. Just one. For “troubleshooting” topics. The discussion about how it should be structured to really work for their content never ended. Finally it was dropped, and the directive was to work with the basic topic types, just creating writing guidelines to explain the content types.

So, even though it’s true that DITA is built in a way to simplify the creation of new element types, in many ways this just generates a lot of extra work. In reality, it doesn’t lower costs at all. It creates complications and probably even additional costs.

Flexibility and pragmatism

There’s no right or wrong. Some might like the more restrictive DITA model. The DocBook committee chose not to implement topic typing, presumably seeing more value in a more pragmatic and flexible content model.

As Mark Baker states in another article, picking up on the fact that Paligo chose DocBook for its flexibility compared to DITA: “DocBook offers a far richer set of markup structures that can represent all of these things, but without the restrictiveness of DITA. It makes sense, therefore, for a company like Paligo to choose it for their underlying document structure.” (“DocBook resurgent: what it tells us about structured writing and component content management”)

In summary, DocBook and DITA are both solid and rich XML content models. I’m sure we could have built a great technical documentation solution using DITA as the source for Paligo, too. But it would have been harder to implement without creating additional benefits. The functionality and the single-sourcing possibilities of a modern technical documentation platform are not and should not be built on the content model, but on the capabilities of the system, with a solid content model as the foundation.

You can spend a lot of time trying to wrap your head around relationship table mappings, keyrefs and conkeyrefs, DITAVAL files, and the intricacies of the DITA Open Toolkit. Personally, after all my years of DITA consulting, I found it wasn’t worth it. There was a better way.

This article was originally published in February 2020. Updated July 2023.
