Efficient Technical Documentation: Why You Need to Separate Content & Layout

January 2, 2020

A few years ago I wrote a blog article about the 5 Principles of Single-Sourcing. But there was one central principle related to structured authoring and single-sourcing that I didn’t really go into. It didn’t quite fit into the general theme of the article, and I may have thought it was already common knowledge.

It turns out I was wrong. This important principle for technical writing was not common knowledge at all.

I’m talking about the principle of separation of content and layout (sometimes also called “separation of content and presentation/format”).

So why is it so important to understand and enforce the separation of content and layout? Because it is at the core of structured authoring. And if your technical documentation environment is not enforcing structured authoring, well, then it’s basically a house of cards.

Read on and you’ll see why.

Let’s start from the beginning. Over the last few years, as I watched companies migrate their technical documentation from other tools into Paligo, I started noticing that this principle was not as well known as I’d thought. If the users come from another structured authoring environment, like DITA, they may already be aware of the principle. But if the content writers and managers come from a non-structured authoring environment like Word, InDesign, or any of the HTML-based Help Authoring Tools (HAT tools), it may be another matter.

I’ve found that technical writers and other content creators who come from non-structured authoring environments are sometimes aware of the principle of separation of layout and content at a high level, but often do not really understand how it works. Others are not yet aware of the principle at all.

What does ‘Separation of Content and Layout’ Really Mean?

Consider a word processor application, like MS Word, Google Docs, or Apple Pages. You can write content in it, of course. But not only do you write content, you also apply styles: heading styles, paragraph styles, table styles, and more. You can decide that your heading 1 style should be 16 pt. Times New Roman, for instance. Your paragraph style may have a 2 cm indent, because you think that looks good when you print the document.

So you decide there and then exactly what the output format should look like, right when you’re writing it. Sounds good, right?

Not really. This might have worked decently a long time ago, when all you needed was to get your document out on paper in one format.

These days, you probably don’t just want to print a document on paper and be done with it. You most likely want to at least get it on the web, on mobile phones and tablets, your support help desk knowledge base, eBooks, or perhaps even in chatbots.

When you apply styles to a document in the authoring process, you limit the content’s usage and its reuse possibilities, as well as the possibility for multi-channel publishing.

On the web or a mobile phone, you won’t want your paragraphs to have a 2 cm indent, you probably don’t want Times New Roman for the heading (bad web font), and you don’t want a long book format (bad user experience).

These problems are typical when you apply styles directly in the authoring process. And that’s if you’re experienced and lucky. If you have worked in a word processing environment, you’ll recognize that it’s just too easy to add any look and feel you want even without applying styles. You want a heading but don’t have time to pick or create a new style? You may just make it bold, increase the size, change the font, and bam! It looks just like you want it.

But now the problem is even worse—not even a style to indicate you have a heading. Consistency is thrown out the window.

Combine the ability to create an unlimited number of styles with a team of authors working on the same content (each creating styles of their own), and you have a maintenance nightmare on your hands.

Spacing and Page Breaks

Another common problem that happens when you apply formatting during content writing is that content itself is used for formatting purposes. For instance, if a writer wants some extra space between some content, such as space after a heading, it’s very common to just add a couple of extra paragraphs. If you have used the little paragraph icon to check hidden characters in Word, you’ll often see a couple of empty paragraph marks (¶) sitting right after a heading.

This example doesn’t show content styling; it’s content (two paragraphs) used to create space. When you want to publish this to other output channels (web, mobile) this doesn’t make sense—it results in all sorts of odd and unplanned displays of the content.
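To make that concrete, here is a minimal Python sketch of what a downstream publishing step has to deal with. The XML fragment is a hypothetical stand-in for an exported word-processor document, and the function name `find_spacer_paragraphs` is my own invention for illustration. It simply flags paragraphs that contain nothing but whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical export of a word-processor document: a heading followed by
# two empty paragraphs that exist only to create vertical space.
EXPORTED = """<body>
  <h1>Installing the device</h1>
  <p></p>
  <p> </p>
  <p>Unpack the box and check the contents.</p>
</body>"""


def find_spacer_paragraphs(xml_text):
    """Return the 0-based positions of child paragraphs that hold no text and no elements."""
    root = ET.fromstring(xml_text)
    spacers = []
    for index, child in enumerate(root):
        has_text = bool((child.text or "").strip())
        if child.tag == "p" and not has_text and len(child) == 0:
            spacers.append(index)
    return spacers


print(find_spacer_paragraphs(EXPORTED))
```

For this fragment the function reports positions 1 and 2, the two "paragraphs" that carry no content at all. Every output channel would have to decide what to do with them, which is exactly the kind of cleanup you don't want to be doing.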

And what about page breaks? Have you seen the method used to create a new page just by hitting Enter as many times as needed? It’s a quick fix, but doesn’t create a proper page break.

But of course, there are actual page breaks you can insert in the word processor, if you do it properly. In Word, Ctrl+Enter will do it. That’s better. But it still doesn’t make the content reusable. What if you need to translate this document? Translate an English document to German or Finnish, for example, and the text will almost certainly not come out the same length. You could end up with page breaks in the middle of the page and need to reformat it again, manually.

The same problem will occur if you want to reuse parts of the text for another document, such as another product user guide that happens to have some similar content. Again, the page breaks will end up in the wrong places when you place the content in a new context.

A large number of manual spaces and page breaks translates into a lot of unnecessary and expensive extra work.

Semantic Markup and the Separation of Layout and Content in HTML

When HTML was in development, the need to separate content from presentation was realized early on. Initially, while HTML elements were rather abstract, like h1, h2, etc, there were still some format-specific tags, such as the font tag.

But as HTML evolved, it moved further and further away from format-specific elements and promoted semantic elements such as <em> (for emphasis) and <strong> as alternatives to i and b (italic and bold). Because the element only states its meaning (emphasis), the styling layer is free to determine whether emphasis should be indicated by bold, italic, or any other presentation, such as red text.


Thus, in HTML, structure and content were separated from the resulting end layout, which should be handled by the styling layer in separate CSS files. This has gradually evolved so that in HTML5, the elements are more semantic than ever, and some style-specific attributes like the “border” attribute on tables are deprecated (see HTML <table> border Attribute).

By separating content and layout this way in HTML, the content and structure became more reusable—the same HTML could be used with different CSS themes, and change the entire look and feel of a site, even though all the content data and the HTML structure remain the same.

For an example of a site that exemplifies this really well, visit CSS Zen Garden. It uses the same HTML structure, only changing the underlying CSS.
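The same idea in miniature, as a rough Python sketch: one HTML fragment is combined with two different stylesheets. The theme names and CSS rules are made up for illustration, and a real site would of course keep the CSS in separate files rather than in a Python dictionary.

```python
# Content and structure: written once, using semantic elements only.
CONTENT = "<h1>Safety</h1><p>Do <em>not</em> open the cover.</p>"

# Two hypothetical themes. Each one decides on its own how <em> is presented.
THEMES = {
    "print": "h1 { font: 16pt 'Times New Roman', serif; } em { font-style: italic; }",
    "web": "h1 { font: 1.5rem sans-serif; } em { font-style: normal; color: red; }",
}


def build_page(theme_name):
    """Wrap the unchanged content in a page that uses the chosen theme."""
    css = THEMES[theme_name]
    return f"<html><head><style>{css}</style></head><body>{CONTENT}</body></html>"


pages = {name: build_page(name) for name in THEMES}
```

Both pages contain exactly the same content string. Only the style rules differ, so emphasis comes out italic in one and red in the other without anyone touching the text.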

Taking It All the Way – Separation of Content and Layout in XML

While the separation of content and presentation in HTML is a good thing, a few problems still remain. For one thing, HTML is just one output format that content teams are required to use. You may also need to publish to PDF/print or eLearning content, for example.

But more importantly, even with the evolution of semantic tagging markup in HTML, the semantics you can express are quite meagre. You can choose to express emphasis instead of the set style bold or italics. But what if you wanted to have multiple types of content stand out in some way?

For example, say you are documenting software, and you need to indicate that you are talking about a menu item, or a button, or a filename. You want to make this stand out, but if you just tag it with <b> (bold), or even <em> (emphasis), you are more or less still making it rather static, and you cannot differentiate between these different items in your software user interface.

That’s where XML shines for technical documentation. There are several dialects of XML, such as the well-known DocBook and DITA. But the common trait among them is that they provide a much richer semantic markup than HTML. So instead of just tagging your UI components as bold or emphasis, you can tag them with specialized elements called guimenu, guibutton, and filename. And the same goes for a multitude of other semantic tags, so you have special elements for note, warning, programlisting, and many others.

The fact that the guibutton element happens to be bold in the XML editor is just the WYSIWYG—What You See Is What You Get—editor display for convenience while authoring. (Or, to be more exact, WYSIOO—What You See Is One Option.)

Now that you have separated the content from the layout by tagging your content with truly semantic markup that indicates what it describes, instead of applying generic styling and cementing it in place, you have freed up additional styling possibilities. You are free to choose (now, or later if you change your mind) to apply a Courier font to the filename, a blue bold font to the button, or whatever you feel is appropriate.
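Here is a small sketch of that freedom, written in Python as a stand-in for a real styling layer (in practice this job is done by stylesheets such as XSLT and CSS). The element names guimenu, guibutton, and filename come from DocBook. The two style tables and the `render` function are assumptions of mine, purely for illustration, and the sketch skips details such as escaping.

```python
import xml.etree.ElementTree as ET

# DocBook-style source: the inline elements say what the text is, not how it looks.
SOURCE = (
    "<para>Open <guimenu>File</guimenu>, click <guibutton>Export</guibutton> "
    "and save the result as <filename>report.pdf</filename>.</para>"
)

# Two hypothetical styling layers: each maps an element name to (prefix, suffix).
WEB_STYLE = {
    "guimenu": ("<span class='menu'>", "</span>"),
    "guibutton": ("<b class='button'>", "</b>"),
    "filename": ("<code>", "</code>"),
}
PLAIN_STYLE = {
    "guimenu": ('"', '"'),
    "guibutton": ("[", "]"),
    "filename": ("", ""),
}


def render(xml_text, style):
    """Flatten one paragraph, wrapping each inline element as the style dictates."""
    para = ET.fromstring(xml_text)
    parts = [para.text or ""]
    for inline in para:
        prefix, suffix = style.get(inline.tag, ("", ""))
        parts.append(prefix + (inline.text or "") + suffix)
        parts.append(inline.tail or "")
    return "".join(parts)


print(render(SOURCE, WEB_STYLE))
print(render(SOURCE, PLAIN_STYLE))
```

The source paragraph never changes. If you later decide that buttons should look different on the web, you edit one entry in the style table and every button in every topic follows.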

The same goes for spacing and page breaks. In true structured authoring, spacing is never created by extra paragraphs or the like (in fact it can’t be done). It’s all handled by the stylesheets (XSLT and CSS), so the content is flexible to be published on any output channel, and authors can focus on great technical writing.
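As a rough illustration (again in Python, standing in for a real stylesheet), the sketch below keeps spacing and page breaks entirely out of the content. The chapter texts, the `LAYOUTS` table, and the `publish` function are all hypothetical. A form feed character represents a page break in the print channel.

```python
# Content only: no empty paragraphs and no manual page breaks.
CHAPTERS_EN = [("Setup", "Plug in the device."), ("Use", "Press the button.")]
CHAPTERS_DE = [
    ("Einrichtung", "Schließen Sie das Gerät an."),
    ("Verwendung", "Drücken Sie die Taste."),
]

# Hypothetical layout rules, one set per output channel.
LAYOUTS = {
    "print": {"chapter_separator": "\f", "blank_lines_after_title": 2},
    "web": {"chapter_separator": "\n\n", "blank_lines_after_title": 1},
}


def publish(chapters, channel):
    """Join the chapters using the spacing and break rules of the given channel."""
    rules = LAYOUTS[channel]
    gap = "\n" * (rules["blank_lines_after_title"] + 1)
    rendered = [title + gap + body for title, body in chapters]
    return rules["chapter_separator"].join(rendered)


print(publish(CHAPTERS_DE, "print"))
```

Because the break is a rule ("start each chapter on a new page") rather than something typed into the text, the German version breaks in the right place even though its text is longer, and the web version gets no page breaks at all.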

Should you need to change it, or apply different styling in different publishing channels, it’s easy to do so in one place: the styling layer. When content and layout are truly separated in a proper structured authoring tool, you don’t have to go and change your content in hundreds or thousands of places.

Help Authoring Tools Have the Same Problems as Word Processors and HTML

You may be thinking, do I really need to use an XML-based authoring tool (with or without a component content management system – CCMS) to achieve this? After all, there are so-called help authoring tools, or “HAT tools” that also aim to provide single-sourcing of content.

Unfortunately, help authoring tools are not actually XML-based, but HTML-based. You may have seen the term “the XML editor” in such tools, but in fact they use HTML in the source (even if it’s a slightly more structured variant of it, called XHTML, it is still HTML).

These help authoring tools, in order to work with single-sourcing at all, need to provide some way of making the content more semantic, or they would be severely limited. They do provide semantic options—not with actual rich semantic XML tags, but by using class names to emulate XML semantic tagging. So for instance, instead of having an element called “note”, you would have a regular p (paragraph) tag, with a class attribute with the value “note”.

Now, classes in HTML do not have any restrictions on what values you give them. So you are free to invent any new markup you want or need. So if you need a code element, you just invent it by using an existing HTML element, and then add a class with the value “Code”.

Then you, or one of your colleagues, get the idea that another code element is needed, so someone just invents “Code_1”.

And then you notice that you had made paragraphs where you really wanted steps. So you add a class to make this a special kind of step, “StepA”.

Ok, so what is the problem with that? Then you can make up all the markup you need, right?

Sure. But what you have then is too much freedom. In effect, you have ended up in the same situation you encountered with the word processor. You just add whatever you want to make it look like you want it, at this particular moment in time.

In this example, elements marked up as “Code”, “Code_1”, and “StepA” have become intermixed and are not governed by any consistent rules. And to top it off, the indentation for the paragraphs that are to act as code elements is hardcoded right into the element by a style attribute.

By the way, this is a real-life example from a well-known Help Authoring Tool (HAT); only the text has been anonymised. And it is not an unusual example from such unstructured authoring environments. On the contrary, it is the rule rather than the exception.

To show the similarity of the situation in the word processor again, here is an example from the same help authoring tool where the writer wanted some extra spacing. Look familiar?

Now imagine a team of authors, all with this freedom to invent new tags every day… You are looking at tagging anarchy, and will be in serious trouble very soon with technical documentation that is a nightmare to maintain, not future-proof, and very difficult to process to different outputs, because there is no consistency and very few rules.

Remember that house of cards?

So in short, the same problem happens in HTML-based authoring tools as in the word processor: very specific styling is applied inside the document in the authoring process itself, instead of separating content from layout.
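To show how quickly free-form class names drift, here is a minimal Python sketch. The XHTML fragment is a hypothetical reconstruction in the spirit of the situation described above, not the real-life example itself, and the `ALLOWED` vocabulary and `audit` function are my own assumptions. In a schema-based XML environment the closed vocabulary would be enforced by the editor, so an audit like this would not be needed in the first place.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML in the style of a help authoring tool: meaning lives in
# free-form class values, and one indent is hardcoded in a style attribute.
TOPIC = """<body>
  <p class="Code" style="margin-left: 40px;">install --all</p>
  <p class="Code_1">install --minimal</p>
  <p class="StepA">Restart the service.</p>
  <p class="note">Back up your data first.</p>
</body>"""

# What a schema would pin down: a closed vocabulary agreed on by the team.
ALLOWED = {"code", "step", "note"}


def audit(xml_text):
    """Return the class values outside the vocabulary and the count of inline styles."""
    root = ET.fromstring(xml_text)
    classes = {el.get("class") for el in root.iter("p") if el.get("class")}
    unknown = sorted(name for name in classes if name not in ALLOWED)
    inline_styles = sum(1 for el in root.iter() if el.get("style"))
    return unknown, inline_styles


print(audit(TOPIC))
```

Four paragraphs are enough to produce three class names nobody agreed on, plus one hardcoded indent. Nothing in HTML stops this from happening, which is the whole point.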

The Solution – Use a Proper XML-Based Structured Authoring Tool

Using a word processor or a help authoring tool may seem like an easy way to author content. It may feel like you can produce content very quickly, because the styling is easy to achieve and the technical writer can apply it directly. But that is exactly the problem. It is still too tied to the layout, and it’s a very short-sighted perspective.

To create robust, maintainable, and future-proof technical documentation, content must be authored with genuine separation of content and layout. Truly XML-based authoring environments like Paligo offer a rich content model with all the semantics to describe your content properly, and then let the styling layer take care of the rest.

Once you get into this mindset, it becomes second nature. You can focus on creating great content, and you (or your design team) can easily do the styling in a separate layer. And you can change it at any time, adapt it to different publishing channels, and make it future-proof.