How Structured Authoring Delivers AI-Ready Content in the Age of Generative AI

September 17, 2024

Many companies seek to use generative AI to provide more relevant, contextual experiences to customers through tools like knowledge-based search, conversational chat, and customer self-service. Why? A recent Salesforce study showed that 84% of companies believe generative AI can help them better serve their customers, and 75% think it will help them do that faster. As a result, AI is becoming increasingly important to technical writers and communicators, transforming how they work. After all, product documentation is a key component of all those tools just mentioned (and many more). If you aren’t already working with generative AI in some capacity, it’s only a matter of time. 

Much of today’s attention is focused on leveraging generative AI to improve technical writing. But that’s only one part of why you need to understand it. There is just as much, if not more, to understand about how you write content for AI. It’s not as simple as pointing AI at your product content and letting it go. You need to understand how generative AI and large language models work and how the way you create your product documentation impacts the efficiency of AI applications.

There is much more to learn about writing for generative AI than about writing with it. And that’s what we want to talk about in this piece.

Writing with AI is not the same as writing for AI – here’s the difference

A good deal of the conversation around generative AI relates to using it to help you write content. And that’s fine. There are many AI tools that help writers create content, from drafting outlines to writing full articles or documents, creating taxonomy and metadata, and improving SEO. But that’s not what we want to talk about here.

Writing with AI is not the same as writing for AI. When you write for AI, you write content to make it easier and faster for large language models to ingest and understand it. Writing for AI is particularly important for technical documentation teams and technical writers to understand because it impacts how you create and manage your content. 

But before we get into the how of writing content for AI, let’s step back and look at how generative AI and large language models work.


How AI “Reads” Content – How LLMs and Probability Work

Generative AI is a branch of artificial intelligence that includes computer models that can generate content, including text, images, and videos. In the case of generating text-based content, large language models (LLMs) are employed. You have probably heard of OpenAI’s GPT, the most well-known LLM available. Others include Anthropic’s Claude, Google’s Gemini, and Meta’s LLaMA. 

LLMs are built on a type of neural network called a transformer model. They are pre-trained, without human labeling, by ingesting large datasets such as text from the public internet. The text is broken down into units called tokens (words, subwords, or characters), which the model then processes and analyzes to learn how words function together. The model learns patterns and grammar, as well as the relationships between words and the context in which a word appears (e.g., whether ‘right’ means correct or a direction). The result is a statistical model of language, loosely inspired by how the human brain processes information. 
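As a rough sketch of what tokenization looks like (real LLM tokenizers use subword algorithms such as byte-pair encoding; this toy version just splits on words and punctuation):

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens.

    Real LLM tokenizers split into subword units, but the idea is
    the same: text becomes a sequence of tokens the model can
    assign numeric IDs to and process.
    """
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("The model learns patterns.")
# tokens -> ["The", "model", "learns", "patterns", "."]

# A vocabulary maps each unique token to a numeric ID.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]
```

The model never sees raw text, only these token IDs, which is why how the text breaks apart matters so much for what it can learn.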

Pre-training an LLM is just the beginning. It then needs to be tuned to understand specific domains (e.g., industry context) or tasks. There are several approaches to tuning, including fine-tuning, prompt tuning, and Retrieval-Augmented Generation (RAG). For example, a healthcare organization would want to train an LLM on healthcare-specific terminology and functions so that it fully understands how to respond to health-related prompts.

Once the LLM is trained, a person can enter a prompt, which can be a question or a request for the LLM to create some content. The LLM returns a response by predicting the probability that a certain word will come next in a sentence. The quality and coherence of the generated response depends on how well the model has learned to estimate these probabilities. Higher probabilities mean the LLM is more confident in its predictions, while low probabilities tend to indicate areas where the model lacks knowledge and may return more random or creative responses (and, in some cases, completely wrong responses). 

If you want to control whether the LLM’s output sticks to high-probability choices or explores lower-probability ones, you set a parameter called temperature. A high temperature flattens the probability distribution, letting less likely (more creative) words through, whereas a low temperature sharpens it, favoring the most likely words. Most LLMs default to a setting that balances the two.
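Here is a minimal sketch of how temperature reshapes a model’s next-token probabilities. The logit values are made up for illustration; real models score vocabularies of tens of thousands of tokens:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into probabilities.

    Dividing by the temperature before the softmax reshapes the
    distribution: a low temperature sharpens it (the top token
    dominates), a high temperature flattens it (lower-probability
    tokens get a real chance of being sampled).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for 3 candidate tokens
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
# cold puts more mass on the top token; hot spreads mass more evenly
```

Sampling from `cold` almost always picks the top token; sampling from `hot` is where the more “creative” (and occasionally wrong) output comes from.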

If it all sounds a bit confusing, you aren’t alone. It can be hard to wrap your head around, especially considering that even the researchers who build LLMs don’t fully understand how the models analyze and process information.

Why Unstructured Content is Less Suitable for AI Ingestion

LLMs are not like search engines. They don’t index large amounts of content and then return results based on keywords or phrases from that index. LLMs are text generators that try to understand what you are asking them so they can provide you with the best answer. The response is brand new content.  

This difference is important to understand because LLMs are only as reliable as the content on which they are trained. Where that content comes from affects reliability. But what can also affect reliability is how content is structured. 

Unstructured content is typically less reliable than structured content. 

Why is it less reliable? There are a few reasons:

  1. Unstructured content requires more resources to process because of the sheer volume of the content, which can result in inconsistent analysis and lower-quality data.
  2. It lacks a consistent format, making it harder for an LLM to find and extract relevant information or identify relationships between information. 
  3. The natural language used in documents and other types of unstructured content (like emails, audio and video transcripts, and social posts) makes it harder for the LLM to pull the right information accurately. It can also contain irrelevant or ambiguous information that affects the quality of the model.
  4. It often contains many types of content, including images, tables, and scanned documents, that require special processing. 
  5. It typically lacks metadata and isn’t connected to a controlled taxonomy.
  6. It can contain biased language.
  7. It raises access-control issues, with no straightforward way to prevent people from accessing secure information.

Another major challenge is that LLMs can only process so much data at a time, so content has to be chunked before it is processed. Leaving the chunking to arbitrary, automated splitting can result in information being processed inaccurately because related information may be separated.
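A toy sketch of the difference, assuming markdown-style headings mark topic boundaries (the sample document is invented for illustration):

```python
def fixed_size_chunks(text, size=80):
    """Naive chunking: split every `size` characters, blind to
    structure. A heading and its body, or two related sentences,
    can end up in different chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def heading_chunks(text):
    """Structure-aware chunking: start a new chunk at each heading,
    so each topic stays together with its own heading."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Install\nRun the installer.\n# Configure\nEdit the config file."
# heading_chunks(doc) keeps each topic intact;
# fixed_size_chunks(doc) may split a heading from its instructions.
```

Structured content effectively does the work of `heading_chunks` for you, because the topic boundaries are already explicit in the source.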

These challenges can be overcome but require a lot of work. Unstructured content needs to be converted into a more easily understandable format, cleaned, and chunked properly to enable the LLM to process it with a high degree of accuracy.


How Structured Authoring Helps Optimize Content for AI Ingestion

So, if unstructured content is challenging for LLMs, the answer must be to create structured content, right? While it’s not the complete answer, structured content can significantly improve the accuracy and reliability of an LLM’s responses. Here’s why:

Efficient Learning of Patterns

Structured authoring involves breaking content into topics or sections that are self-contained. Each topic might contain a heading, subheadings, and paragraphs. Topics adhere to predefined formats and standards, ensuring higher data quality and improved contextual understanding. 

These consistent structures and explicit connections make it easier for LLMs to recognize and learn patterns, accelerating the training process and improving the model’s performance in summarization, translation, and question-answering tasks. LLMs rely on patterns in data to make predictions, so structured content’s consistent formatting leads to better predictions and fewer errors.

Improved Pattern Learning with Taxonomy and Metadata

A key component of structured authoring is enriching the content with metadata and applying taxonomy. Metadata and controlled taxonomies help identify key concepts, entities, and relationships. LLMs can use taxonomy and metadata to reinforce pattern learning and improve tasks such as categorization and topic modeling.

Enhanced Understanding of Relationships

Because an LLM has to chunk content to ingest it, structured content helps by providing natural chunk boundaries, ensuring that the content related to a topic is kept together. Explicit structure also makes the relationships between different content elements clear: sections, subsections, and headings indicate hierarchy and context, which LLMs can use to generate more contextually appropriate responses.

Improved Data Understanding

Structured content follows a defined format (e.g., XML, JSON), making it easier for LLMs to parse and understand the relationships between pieces of data. LLMs can quickly identify entities, attributes, and metadata, allowing for more accurate content processing.
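As a small illustration, here is how a hypothetical XML topic (the element names and values are invented for this example) can be parsed deterministically with Python’s standard library, with no guesswork about where a field begins or ends:

```python
import xml.etree.ElementTree as ET

# A hypothetical structured topic: content plus explicit metadata.
topic_xml = """
<topic id="t-101">
  <metadata>
    <product>WidgetPro</product>
    <audience>administrator</audience>
  </metadata>
  <title>Resetting the device</title>
  <body>Hold the power button for ten seconds.</body>
</topic>
"""

root = ET.fromstring(topic_xml)
# Entities, attributes, and metadata can be read directly.
record = {
    "id": root.get("id"),
    "title": root.findtext("title"),
    "product": root.findtext("metadata/product"),
    "audience": root.findtext("metadata/audience"),
    "body": root.findtext("body"),
}
```

An ingestion pipeline can hand each such record to the LLM with its metadata attached, instead of asking the model to infer the product, audience, and topic from free-flowing prose.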

Optimized Resource Utilization

Processing structured content often requires less computational power compared to unstructured data. This efficiency can lead to faster training times and reduced resource consumption.

Consistency in Response Generation

Structured content drives more consistent and reliable outputs. This consistency is crucial for applications where accuracy and reliability are paramount, such as in healthcare or legal domains. LLMs that rely on unstructured content tend to be more creative in their responses, especially when they don’t know the correct answer. This creativity is less prominent when structured content is processed.

Better Integration with Knowledge Bases

Structured content can be seamlessly integrated with external databases and ontologies. This integration enriches the LLM’s knowledge base and enhances its ability to provide informed responses. For example, when you combine RAG with structured content in an LLM, you can deliver more context-aware and accurate content.
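A minimal sketch of the RAG idea, using a toy word-overlap score in place of the vector-embedding retrieval a production system would use (the topic texts are invented for illustration):

```python
def score(query, chunk):
    """Toy relevance score: count of words shared between the query
    and a chunk. Production RAG systems use vector embeddings
    and semantic similarity instead."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def build_prompt(query, chunks, top_k=2):
    """Retrieve the best-matching chunks and prepend them as context,
    grounding the LLM's answer in the documentation rather than
    in whatever it memorized during pre-training."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"

topics = [
    "To reset the device, hold the power button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
]
prompt = build_prompt("How do I reset the device?", topics)
# The reset topic ranks first, so it leads the context the LLM sees.
```

When the chunks being retrieved are well-scoped structured topics, the context handed to the LLM is precise, which is what makes the final answer more accurate.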

Understanding Domain-Specific Context

The fine-tuning discussed above ensures LLMs understand domain-specific language (e.g., finance, healthcare, manufacturing, medical devices). Pre-defined domain-specific taxonomies and metadata are essential to helping LLMs process information accurately. 

When dealing with AI applications that support industries such as healthcare, finance, and medical equipment, it’s easy to see why using structured content is so important. But it’s equally important for any company that wants to create a great customer experience. Customer self-service, customer portals, and contact center chat are all experiences that must provide relevant and accurate information to customers. 

As you can see, structured content provides a foundation that supports more efficient learning and better performance by LLMs. Precise, organized, high-quality data enhances an LLM model’s ability to understand context, extract relevant information, and generate accurate, coherent responses.


How Using a CCMS Supports Structured Authoring

By now, you know that a component content management system (CCMS) uses a structured or topic-based authoring approach to content creation and management. With structured authoring, you create topics that are self-contained pieces of information that can be reused across publications. Topics provide a consistent tone, style, and terminology across all your documentation, which means an LLM can quickly process and analyze it. 

Along with structured authoring, a CCMS supports creating and applying taxonomy and metadata to your content, providing the context LLMs need to better understand the content. 

But there’s more to consider when deciding if you need a CCMS.

A Unified Knowledge Base

Unified knowledge is critical for AI because it offers a comprehensive understanding of the information in a single location. The LLM doesn’t need to process multiple disparate data sources because everything is in one place. A CCMS is your single source for all technical and product documentation in a standardized format, using clear and concise language and with detailed taxonomy and metadata. 

Multimodal Data Integration

Most product documentation comprises more than text-based content. It also includes images, screenshots, videos, and other formats. For example, the instructions for use for a drug often include images alongside text-based instructions. By using structured content in a CCMS, the different content formats are interlinked, enabling an LLM to understand the context of each and how they relate to each other. 

Regulatory Compliance

Structured content is often required for regulatory compliance in industries such as financial services, pharmaceuticals, aerospace, and manufacturing. When you use a CCMS, you ensure your documentation supports regulatory requirements by leveraging features such as audit trails, metadata, and content versioning. As content is updated, new versions are created, and the LLM should be updated accordingly. 

Content management systems that work with unstructured content have a harder time ensuring compliance because they lack the ability to organize and track updates consistently.

Ethical Considerations

One of the most important things to consider when using structured content over unstructured content is bias in the information. Unstructured content can contain biased or misleading information that is difficult for an LLM to identify and filter out during training or fine-tuning. 

Structured content, on the other hand, enforces better control over the quality of the information because writers must follow a standardized content strategy, including proper content labeling. This standardized approach mitigates potential bias. 

There are also better review workflows. A CCMS allows more thorough review procedures to be put in place, reducing the risk of incorrect information being published.

Tools and Learning Resources to Help Get Your Product Knowledge AI-Ready

There is so much to understand and learn about how generative AI and AI applications will offer new ways for customers to access your product knowledge. We’ve covered the basics here, but to help you learn more, here are some tools and learning resources to keep you moving forward.

Tools

Content management and delivery tools help you create, manage, and deliver AI-ready content. A CCMS like Paligo provides an environment to create and manage your content in a structured content model, complete with taxonomy and metadata.

Content delivery platforms, like Zoomin and FluidTopics, bring together content from multiple sources (CCMS, LMS, CRM, LLMs, etc.) to enable you to deliver AI-infused customer self-service experiences.

You can also leverage AI-powered tools to help you automate and enhance structured content creation. For example, Acrolinx uses AI to ensure content quality by checking grammar, consistency, style, and adherence to product-specific terminology. It helps teams create clear, consistent, optimized content for AI consumption.

Structured Content and AI-Related Resources

To prepare product knowledge for AI, your team needs to understand structured content, AI technologies, and how to create content that is easy for LLMs to process. These learning resources help build that knowledge:

Structured Authoring Courses

  • The Center for Information-Development Management (CIDM): Offers courses, webinars, and resources on structured authoring and content management. It’s a valuable resource for teams adopting structured content practices to make their documentation AI-ready and learn more about generative AI for technical documentation.
  • If you are using the Paligo CCMS, check out the Paligo Academy for tutorials and overviews to help you develop structured content.
  • Technical Communication Body of Knowledge (TCBOK): TCBOK offers comprehensive learning materials and guides for technical writers, including how to use structured content and metadata to improve documentation.

Understanding Generative AI

Leverage Generative AI for Technical Writing

Although not the topic of this piece, it’s still important to understand how generative AI can help you improve your technical writing skills:

  • CherryLeaf offers an online course on “Using Generative AI in Technical Writing.” This course helps technical communicators leverage generative AI for more efficient technical authoring.
  • AI for Technical Writers from Complete AI Training is another course that will help you improve technical content through the use of AI.

The Wrap

As generative AI continues to evolve, technical writers must adapt their process to writing for AI, not just with AI. The distinction lies in creating structured content that large language models (LLMs) can easily ingest, process, and generate accurate responses from. Structured authoring, enriched with taxonomy and metadata, enables AI to deliver more reliable and contextually relevant information. 

By adopting tools like a CCMS and understanding the principles behind LLMs, technical documentation teams can future-proof their content for AI-driven applications, ensuring enhanced customer experiences and compliance across industries.
