Using Unstructured Business Content in Business Intelligence
by Colin White
Originally published April 25, 2007
While estimates differ on how much unstructured and semi-structured business content exists in organizations, there is no question that it far exceeds the amount of structured business data managed by today’s data warehousing systems. To date, there has been limited interest in the use of business content in business intelligence (BI) and data warehousing, but this is beginning to change. The rapid growth in business content generated by web and collaborative applications, coupled with technology improvements in areas such as search and text mining, has created a situation where business content can now be used to extend the analytical power and decision support capabilities of business intelligence.
In this article, I look at the different types of business content that exist in organizations and their value in the decision-making process. I also review different approaches to accessing and analyzing this content in a BI environment. For simplicity, I use the term business content to refer to unstructured and semi-structured business information, and the term business data to refer to structured business data.
Types of Business Content
Unstructured and semi-structured information varies widely in both format and content. Unstructured information, for example, includes rich media files containing audio and video, and text files containing electronic forms, reports and web pages. All of these file types may contain useful business information. Audio recordings of conversations between customers and support center staff provide valuable insight into the efficiency of support staff and into customers’ views on products and services. Similarly, electronic forms used by support staff may contain information about customer attitudes and viewpoints. Product review web logs (blogs) offer valuable feedback on the acceptance of new products in the market, whereas product and services websites contain competitive pricing on everything from books and DVDs to airline fares.
A review of semi-structured information shows that a high percentage of it is in XML format. Tags in XML files provide some semantic information about the contents of the file. There is also an increasing number of industry XML vocabularies, or metamodels, that add further semantics to XML documents. One example is XBRL, which is used for reporting financial information. XML is becoming the standard approach for exchanging information between systems and between companies.
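To make the point about XML tags concrete, the following is a minimal sketch of how tagged content might be flattened into rows that a BI tool could load. The document, its element names (`report`, `item`, `note`), and the `extract_facts` helper are all hypothetical and chosen for illustration; the markup is deliberately simplified and is not real XBRL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified financial report. This is NOT real XBRL markup;
# the element and attribute names are assumptions made for illustration.
REPORT_XML = """
<report company="Example Corp" period="2006-Q4">
  <item name="Revenue" unit="USD">1250000</item>
  <item name="NetIncome" unit="USD">180000</item>
  <note>Revenue growth was driven by new product lines.</note>
</report>
"""


def extract_facts(xml_text):
    """Flatten tagged <item> elements into rows suitable for a fact table."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        rows.append({
            # Context comes from attributes on the enclosing element.
            "company": root.get("company"),
            "period": root.get("period"),
            # The tag attributes say what the number means.
            "measure": item.get("name"),
            "unit": item.get("unit"),
            "value": float(item.text),
        })
    return rows


facts = extract_facts(REPORT_XML)
for row in facts:
    print(row)
```

The tags carry enough meaning for the two numeric items to be extracted without any text analysis, while the free-text `note` element is left untouched; content of that kind would still need search or text mining to be used analytically.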
Examples of applications that can analyze unstructured and semi-structured content and thus enhance BI processing include customer and market intelligence, pricing optimization, customer sentiment and complaint analysis, product safety and quality analysis, regulatory compliance, legal discovery, fraud detection, financial analysis, and IP protection.
Processing Business Content
There are six main approaches to accessing and using business content in a business intelligence environment:
These six approaches are illustrated in Figure 1. (The difference between managed content and unmanaged content in the diagram is that managed content is usually maintained by a content management system and subject to governance procedures, whereas unmanaged content exists in standalone files and databases.) With all six approaches, it is important to understand the capture, transformation, and delivery techniques that are supported and used.
The Impact of Business Content on Business Intelligence