Taking Inventory of the Unstructured World

Originally published July 1, 2010

In most companies, there is a wealth of unstructured textual information. There are documents of many kinds found in many places. There are reports. There are articles. There are spreadsheets. There are contracts. Intuitively, the organization knows that it ought to be doing something with these documents. Trying to find a document six months after it has been written is no small task. Trying to gather documents for a cost justification or for litigation support is not trivial. Yet documents are like small minnows in the water. They keep multiplying, and they are slippery.

Trying to manage corporate documents is like trying to catch the wind. Most corporations have never even attempted to manage their documents. Yet some of the corporation's most valuable information is found in them.

Not all corporate documents need to be managed. Many informal documents and presentations do not warrant the attention of management, but many documents do need management. Many corporate documents represent official pronouncements and statements of obligations and expectations by the corporation.

A good first step for an organization that wants to proactively manage its documents is to create a corporate document inventory. In creating an inventory, the organization looks at and catalogs its existing documents. In some organizations, there are literally hundreds of thousands of documents. Building a “card catalog” of the documents that belong to the organization is an excellent start to managing the corporate collection of documents.

Libraries have long used card catalogs to great effect. Libraries know that looking through an entire library with all of its books is a colossal waste of time. Realistically, if it were not for the card catalog, libraries would not be in existence. When a person is looking for a book in the library, the most efficient way to look for the book is to use the online card catalog. With the card catalog, the reader can quickly scan through all the possibilities. Upon finding the one or two books that look the most promising, the online card catalog then provides the reader with directions to the location of the book. And it is no different with the documents that belong to the corporation.

What should an inventory of corporate documents, a corporate card catalog, contain? Its likely contents include:

  • A title or brief description of the document,
  • A measurement of the size of the document,
  • The date the document was created,
  • The date the document was last changed,
  • The date the document was last accessed,
  • The system path of the document, and
  • A classification of the document type.

All of these components of the card catalog are useful. Some of them, such as the size, the dates, and the system path, are found in the metadata of the document. But not all card catalog elements are found there. Perhaps the most useful of the card catalog elements is the document classification.
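The metadata-derived part of such a catalog can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CatalogEntry` and `build_inventory` are hypothetical, the file name stands in for the title, and the documents are assumed to live under a single directory tree on a file system. The classification is left as a placeholder because it is not available from metadata.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """One hypothetical 'card' in the corporate card catalog."""
    title: str            # file name without extension, used as a stand-in title
    size_bytes: int       # a measurement of the size of the document
    created: datetime     # best available creation time (see note below)
    modified: datetime    # date the document was last changed
    accessed: datetime    # date the document was last accessed
    path: str             # system path of the document
    doc_type: str = "unclassified"  # classification is not in the metadata


def _to_utc(seconds: float) -> datetime:
    """Convert a POSIX timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def build_inventory(root: str) -> list[CatalogEntry]:
    """Walk a directory tree and record file system metadata for every file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            # st_birthtime exists only on some platforms; on many Unix systems
            # st_ctime is the last metadata change, not the true creation time.
            created = getattr(info, "st_birthtime", info.st_ctime)
            entries.append(
                CatalogEntry(
                    title=os.path.splitext(name)[0],
                    size_bytes=info.st_size,
                    created=_to_utc(created),
                    modified=_to_utc(info.st_mtime),
                    accessed=_to_utc(info.st_atime),
                    path=os.path.abspath(full_path),
                )
            )
    return entries
```

Calling `build_inventory` on a shared folder would return one entry per file. Note that last-access times are only as reliable as the underlying file system, which may be configured not to update them.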

Documents can be classified in many ways. Consider an oil company. The business of the oil company can be roughly divided into the sectors of “upstream,” “midstream” and “downstream.” Upstream refers to exploration and production. Midstream refers to transportation and storage, such as pipelines. Downstream refers to refining and distribution. Each document that belongs to the oil company can be read and then classified according to the general category of information it refers to. The document can be an “upstream” document, a “midstream” document, or a “downstream” document. Or, consider manufacturing. In manufacturing, there is the process of handling raw goods, assembly, managing work in process, finishing a product, and shipping or storing the product. Documents for manufacturers can be classified according to the aspect of manufacturing to which they best apply.
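As a rough illustration of how such a classification might be automated, the sketch below counts sector keywords in a document's text and picks the sector with the most hits. It is a deliberately naive approach and not a method described in this article. The keyword lists, the `SECTOR_KEYWORDS` table, and the `classify` function are all hypothetical, and a real organization would build its own vocabulary.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real taxonomy would come from the business.
SECTOR_KEYWORDS = {
    "upstream": {"exploration", "drilling", "seismic", "well"},
    "midstream": {"pipeline", "pipelines", "pumping", "compressor"},
    "downstream": {"distribution", "retail", "dealer", "delivery"},
}


def classify(text: str) -> str:
    """Return the sector whose keywords occur most often in the text.

    Returns "unclassified" when no keyword matches. Ties go to the
    sector listed first in SECTOR_KEYWORDS.
    """
    # Count every lowercase word once so keyword lookups are cheap.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        sector: sum(words[keyword] for keyword in keywords)
        for sector, keywords in SECTOR_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"
```

For example, `classify("Pipeline inspection schedule")` yields `"midstream"`, while a document with none of the keywords comes back as `"unclassified"` and can be set aside for a person to review.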

Classifying the content of the document is a jump start for the analyst looking through the many documents that belong to the corporation.

Creating the inventory of corporate documents is the first step in managing the unstructured environment. Without a corporate document inventory, the world of unstructured data is a massive blob of ambiguity.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!
