KPIs and Unstructured Data

Originally published May 21, 2009

In many ways, KPIs are the real payoff for decision support and analytical processing. KPIs (key performance indicators) are the periodic calculations an organization makes to determine whether, and how far, it is progressing. Every company has its own set of KPIs. Some typical KPIs include:

  • cash on hand,
  • sales pipeline,
  • recurring revenue,
  • number of employees,
  • revenue per employee, and
  • book to bill ratio.
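
Two of the KPIs listed above are simple ratios. The following is a minimal sketch, assuming revenue per employee means period revenue divided by headcount and book-to-bill means the value of orders booked divided by the value billed in the same period. The function names and figures are hypothetical.

```python
def revenue_per_employee(revenue: float, employees: int) -> float:
    """Revenue earned in the period divided by headcount."""
    if employees <= 0:
        raise ValueError("employees must be positive")
    return revenue / employees


def book_to_bill(bookings: float, billings: float) -> float:
    """Value of new orders booked divided by value billed in the same period.

    A ratio above 1.0 means orders are arriving faster than they are billed.
    """
    if billings <= 0:
        raise ValueError("billings must be positive")
    return bookings / billings


# Hypothetical figures for one quarter
print(revenue_per_employee(12_000_000, 80))
print(book_to_bill(3_300_000, 3_000_000))
```

The explicit checks turn a meaningless division by zero into a clear error rather than a silently bad KPI.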

KPIs are typically generated periodically: once a week, once a month, once a quarter, and so on. Management can then compare KPIs over time to tell whether the company is moving in a positive direction and how fast.
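
Comparing a KPI across periods comes down to computing the change from one period to the next: the sign gives the direction and the magnitude gives the speed. This is a minimal sketch, assuming percentage change is the comparison of interest; the function name and the monthly recurring revenue figures are hypothetical.

```python
def period_changes(values: list[float]) -> list[float]:
    """Percentage change from each period to the next.

    Positive values mean the KPI rose; the magnitude shows how fast.
    """
    changes = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            raise ValueError("cannot compute a percentage change from zero")
        changes.append((current - previous) / previous * 100)
    return changes


# Hypothetical monthly recurring revenue, in thousands
monthly_recurring_revenue = [200.0, 210.0, 231.0, 220.0]
for month, change in enumerate(period_changes(monthly_recurring_revenue), start=2):
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    print(f"month {month}: {direction} {abs(change):.1f}%")
```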

KPIs change over time, depending on what the company is focused on. If a company is trying to gain market share, its KPIs typically concern the number of new customers, customer retention rate, customer attrition, new product acceptance, and so forth. If a company is focused on profitability, its KPIs typically center on cash flow, expenses, revenue, profitability, and so forth. If a company is focused on introducing new products, its KPIs typically center on promotions, advertisements, shipments, customer feedback, and so forth.

Generally speaking, KPIs are found in data marts. A KPI is calculated from the detailed data found in the data warehouse: the granular warehouse data is accessed, refined, organized and calculated, and the resulting KPI is stored in the data mart. The data mart is refreshed as often as it makes sense to recalculate the KPI.
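
A minimal sketch of the cycle described here, using Python's built-in sqlite3 module as a stand-in for both the warehouse and the mart. The table names, the columns, and the choice of a full delete-and-reload refresh are assumptions for illustration, not a prescribed design.

```python
import sqlite3

# Hypothetical schema: a granular warehouse table of orders and a data mart
# table that holds one precomputed KPI row per month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL,       -- ISO date, e.g. '2009-05-21'
        amount     REAL NOT NULL
    );
    CREATE TABLE mart_monthly_kpi (
        month         TEXT PRIMARY KEY,  -- 'YYYY-MM'
        order_count   INTEGER NOT NULL,
        total_revenue REAL NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO warehouse_orders (order_date, amount) VALUES (?, ?)",
    [("2009-04-03", 1200.0), ("2009-04-17", 800.0), ("2009-05-02", 1500.0)],
)


def refresh_mart(connection: sqlite3.Connection) -> None:
    """Recalculate the KPIs from detail data and replace the mart contents."""
    with connection:  # commits on success, rolls back if either statement fails
        connection.execute("DELETE FROM mart_monthly_kpi")
        connection.execute("""
            INSERT INTO mart_monthly_kpi (month, order_count, total_revenue)
            SELECT substr(order_date, 1, 7), COUNT(*), SUM(amount)
            FROM warehouse_orders
            GROUP BY substr(order_date, 1, 7)
        """)


refresh_mart(conn)
for row in conn.execute("SELECT * FROM mart_monthly_kpi ORDER BY month"):
    print(row)
```

Running `refresh_mart` on whatever schedule suits the KPI (weekly, monthly) is the refresh; a full reload keeps the sketch simple, while a real mart might recompute only the latest period.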

KPIs, then, are one of the cornerstones of the analytical, decision support environment.

But KPIs pose some real challenges once unstructured data is introduced into the data warehouse. The fundamental problem is that KPIs are designed for structured, numerical data. People who build and use KPIs take for granted that KPIs can be recalculated periodically, and that assumption holds for repetitively occurring structured data. Structured data is often based on transactions: a bank records account activity, an insurance company settles a claim, a manufacturer fulfills an order.

These regularly occurring business activities are an easy basis for calculating a KPI. But KPIs calculated from unstructured data are another matter entirely. Some unstructured data is mildly repetitive, and other unstructured data is not repetitive at all.

For example, consider patents. There is very little repetition among patents. Perhaps the inventor's name and the patent's assignee are regularly occurring units of information. Beyond those, little recurs from one patent to the next.

Now consider contracts. Within a single class of contracts, there is probably some degree of repetition. But across contracts in general, there is very little. One contract is a sales contract, another is an employment contract, another is a shipment contract, and so forth. There just isn't a great deal of repetition among these types of contracts.

Now consider news articles. Over a long period, there is practically no repetition among articles. For a short time, while a news event is unfolding, articles may repeat one another to some extent. But as time passes, news events change and so does the focus of the media. Looking across a span of time, there is very little repetition in news articles and news coverage.

There are, indeed, many kinds of unstructured data. Another very common kind is email, which has a high degree of repetition. Some customers email to complain, some to congratulate, and others to ask for instruction or guidance. All in all, emails have a fairly high degree of repetition.

Spreadsheets are another type of unstructured data, and there is actually a fairly high degree of repetition among them. The problem is that even when the same word appears on two or more spreadsheets, there is no guarantee that the word means the same thing in each. As a result, repetition among spreadsheets may be very difficult to measure, and measuring it may even be dangerous.
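
The article speaks of a "degree of repetition" without saying how to measure it. One simple way to make the idea concrete, which is a swapped-in technique and not the author's, is the average pairwise Jaccard similarity of the documents' word sets. The tokenizer, function names, and sample texts below are hypothetical.

```python
import re
from itertools import combinations


def terms(text: str) -> set[str]:
    """Lowercased word set of a document (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct terms that two documents have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def repetition_score(documents: list[str]) -> float:
    """Average pairwise Jaccard similarity across a collection.

    Near 0: documents share little vocabulary (e.g. unrelated contracts).
    Higher values: documents repeat the same wording (e.g. routine emails).
    """
    sets = [terms(d) for d in documents]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two documents")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


emails = [
    "I want to complain about my late delivery",
    "I want to complain about a damaged delivery",
    "I want to complain about my late invoice",
]
contracts = [
    "The seller agrees to deliver goods by the stated date",
    "The employee shall report to the regional manager",
    "Carrier assumes liability for freight in transit",
]
print(f"emails:    {repetition_score(emails):.2f}")
print(f"contracts: {repetition_score(contracts):.2f}")
```

The measure also shows the spreadsheet hazard: it counts a shared word as a match whether or not the word means the same thing in both documents.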

There are many different forms of unstructured data. Some have a high degree of repetition, and others have little or none. It is only where text repeats that a KPI is even possible. And even when a KPI is possible for repetitive textual data, it usually represents a very simple view of the data: for example, how many complaints were received, how many customers responded to a promotion, or how many ads the competition ran.
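
A KPI such as how many complaints were received can be sketched as a count over categorized emails. This is a minimal sketch assuming a naive keyword classifier; the categories, keywords, and function names are hypothetical, and real text classification is considerably harder (substring matching, for instance, would also flag "late" inside "template").

```python
from collections import Counter

# Hypothetical keyword rules, checked in order; the first match wins.
CATEGORY_KEYWORDS = {
    "complaint": ("complain", "refund", "broken", "late"),
    "praise": ("thank", "congratulat", "great job"),
    "question": ("how do i", "can you", "?"),
}


def classify(email_body: str) -> str:
    """Assign the first category with a keyword that appears as a substring."""
    text = email_body.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "other"


def weekly_email_kpi(emails: list[str]) -> Counter:
    """Count emails per category: a simple KPI over repetitive text."""
    return Counter(classify(e) for e in emails)


week = [
    "I want to complain: my order arrived late.",
    "Thank you for the quick help!",
    "How do I reset my password?",
    "The product was broken, I need a refund.",
]
print(weekly_email_kpi(week))
```

The result is exactly the kind of very simple view the paragraph describes: counts per category, with nothing about what the complaints actually say.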

In a classical analytical environment, a KPI is good for measuring repetitive numerical information. But for nonrepetitive or textual information, a KPI is a much less effective measurement, and may not be an appropriate one at all.

An interesting implication of the inapplicability of KPIs to unstructured data concerns whether unstructured data can support data marts. If a data mart is made up mostly of KPIs, and if KPIs don't apply to textual data, then data marts may be difficult to build for unstructured analytical processing, or may not even be appropriate for it.

The world of analytical processing is so new that the jury is still out on this issue. As with all things, time will tell.

SOURCE: KPIs and Unstructured Data

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!
