
Calculations Made Outside of the Data Warehouse

Originally published December 18, 2008

For the past decade or so, I have had the honor and privilege of helping people around the world with their data warehouses. That work has ranged from initial design to operations to monitoring to the ease and speed of end-user access. In addition, I have seen data warehouses around the world. A recent check of my passport showed that work has been done in 56 countries.

Yet, just when you think you have seen it all, something new crops up. A jackrabbit jumps out of a hole in the ground right in front of you.

The other day, I was consulting at a bank. We were looking at their data warehouse when something different cropped up. The bank was loading data into their data warehouse. Then, once a month, they were reading their data warehouse and recalculating a figure. The figure was a risk rating for each loan customer. Each month, they recalculated the bank’s assessment of the likelihood of a loan being repaid.

At first glance, this seems to violate the cardinal rule of a data warehouse never being updated. After all, at least one field is being read, recalculated, and written over.

But further investigation showed that perhaps this practice was OK.

From a business standpoint, it was necessary to have a very current assessment of risk. Recalculation on a monthly basis was a genuine business requirement. Secondly, the criteria used to assess risk were changing all the time. Over a year’s time, there was considerable change in the basic risk calculation. Each change in the basic risk calculation invalidated the previous month’s calculation. So, once again, there was a business requirement for a recalculation.

A third factor was that the bank had been operating their risk assessment this way for many years. This meant that when a bank officer examined this month’s record, there was no cry of, “My data does not match last month.”

A fourth factor was that the calculation was done for all loans. While it is true that the assessment value may stay the same for a given loan across a stretch of months, the calculation will still have been performed anew each month.

So from all of these standpoints, the recalculation of data within the data warehouse seemed like an OK thing to do.

There was one nagging question that arose. That question was, “Is the bank ever going to need to compare one month’s set of calculations against a previous month’s set of calculations?” Was there ever going to be a need to look at this month’s data compared to the same month a year ago, for example?

The bank officers assured me that such a calculation was never done. And as long as a calculation of this variety is never done, then there is no need to store a historical record of each month’s calculations. But had such a comparison needed to be done (or even if someone thought that it might need to be done sometime in the future), it would have been simple enough to store the historical record of each month’s calculation.
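The tradeoff described above can be made concrete with a small sketch. The loan IDs, rating values, and data structures below are purely illustrative assumptions, not the bank's actual design; the point is only the difference between overwriting a rating in place and appending a monthly snapshot.

```python
# Hypothetical sketch of the two designs discussed above.
# All names and values are illustrative, not from the article.

# Design 1: overwrite in place (what the bank did).
# Only the current rating survives; history is lost.
current_rating = {}  # loan_id -> latest risk rating

def overwrite(loan_id, month, rating):
    current_rating[loan_id] = rating

# Design 2: append a monthly snapshot.
# History is kept, so this month's rating can be compared
# with the rating from the same month a year ago.
rating_history = []  # rows of (loan_id, month, rating)

def snapshot(loan_id, month, rating):
    rating_history.append((loan_id, month, rating))

# Recalculate a hypothetical loan 42 for two months under each design.
overwrite(42, "2008-11", "B")
overwrite(42, "2008-12", "A")

snapshot(42, "2008-11", "B")
snapshot(42, "2008-12", "A")

print(current_rating[42])   # only the latest rating remains
print(len(rating_history))  # both months survive for comparison
```

The snapshot design costs extra storage on every recalculation cycle, which is exactly the kind of tradeoff the bank had to weigh before concluding that the comparison would never be needed.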

Thus it was that the good people of the bank got me to agree that violation of one of the basic tenets of a data warehouse was OK.

Does all of this mean that the rules for data warehouse design really aren’t cast in stone? The answer is that people violate the rules of good data warehouse design all the time. And, in each case, something is gained and something is lost.

The point of good design is to make sure you know exactly what is gained and what is lost. As long as the tradeoff is well thought out, as long as the tradeoff has both sides of the equation considered, then the decision to violate the rules of data warehouse design may be OK.

All too often, people make design decisions that violate the good practices of data warehouse design without having a clue that a tradeoff is being made. All too often, people make design decisions looking at just one side of the coin. And this is where people get in trouble.

One such decision – for example – is the conscious decision to do wholesale transaction processing in a data warehouse. From the standpoint of technology, can such a design decision be accommodated? The answer is yes, there are technologies that will allow you to update a data warehouse. And this may seem to be a dandy thing to do.

But the downside of doing wholesale transaction processing in a data warehouse is quite large. When you do wholesale update in a data warehouse:

  • Your capacity requirements jump dramatically, making the cost of the data warehouse excessive.

  • You lose your ability to guarantee the integrity of data. A report run one minute may differ from the same report run a few minutes later. This violates one of the basic features of a data warehouse.

  • You introduce a whole new level of complexity when it comes to backup and recovery of data and transactions.

  • You introduce the very real possibility that online performance is going to be – at best – spotty.

  • You introduce the very real possibility that there are going to be major periods of database downtime due to reorganization, recovery, restructuring, accommodation of statistical analysis, and so forth.

  • You make the job of the database designer really difficult. A database can be optimized for business requirements or a database can be optimized for long-term flexibility, but it is almost impossible to optimize a single database design for both purposes at the same time.

And the list goes on.

Changing the basic tenets of good data warehouse design must be done carefully and with full inspection of both the positive and the negative cases of the database design tradeoffs.


  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

