What a Data Warehouse is Not

Originally published October 29, 2009

Some topics just never seem to die, however much their demise is deserved. When you think you have heard the last of something, here it comes again, just like a bad penny.

Recently, I was at a conference where I heard a discussion about what a data warehouse is. One person suggested that a data warehouse was really just all the old legacy systems connected by software that could access their data. By calling such a contraption a data warehouse, the organization could avoid the hard and complex work of integration.

There are so many problems with this federated approach to a data warehouse that they are almost not worth repeating here. But (once again!) here goes.

A federated data warehouse:

  • Has terrible performance problems,

  • Has no integration of data,

  • Has no reliable source of historical data,

  • Requires the cooperation of Armonk, Larry Ellison and Bill Gates, among others,

  • Has no repeatability of processing, and so forth.

A federated data warehouse is no data warehouse at all.
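
Two of the points above, the lack of reliable history and the lack of repeatable processing, can be illustrated with a minimal Python sketch. Everything in it (the `operational` dictionary, `federated_balance`, the `Warehouse` class) is hypothetical and not taken from any real product. The federated query reads the live operational record, so the same question gets a different answer once the source is updated, whereas a store that keeps dated snapshots answers it the same way every time.

```python
from datetime import date

# Hypothetical operational system: holds only the current value per account.
operational = {"acct-1": 100}


def federated_balance(account):
    """A federated 'warehouse' simply passes the query through to the live source."""
    return operational[account]


class Warehouse:
    """Stores dated snapshots, so past states remain queryable."""

    def __init__(self):
        self._snapshots = {}  # (account, as_of date) -> balance

    def load(self, as_of, source):
        # Copy the source's current values under the snapshot date.
        for account, balance in source.items():
            self._snapshots[(account, as_of)] = balance

    def balance(self, account, as_of):
        return self._snapshots[(account, as_of)]


warehouse = Warehouse()
warehouse.load(date(2009, 10, 1), operational)

first_federated = federated_balance("acct-1")

# The operational system overwrites the value in place.
operational["acct-1"] = 250
warehouse.load(date(2009, 10, 2), operational)

# Same federated query, different answer; the October 1 value is gone
# from the source but still available from the warehouse snapshot.
second_federated = federated_balance("acct-1")
october_1_balance = warehouse.balance("acct-1", date(2009, 10, 1))
```

After the update, the federated query can no longer reproduce its earlier result, while the snapshot for October 1 is unchanged.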

Another person suggested that a data mart was a data warehouse. In this case, it was suggested that an organization build a data mart for finance. Then, the data mart could be expanded with new requirements for marketing. Then sales could add on, and so forth.

The problem with this solution is that the requirements for data as found in a data mart vary considerably from one department to the next. Adding sales data to finance data cannot be done without restructuring data back down to its most basic level and rebuilding the structure. At this point, it would have been easier to just build a data warehouse in the first place.
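
A small Python sketch of this point, using made-up rows and hypothetical field names (`account`, `region`, `amount`): a finance mart that has already been summarized by account no longer carries the region that sales needs, so the sales view can only be produced by returning to the detailed rows.

```python
from collections import defaultdict

# Hypothetical detailed transactions (the most basic level of data).
transactions = [
    {"account": "4000", "region": "East", "amount": 120},
    {"account": "4000", "region": "West", "amount": 80},
    {"account": "5000", "region": "East", "amount": 50},
]


def summarize(rows, key):
    """Total the amounts by the given field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


# Finance mart: summarized by account; region is discarded.
finance_mart = summarize(transactions, "account")

# Sales needs totals by region. The finance mart has no region left,
# so the only way to get them is to go back to the detailed rows.
sales_view = summarize(transactions, "region")
```

Nothing in `finance_mart` can be split back into regions; the sales totals exist only because the detailed rows were still available.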

Stated differently, a data mart has one set of DNA and a data warehouse has another. Planting tumbleweed seeds, watching them grow, and then calling the plant an oak tree does not make it an oak tree. The DNA of a tumbleweed and the DNA of an oak tree are as different as can be.

So a data warehouse is not a data mart, just as a federated data warehouse is not a data warehouse.

Data warehouses exist to support management, not operations. As such, an active data warehouse is not a data warehouse. Transaction processing and up-to-the-second updates are not what a data warehouse is for. Management does not need or even care about detailed, up-to-the-second accurate transactions in order to make decisions. It is the clerical community that cares about that level of detail. And a data warehouse does not support the clerical community. The data warehouse supports the managerial community.

So an active data warehouse is also not a data warehouse.

A data warehouse is not a dimensional database, where there is a star structure and fact tables. Star structures and fact tables are designed to optimize the access and analysis of a single group of users and a single set of requirements. As long as the users do not change and the requirements do not change, everything is fine. The problem is that over time, users do change and requirements do change. That is the way of the world. And when requirements change, the star schema and the fact tables need to undergo change.

A much more rational way to build the data warehouse is to use the relational model. The relational model is able to handle change as gracefully as change can be handled. In addition, the relational model is so granular and basic that it is not optimized for any user at the expense of any other user.
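
As a rough illustration of granular relational data serving more than one set of requirements, here is a sketch using Python's built-in `sqlite3` module. The tables (`customer`, `product`, `sale`) and their contents are hypothetical, chosen only for the example, and are not a design taken from this article. The same normalized rows answer a question by region and, later, a different question by product category, without restructuring the stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized, granular tables.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id INTEGER REFERENCES product(product_id),
        amount INTEGER
    );
    INSERT INTO customer VALUES (1, 'East'), (2, 'West');
    INSERT INTO product VALUES (10, 'Hardware'), (20, 'Software');
    INSERT INTO sale VALUES (100, 1, 10, 300), (101, 1, 20, 150), (102, 2, 20, 200);
    """
)

# One group's question: revenue by region.
by_region = conn.execute(
    """
    SELECT c.region, SUM(s.amount)
    FROM sale s JOIN customer c ON c.customer_id = s.customer_id
    GROUP BY c.region ORDER BY c.region
    """
).fetchall()

# A later, different question: revenue by product category.
by_category = conn.execute(
    """
    SELECT p.category, SUM(s.amount)
    FROM sale s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.category ORDER BY p.category
    """
).fetchall()

conn.close()
```

Both results come from the same `sale` rows; only the query changes when the requirement changes.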

And last but not least, you do not buy a technology and have a data warehouse. Instead, you design and build the proper structure, and then you seek out the best technology to help you access and analyze the data. Most vendors that offer to sell you a data warehouse are pulling your leg.

So here is a short list of what a data warehouse is not:

  • A federation of data,

  • A data mart that can be grown into a data warehouse,

  • An active data warehouse, or

  • A dimensional database with star schemas and fact tables.


  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!
