
The Elusive Virtual Data Warehouse

Originally published March 19, 2009

Have you ever been to a fair with the game where a mechanical gopher pops out of a hole and it is your job to whack it when it appears? Once you whack the gopher, it is only a short amount of time before it reemerges from another hole. This is a good game for 4-year-olds, but it is very frustrating for adults.

A virtual data warehouse is like this carnival game. I believe virtual data warehouses are inane. Just when you think this incredibly inane idea has died and just when someone has delivered it what should have been a deathly blow, out it pops again from another hole. The virtual data warehouse just won’t die, no matter how hard or how many times it gets whacked.

The great appeal of the virtual data warehouse is that you do not have to face the problem of integration. For whatever reason, organizations DREAD integrating their data. Managers lie awake at night in a cold sweat just thinking about the awful work of integrating their corporate data. They would rather put their heads in a moving cement mixer than integrate their data.

So there is definitely an appeal to the virtual data warehouse. It is an easy way out.

(For the uninitiated, a virtual data warehouse is one in which each query reaches out to many separate databases at once as a distributed query. With a distributed query, the analyst supposedly does not have to face the dreadful task of integrating data.)
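
The mechanics can be sketched in a few lines. This is a hypothetical illustration, not any vendor's product: two throwaway in-memory SQLite databases stand in for two separate application systems, and every "virtual warehouse" query must touch all of them at query time.

```python
import sqlite3

# Two in-memory databases stand in for two separate application systems.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE sales (amount REAL)")
db_a.executemany("INSERT INTO sales VALUES (?)", [(100.0,), (250.0,)])

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE sales (amount REAL)")
db_b.executemany("INSERT INTO sales VALUES (?)", [(75.0,)])

def virtual_query(databases):
    """Run the same query against every source, every single time.

    Each execution touches every database; if one source is down the
    whole query fails, and nothing checks that the units of 'amount'
    even match across sources.
    """
    total = 0.0
    for db in databases:
        (subtotal,) = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales"
        ).fetchone()
        total += subtotal
    return total

print(virtual_query([db_a, db_b]))  # naively adds 425.0, units unchecked
```

Note that the cost in the loop is paid on every execution, which is the overhead problem the bullets below describe.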

Why then is the virtual data warehouse such a supremely bad idea? There are actually lots of reasons for the vacuity of virtue manifested by the virtual data warehouse. Some of those reasons are:

  • A query that has to access a lot of databases simultaneously uses a lot of system resources. In the best of circumstances, query performance is a real problem.

  • A query that has to access a lot of databases simultaneously requires those resources every time it is executed. If the query is run many times, the cumulative system overhead is very steep.

  • A query that has to access a lot of databases simultaneously is stopped dead in its tracks when it runs across a database that is down or otherwise unavailable.

  • A query that has to access a lot of databases simultaneously shuffles a lot of data around the system that otherwise would not need to be moved. The impact on the network can become very burdensome.

  • A query that has to access a lot of databases simultaneously is limited to the data found in the databases. If there is only a limited amount of historical data in the databases, the query is limited to whatever historical data is found there. For a variety of reasons, many application databases do not have much historical data to begin with.

But there are deeper reasons why a query that has to access a lot of databases simultaneously has some major architectural problems. Those problems stem from the integration of data. In order to illustrate the problems of the integration of data, suppose there are three applications. Application A is written for Australians. It has Australian dollars and centimeters. Application B is written for Americans. It has American dollars and measurements made in feet and inches. Application C is written for Canadians. It has Canadian dollars and measurements made in millimeters.

So what happens when there is a query that has to access a lot of databases simultaneously? Guess what the query has to do every time it accesses applications A, B, and C? The analyst must integrate the data. If the result is to have meaning, you cannot add Australian dollars to American dollars to Canadian dollars without taking their fiscal differences into account. So integration MUST be done before the data can be useful and meaningful.

The same goes for adding together centimeters, inches and millimeters. You cannot meaningfully add these numbers without accounting for the differences in their units of measurement. And there are literally hundreds of similar conversions that must be made.
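
The point can be made concrete with a short sketch. The conversion factors below are hypothetical, hard-coded for illustration only; real exchange rates vary daily and would come from a reference source, which is exactly the integration work each analyst would have to do.

```python
# Hypothetical conversion factors, for illustration only.
TO_USD = {"AUD": 0.70, "USD": 1.00, "CAD": 0.75}   # currency -> US dollars
TO_MM = {"cm": 10.0, "in": 25.4, "mm": 1.0}        # unit -> millimeters

def total_in_usd(amounts):
    """amounts: list of (value, currency) pairs from applications A, B, C."""
    return sum(value * TO_USD[currency] for value, currency in amounts)

def total_in_mm(lengths):
    """lengths: list of (value, unit) pairs; normalize before adding."""
    return sum(value * TO_MM[unit] for value, unit in lengths)

# 100 AUD + 100 USD + 100 CAD is NOT 300 of anything until it is normalized:
print(total_in_usd([(100, "AUD"), (100, "USD"), (100, "CAD")]))  # 245.0
```

The normalization step is trivial here, but in a real corporation the lookup tables themselves must be chosen, sourced, and maintained, and that is the integration work the virtual data warehouse pretends to avoid.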

The ugly truth is that the corporate analyst MUST do integration whether he or she wants to or not. There is no getting around it. Wishful thinking just does not cut it here.

And – as if there were not enough obstacles – there is the issue of compatible integration. One analyst integrates the data from A, B, and C and arrives at one conclusion. Another analyst integrates data from A, B and C and uses a different algorithm for integration. Now the corporation has two sets of values and no way to reconcile these values. This problem is exacerbated by the fact that EVERY analyst must make his or her own integration algorithms, and there is no guarantee that any two analysts use the same formula or algorithm. When every analyst is free to integrate data as he or she sees fit, then there is no integrity of data. There is no single corporate system of record – there is no corporate single version of the truth.
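
The reconcilability problem is easy to demonstrate. In this hypothetical sketch, two analysts integrate the same raw figures using different exchange-rate assumptions (the rates are made up); the corporation ends up with two totals and no basis for saying which one is right.

```python
# The same raw data from applications A and C, integrated two ways.
raw = [(100, "AUD"), (100, "CAD")]

# Each analyst picks their own (hypothetical) rates: different date,
# different source, different rounding policy.
rates_analyst_1 = {"AUD": 0.70, "CAD": 0.75}
rates_analyst_2 = {"AUD": 0.68, "CAD": 0.80}

def integrate(amounts, rates):
    """Convert each amount with the given rates, then total."""
    return sum(value * rates[currency] for value, currency in amounts)

total_1 = integrate(raw, rates_analyst_1)
total_2 = integrate(raw, rates_analyst_2)
print(total_1, total_2)  # two different "truths" from identical raw data
```

A central warehouse makes this choice once, records it, and every downstream query inherits the same answer; the virtual approach leaves the choice to each analyst on each query.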

So the problems with the virtual data warehouse are legion. There is poor performance or no performance at all. There is an enormous system overhead. There is only a limited amount of historical data. There is the work of integration that each analyst must do, one way or the other. There is no reconcilability of data. There is no single version of the truth for the corporation.

The irony is that the analysts in the corporation must integrate its data repeatedly, query by query, rather than once.

So if these problems are not an issue for you and your corporation, go right ahead and build your virtual data warehouse. The person who follows you in your job will probably have a more in-depth understanding of the issues.


  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!
