Data Warehouse Project vs Any Other Large Database Implementation



How similar or different do you think a data warehouse project might be compared to any other large database implementation?

First, it must be realized that the purpose of warehouses are different than regular databases; the former is used in OLAP and the latter in OLTP. Due to the decision-support and "business needs" requirements of warehouses, the scope is, usually, much larger than large (regular) database implementations such as a RDBMS. The warehouse project will need a lot more input/involvement form the upper management and other key decision makers. Data warehouses often require the integration of heterogeneous databases (and/or data sources). To integrate such data, and provide easy and efficient access to it, often proves to be a challenge. This is quite a different scenario than regular database projects -- no matter what their size.

The build time for an average DW is, normally, longer than traditional databases projects, large or small. In addition, the cost of a typical DW project is also quite a bit higher than traditional databases projects, large or small. Some reasons for this are:

Preprocessing of data: You are going to spend a lot of time extracting, cleaning, and loading data. The usual figure quoted is that 80% of the time building a data warehouse will be spent on this type of work.

Despite best efforts at project management, data warehousing project scope will increase. Data warehousing projects start with data and end with requirements. Once warehouse users see what they can do with 2001's technology, they will want much more!!

You are going to find problems with systems feeding the data warehouse. Problems that have gone undetected for years will pop up. You are going to have to make a decision on whether to fix the problem in what you thought was the 'read-only' data warehouse or fix the transaction processing system.

You will need to validate data not being validated by transaction processing systems. Typically once data are in warehouse many inconsistencies are found with fields containing 'descriptive' information.

Implementation of a DW means building a HIGH-maintenance system. Reorganizations, product introductions, new pricing schemes, new customers, changes in production systems, etc. are going to affect the warehouse. If the warehouse is going to stay 'current' (and being current will be a big selling point of the warehouse), changes to the warehouse have to be made fast.

Security issues: With typical OLTP project, database administration and management can easily establish who must have access and who does not. With data warehouses (OLAP) and mining practices, especially if the technology is being implemented for Web use, poses risks in security. The more accessible you make your data warehouse (by 'accessible', what is meant is architecting it in a way that people want to use it), the greater security risk you are exposing yourself too. Restricting people to "need to know" is not a good option in the organization on the 2000sHowever, exposing information to theft from anyplace in the globe is not too great for job security either.

Warehouse users will develop conflicting business rules
Many warehouse tools allow users to perform calculations. The tools will allow users to perform the same calculation differently. For example, if you are summarizing candy bar sales by flavor category, and that the flavor category includes macadamia-chocolate and white-chocolate. If you have a macadamia-chocolate brand there is a chance that two users will classify the brand in different categories. You will find that there are means to incorporate some of the business rules in your warehouse. However, the number of possible business rules is so large that you will not be able to incorporate all rules.

Data warehousing can have a learning curve that may be too big (long) for impatient organizations.
Despite the speed of the data warehousing development effort, it takes time for an organization to figure how it can change its business practices to get a substantial return on its data warehousing investment. This is quite different than database projects in which the requirements phase of the project has well established parameters.

DW's may require the purchase of special OLAP tools (See http://www.olapreport.com/How_not_to_buy.htm)

Similarities between DB's and DW's:

- Some of the "core" concepts are similar. That is, schema architecture, query language theory, syntax and fundamentals, etc. are similar between traditional databases and DWs.
- The code tools may also be similar. For example, a traditional RDBMS may use Oracle; a DW may be a heterogeneous conglomeration of several Oracle database.
- Funding and project support for both DW's and DB's usually come from the same source (such as top management)

What makes a data warehouse project unique and/or similar to other IT/IS projects?

I think DW projects are similar to other IT projects if they are created using the traditional software engineering approach. That is, using a lifecycle process model which outlines the phases of the project: Requirements, Design, Testing, Implementation, Updating/Maintenance. I have never worked on a DW project so I don't have first-hand experience with how exactly this is done. Differences between typical IT projects and DW projects would be similar to some of those listed above. Namely, the size, cost and scope, and build-time of the project being greater than your run-of-the-mill software engineering project.

Links:
http://www.dbaint.com/InfoDBDW.html
http://www.dci.com/events/managedw/
http://www.dwinfocenter.org/gotchas.html
http://www.dwinfocenter.org/against.html

More Data warehousing and data mining information:

OLAP vs. OLTP

Multiple Dimensional View of Database: ROLAP, MOLAP, HOLAP

Data Warehouse Project Warnings

Data Mining Primitives, Hierarchies, Architecture and Coupling

Data Preprocessing for Data Warehouses

Dimensions of data quality, tuples with missing values, data smoothing and data integration

Data Characterization, Discrimination, Association, Classification, Prediction, Clustering, and Evolution Analysis: Differences and Similarities

Data Warehouse Project vs Any Other Large Database Implementation

Data Mining and Data Warehousing in Biology, Medicine and Health Care

Other Information Technology pages:

Project Management Software

Project Management Training — FAQ part 1

FAQ part 4: Cost-Time Graph and Shortening Critical Path


Back to Info-Source home page