Wednesday 4 January 2012

Contrasting OLTP and Data Warehousing Environments


Contrasting OLTP and Data Warehousing Environments:






One Major difference between the types of system is that data warehouses are not usually in third normal form (3NF), Which is commonly used in OLTP environments.

Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouse and OLTP systems:
(i)                   Work Load:
Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations.

OLTP systems support only predefined operations. Your applications might be specially tuned or designed to support only these operations.

(ii)                 Data Modifications:
A data warehouse is updated on a regular basis by ETL process (run nightly or weekly) using bulk data modifications techniques. The end users of a data warehouse don't directly update the data warehouse.

In OLTP systems, end users routinely issue individual data modifications statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.

(iii)                Schema Design:

Data warehouses often use demoralized or partially demoralized schemas (such as star schema) to optimize query performance.

OLTP systems often use fully normalized schemas to optimize update/ insert/ delete performance, and to guarantee data consistency.

(iv)               Typical Operations:

A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month ".

A typical data warehouse operation accesses only a handful of records. For example, "Retrieve the current order for this customer ".

(v)                 Historical Data:

Data warehouses usually store many months or years of data. This is to support historical analysis.

OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.

Data Warehouse Characteristics


Data Warehouse Characteristics:

Ø  Subject Oriented
Ø  Integrated
Ø  Non-Volatile
Ø  Time Variant

Subject Oriented:
                Dare warehouses are designed to analyze the data. For example, To learn more about company's Sales data, we can build a warehouse which concentrates on sales. Using this warehouse, we can answer the questions like
·         Who was our best customer ?
·         Which regions there is Max Sales ?
·         Profits and Losses wrt Previous Year Sales ?
Etc..

                This ability to define a data warehouse by subject (Sales) makes Data Warehouse Subject Oriented.




Integrated:
                Integration is slowly related to Subject oriented. Data Warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistency among units of measure. When they achieve this, they are said to be integrated.




Non Volatile:
                Non Volatile means that once entered into the warehouse, data should not be changed. This is logical because the purpose of warehouse is to enable you to analyze what has occurred.


Time Variant:
                In Order to discover trends in business, analysis need large amounts of data. This is very much in contrast to online transaction processing system (OLTP) where performance requirements demand that historical data can be moved to an archive.
                A Data Warehouse focus on change over time is what meant by the term time variant.



Why to implement Data Warehousing?


Why to implement Data Warehousing?

1.       To perform server/ disk bound tasks associated with query and reporting on server / disks not used by transaction processing system.

It means,
Every organization has different databases for Transactions, Query and Reporting purpose.
This is because, if they generate reports on the Transactional Database, it will be varying for every hour.
And if transaction Database stores huge amount of data, then perform select will consume more time and also it will internally affect transaction process.
Hence they can't exactly analyze the up's and Down's of their Business.
       
But if they perform the Query and Generate reports on static data which varies less frequently, they can analyze their business in better way.

This is the main reason for Evaluation of Data Warehousing.

2.       To use data models and/ or server technologies that speed up query processing and reporting which are not appropriate for transaction processing.

There are several techniques to speed up the query process like Star Schema, Using Clusters, Bit map indexing etc.. But All of them effect the performance of DML operations on the data. Hence if they are implemented on the transactional database, entire database process will be collapsed.
 There a few more server technologies which improves transaction processing but reduces the query and reporting process.  Hence the Organizations or firms started using Data Warehouse for separating the business logic from data to data transaction process.

3.       To provide an environment where a relatively small amount of knowledge of the technical aspects of database technology is required to write and maintain queries and reports and/ or to provide a means to speed up the writing and maintaining of queries and reports by technical personnel.

4.       To provide a repository of "Cleaned Up" transaction processing system data that can be reported against and that doesn't necessarily require fixing the transaction processing systems.


5.       To make it easier, on a regular basis to query and report data from multiple transaction processing systems and/ or from external data that must be stored for Query / Report purposes only.

6.       To provide a repository of transaction processing system data that consists data from a longer span of time than can efficiently be held in a transaction processing system and/ or to be able to generate reports " As Was" as of a previous point in time.

7.       To prevent persons who only need to query and report transaction processing system data from having any access whatsoever to transaction processing system databases and logic used to maintain those databases.

Example:
In order to store data over the years, many application designers in each branch have made their individual decisions as to how an application and database should be built. So source systems will be different in naming conventions, variable measurements encoding structures, and physical attributes of data.

Consider a bank that has got several branches in several countries, has millions of customers and the lines of business of the enterprise are savings, and loans. The following example explains how the data is integrated from source systems to target systems.





In the aforementioned example, attribute name, column name, datatype and values are entirely different from one source system to another. This inconsistency in date can be avoided by integrating the data into a data warehouse with good standards.





In the above example of target data, attribute name, column name, and datatypes are consistent throughout the target system. This is how data from various is integrated and accurately stored into the data warehouse.











What is Data Warehousing ?


What is Data Warehousing?

-          Data Warehousing is an OLAP database which stores current data and historical data. It is a centralized proposed system and a decision making system which is used by the organization to analyze their business.

-          According to Inmon,
" A Data Warehousing is a subject oriented, Integrated, Time Variant, Non Volatile collection of data in support of management's decision making process ".

-          A Data Warehousing is a relational database which is designed for Query and Analysis rather than transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.
It separates Analysis workload from transaction workload and Organization to consolidate the data from several sources.

-          A Data Warehouse is a copy of transaction data specifically structured for querying and reporting.