Data warehouse architecture varies from organization to organization as per their specific needs. Some may have ODS( Operational Data Source) as a source of data, whereas some may have data mart as a source of data for a data warehouse.
In general, all data warehouse systems have below component/layers:-
- Data Source Layer.
- Data Landing Layer.
- Staging Area.
- ETL Layer.
- Data Storage Layer.
- Metadata Layer.
- Data Presentation Layer.
Each component is discussed in details below:-
Data Source Layer:-
This layer is responsible for feeding data into warehouse. There can be verities of data source for a single data warehouse. For example, source can be operational data source(ODS), any relational database, flat files, excel file, csv files or any other kind of database.
Different kinds of data can be produced by different data source system-
- Operational – such as transactional data, customer data, inventory data, product data etc.
- Browsing data – can be different kind of browsing data from websites.
- Log data – Any kind of log produced by web server, software etc.
Data Landing Layer:-
This layer represents the area where data land from the data source before loading to staging area. Data can be stored here using push or pull mechanism that means source can push data to the landing area or pull mechanism can be there which will extract the data from the data source. Minimal data cleansing can be performed in this layer.
Staging Area:-
This is the entry point into a data warehouse. Data sits here prior to loading data into actual core area of data warehouse. Usually no transformation of data happens till this layer. Using this area, it becomes easier to load data into actual warehouse.
ETL Layer:-
This is where data get transformed as per data warehouse terms. Data may come from different source and there may be different terms to identifying same product. But prior to loading those information into warehouse, transformation is needed to give name/tag to those product as per warehouse own language.
This layer is also where data cleansing happens. The ETL design is often the most time consuming process in the data warehouse project and ETL tools are used to accomplish this tasks.
Data Storage Layer:-
This is where the transformed and cleaned data sit. This is the core section of data warehouse and maintain history of data as per business requirement.
Presentation Layer:-
Data is stored here after extracting data from the data storage layer and applying business logic. So that business user can directly access data as per their requirement from this layer without exacting data from the core layer after applying complex logic.
Data from this layer is used for end user as well via different reporting tool.
Metadata Layer:-
This is where information about the data stored in the data warehouse system is stored. A logical data model would be an example of something that’s in the metadata layer. A metadata tool is often used to manage metadata.