Introduction and definition of data warehouse
A data warehouse is a relational database designed for query and analysis rather than for transaction processing. It incorporates data stores together with the conceptual, logical, and physical models that support business objectives and end-user information needs. It usually holds historical data derived from many sources, including transactional data. One of its primary advantages is that it separates the analysis workload from the transaction workload, which enables an organization to consolidate data acquired from different sources (Inmon 63). Creating a data warehouse requires mapping data between sources and targets and then capturing the details of each transformation in a metadata repository. Data warehousing is an element of data analysis that applies to small, medium, and large data collections. Organizations that use such a system are able to retrieve and analyze historical data that originated in any of their underlying systems.
According to Bill Inmon, a data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data that supports the management decision-making process. These properties are briefly described below:
- Integration: following from the previous property, many categories of data are subject to analysis and processing. A data warehouse must therefore integrate data from multiple information sources (Inmon 62). For instance, source A and source B might identify products in different ways. In the data warehouse, however, all collected data identifies a product in a single form. This is what it means for data to be integrated within the warehouse.
- Time variance: data is recorded with reference to the date on which it occurred, which keeps it accountable over time. For instance, data can be retrieved from a warehouse for intervals of 3 months, 6 months, 12 months, or several years. This means that data in the warehouse can be identified and retrieved many years after it was loaded. Transactional systems, by contrast, keep only short-term data (Inmon 63). For instance, bank customer records usually hold only the most recent address and details.
- Non-volatility: data integrity is a central consideration in the design of a data warehouse. Once data is placed in the warehouse, it is not subject to change, whether it is recent or part of a long history. According to Ralph Kimball, warehouse data is regarded as the safest data that the management of any organization can use in decision making. He focuses on the functionality of the warehouse.
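The three properties above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the source names, product codes, key mapping, and the `Warehouse` class are all hypothetical and do not come from Inmon or Kimball. Two sources identify the same product differently, the warehouse stores it under one key, every row carries its date, and rows are only ever appended.

```python
from datetime import date

# Hypothetical mapping from each source's own product code to one warehouse key.
PRODUCT_KEY_MAP = {
    ("source_a", "P-001"): 1,
    ("source_b", "widget_small"): 1,  # same product, identified differently
    ("source_a", "P-002"): 2,
}

def integrate(source, product_code, amount, recorded_on):
    """Integration: return a row that uses the single warehouse product key."""
    key = PRODUCT_KEY_MAP[(source, product_code)]
    return {"product_key": key, "amount": amount, "recorded_on": recorded_on}

class Warehouse:
    """Non-volatility: rows are appended with their date and never changed."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        # Store a copy so later edits by the caller cannot alter history.
        self._rows.append(dict(row))

    def rows_between(self, start, end):
        """Time variance: retrieve rows by the date they were recorded."""
        return [dict(r) for r in self._rows if start <= r["recorded_on"] <= end]

warehouse = Warehouse()
warehouse.load(integrate("source_a", "P-001", 120.0, date(2013, 1, 15)))
warehouse.load(integrate("source_b", "widget_small", 80.0, date(2013, 7, 2)))
warehouse.load(integrate("source_a", "P-002", 45.0, date(2014, 2, 20)))

first_half_2013 = warehouse.rows_between(date(2013, 1, 1), date(2013, 6, 30))
all_2013 = warehouse.rows_between(date(2013, 1, 1), date(2013, 12, 31))
```

Here the six-month and twelve-month queries return different slices of the same unchanged history, and both 2013 rows share one product key even though they came from different sources.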
How work is carried out in data warehouses
Warehouses rely on strategically managed operations to deliver good results. Every data warehouse has defined processes and defined end users. Strategy formulation and process definition are therefore needed to ensure that the warehouses an organization manages are appreciated and used (Laurent 45). The first step is to identify a simple conceptual framework that defines how the activities in a warehouse are carried out.
The diagram above shows the features that help one understand the basic operation of the data before a warehouse is formally designed. Several elements define how a warehouse operates. In order, they are:
How they deal with the workload
First, data warehouses are designed to accommodate ad hoc queries. That is, they are built with features that make data retrieval effective. After information arrives from the various subject areas, it is consolidated and treated as a single body of data. As a result, the workload of a warehouse is difficult to know beforehand, so a data warehouse should be optimized to perform well across a wide variety of possible query operations (Laurent 47). This distinguishes it from related systems such as OLTP (online transaction processing), which support only predefined operations and whose applications must be specifically designed to support any additional operation.
Data modification strategy
Data warehouses follow a well-established approach to data storage. They are updated on a regular basis by the ETL (extract, transform, load) process, which typically runs daily or weekly (Laurent 47) and uses bulk data-modification techniques. The updates are carried out by the personnel responsible for technical maintenance, so the end user is responsible only for accessing the information. This differs from related systems in which end users update the information directly.
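A minimal sketch of one ETL run, written in Python with the standard `sqlite3` module, may make the process concrete. The raw rows, the `sales` table, and the cleaning rules are assumptions chosen for illustration, not a description of any particular product. Rows are extracted from a stand-in source, cleaned, and loaded in a single bulk insert.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a transactional source.
RAW_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "sold_on": "2014-03-01"},
    {"customer": "BOB", "amount": "80", "sold_on": "2014-03-01"},
    {"customer": "", "amount": "15.00", "sold_on": "2014-03-02"},  # no customer
]

def extract():
    """Extract: read the rows from the (stand-in) source system."""
    return list(RAW_ROWS)

def transform(rows):
    """Transform: tidy names, convert amounts to numbers, drop unusable rows."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue  # a row without a customer cannot be analyzed later
        cleaned.append((name, float(row["amount"]), row["sold_on"]))
    return cleaned

def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table in one bulk insert."""
    conn.executemany(
        "INSERT INTO sales (customer, amount, sold_on) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, sold_on TEXT)")

loaded = load(conn, transform(extract()))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

In a real deployment this run would be scheduled by the technical staff, which matches the point that end users only read the loaded data.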
Warehouses make use of denormalized or partially denormalized schemas, of which the star schema is a common example. These schemas improve the query performance of the data warehouse. This distinguishes warehouses from transactional systems, which use normalized schemas that favor fast, consistent updates over analytical query performance.
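The star schema mentioned above can be sketched with Python's standard `sqlite3` module. The table and column names (`fact_sales`, `dim_customer`, `dim_date`) and the sample rows are hypothetical. The sketch shows one central fact table joined to small dimension tables, with a query that totals one month of sales per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, with a dimension table for each "point" of the star.
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20140301, 2014, 3), (20140401, 2014, 4)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 20140301, 100.0), (2, 20140301, 50.0), (1, 20140401, 75.0)],
)

def sales_by_customer(conn, year, month):
    """Total sales per customer for one month, joining the fact to its dimensions."""
    return conn.execute(
        """
        SELECT c.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        WHERE d.year = ? AND d.month = ?
        GROUP BY c.name
        ORDER BY c.name
        """,
        (year, month),
    ).fetchall()

march = sales_by_customer(conn, 2014, 3)
```

Every analytical query follows the same shape, one fact table joined directly to each dimension it needs, which is what keeps such queries simple to write and to optimize.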
Typical operations and historical data
The features discussed in the previous sections allow the system to support many applications. A typical warehouse query can scan thousands or even millions of rows. Sales is one example of a subject whose data a warehouse can hold (Laurent 49): one can retrieve the sales for all customers within one month without a performance failure. It is essential to note that the stored and accessible data can span many years. This capability enables the management of any organization to access many years of information and draw relevant analysis from it. The diagram below shows the complete system of the warehouse performance framework.
Major components of a data warehouse
Data warehouses are built on a relational database management system (RDBMS) server, which operates as the central repository for informational data. Several major components of a data warehouse are responsible for carrying out data analysis (Berson 45). These components arise from what the warehouse is required to do and revolve around objectives such as:
- Making the management environment functional
- Making the data manageable
- Making access to information clearer
- Making the end-user query process workable
Operational applications are essential in this process. The major components of a data warehouse include:
- Data warehouse database
- Sourcing, acquisition, clean-up, and transformation tools
- Metadata
- Access tools
- Administration and management section of the data warehouse
- Information delivery system
Data warehouse database
This is the cornerstone of the processes carried out in the data warehouse. It is the platform on which the management system implements its operations (Berson 39). Specialists who work on this database adopt approaches such as parallel relational database designs, which make use of symmetric multiprocessors and multiple data disks. Processing is thereby sped up, which makes the warehouse easier to work with.
Metadata
Metadata is data that describes data. It is classified into two kinds: technical metadata and business metadata. Both are equally essential in providing an interface for accessing the data, which makes metadata essential in data extraction.
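The two kinds of metadata can be pictured as fields of a single catalog entry. The Python sketch below is only an illustration: the `ColumnMetadata` class, its fields, and the catalog entries are hypothetical and are not taken from Berson. It shows how a business term can be used to locate the physical column that holds the data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    # Technical metadata: where the data lives and how it is stored.
    table: str
    column: str
    data_type: str
    source_system: str
    # Business metadata: what the data means to an end user.
    business_name: str
    description: str

# Hypothetical catalog entries used only for this illustration.
CATALOG = [
    ColumnMetadata("fact_sales", "amount", "REAL", "billing_db",
                   "Sale amount", "Value of one sale in the local currency"),
    ColumnMetadata("dim_customer", "name", "TEXT", "crm_db",
                   "Customer name", "Full name of the purchasing customer"),
]

def find_columns(term):
    """Locate physical columns from a business term (case-insensitive match)."""
    term = term.lower()
    return [(m.table, m.column) for m in CATALOG if term in m.business_name.lower()]

matches = find_columns("customer")
```

An end user searches by business name, while extraction tools read the technical fields of the same entry.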
Access tools
Users of the data warehouse find it easy to interact with the system when retrieving information. The tools for querying the warehouse are essential for running queries and reporting to management. This component is also very important to decision making within an organization.
Administration and management section of the data warehouse
The size of a data warehouse far exceeds that of related operational databases. Even so, it compares favorably with those databases in operation, and a dedicated administration and management section keeps the warehouse easy to manage.
Information delivery system
This component makes it easy to subscribe to data. It also ensures that the data is transmitted to the relevant destinations, which enables the warehouse to distribute information to end users.
Vendors in the market for data warehouses
In the recent past, one of the most dominant data warehouse vendors has been Teradata. It has dominated the sector because it has carried the largest data workloads globally. However, current research indicates that Apple, eBay, and Wal-Mart have built the largest data warehouses. Apple gained this status in 2011 and uses its data warehouse to gather information on all of its customers across product groups. Wal-Mart gained momentum in 2008 with a system of approximately 2.5 petabytes (Laurent 50). Finally, eBay has built two large systems. They comprise 9.2 petabytes of data, including a single system that stores data for its support processes. These data systems have different maintenance systems so that end users can access data efficiently.
The most challenging factor affecting the development, maintenance, and operation of data warehouses is the money they require. Both the capital investment and the operating capital needed for data management are very high. Some organizations therefore prefer cheaper databases. Cost is most probably rated with reference to the amount of data transmitted, which in turn depends on the mouse clicks made globally.
Pros and cons of data warehouses
One of the pros of a data warehouse is that the quality of service it delivers is satisfactory to its clients. In addition, most of these appliances manage 1 to 10 TB of data. The availability of good data is a stronghold for data warehouse managers, since it makes it feasible for data vendors to distribute their information globally and efficiently.
On the cons side, the operation and maintenance of a data warehouse is restricted to people who have worked on the database, since only they can be expected to have acquired data modeling and analytic skills. This means that employment can be effective only for those with strong academic qualifications. In-house training can be used to improve the skills of employees, but it does not guarantee that they will deliver quality services. In addition, the probability of retaining such employees is very low.
Provisional tutorial on data warehouses
A successful tutorial on data warehouses requires that the presenter possess the necessary information about them. The tutorial should incorporate diagrams and examples that help the learner appreciate the topic. For instance, a discussion of the components of a data warehouse calls for well-structured diagrams and details in the teaching aids. Projected examples of data warehouse architectures are essential in teaching the topic. Since the topic is very technical, reference sources are also essential. Finally, arranging the ideas in an orderly sequence is essential for conveying the information successfully.
Relevant target of data warehouse information
The relevant recipients of information about data warehouses include management professionals and technology experts. For management professionals, this information is useful within their areas of specialization: they need a reliable source of information to carry out their tasks. The ability of a data warehouse to store useful information for different sectors within the firm’s organizational structure is one reason why any management professional should embrace this strategy. Because information analysis is critical to the management decisions of any organization, managers need to understand why data warehouses are needed (Laurent 78). Given the impact of this information on the technological world, information technology specialists should also be made aware of this area of specialization. Business is becoming global, and there is a growing need for knowledgeable information technology specialists who can keep pace with the rest of the world. For instance, people with an information technology background are required to develop these systems. They are also important in maintaining clear contact between end-user clients and the parent companies.
Berson, Alex, and Stephen J. Smith. "Components of a Data Warehouse." data management. N.p., 2014. Web. 14 Mar. 2014.
"Data Warehousing Concepts." 1Keydata - Free Online Programming Tutorials. N.p., n.d. Web. 4 Mar. 2014.
"Data Warehousing Concepts." Oracle Documentation. N.p., n.d. Web. 4 Mar. 2014.
Inmon, William, and Krish Krishnan. Building the Unstructured Data Warehouse: Architecture, Analysis, and Design. Westfield: Technics Publications, LLC, 2011. Print.
Laurent, Anne, and Marie-Jeanne Lesot. Scalable Fuzzy Algorithms for Data Management and Analysis: Methods and Design. Hershey, Pa: Information Science Reference, 2010. Print.
Sauter, Vicki L. Decision Support Systems for Business Intelligence. Hoboken: John Wiley & Sons, 2011. Print.