Abstract – (Data Warehouse) This paper aims to show why data warehousing and data mining matter today, to describe the data mining process, and to explain how that process helps decision makers make better decisions. The most important findings are the phases of the data mining process, which are highlighted by the developed model, and the importance of data warehousing and data mining. Together they yield better answers, which allow both technical and nontechnical users to make much better decisions. In practice, data warehousing and data mining are useful for any organization that holds a huge amount of data. They help regular (operational) databases perform faster, and they can save millions of dollars and increase profit because of the correct decisions made with the help of data mining. This paper shows the process of data mining and how any business can use it to get better answers from huge amounts of data. It shows an alternative way of querying data.
Have you ever thought about the recommendations you get when you shop online? If you purchase a TV online, for example, the website recommends another product that you are likely to need. Have you also thought about the alerts you get from your bank when your credit card is suddenly used in a different city? These are examples of data mining, which is the process of discovering useful patterns in a huge data set. This huge data set is created by integrating current and historical data from different sources and storing them centrally in a special repository called a data warehouse (DW).
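To make the recommendation example concrete, the following is a minimal sketch of one simple pattern-discovery idea: counting which items are bought together. The basket data and the function names `co_purchase_counts` and `recommend` are assumptions made for illustration; real recommender systems are considerably more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; in practice these would come from the warehouse.
baskets = [
    {"tv", "hdmi_cable", "wall_mount"},
    {"tv", "hdmi_cable"},
    {"tv", "soundbar"},
    {"laptop", "mouse"},
    {"tv", "hdmi_cable", "soundbar"},
]


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical (a, b) order.
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


def recommend(item, baskets, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


print(recommend("tv", baskets))  # ['hdmi_cable', 'soundbar']
```

In this toy data set, a customer who buys a TV would be shown the HDMI cable first because it appears in three of the four TV baskets.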
A DW is very important for historical data and transactions, for example the old purchase transactions made by customers at modern supermarkets. Keeping this kind of data in a regular database makes the database very large and slows its performance. For those reasons, historical data and transaction details should be archived in a data warehouse for data mining purposes.
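The archiving step can be sketched as follows, assuming two SQLite databases stand in for the operational database and the warehouse. The table names `sales` and `sales_history`, the column layout, and the function `archive_old_sales` are hypothetical and chosen only for illustration.

```python
import sqlite3


def archive_old_sales(operational, warehouse, cutoff_date):
    """Copy sales older than cutoff_date into the warehouse, then remove them
    from the operational database. Returns the number of rows archived."""
    rows = operational.execute(
        "SELECT id, sold_on, amount FROM sales WHERE sold_on < ?", (cutoff_date,)
    ).fetchall()
    with warehouse:  # commit the copy first
        warehouse.executemany(
            "INSERT INTO sales_history (id, sold_on, amount) VALUES (?, ?, ?)", rows
        )
    with operational:  # only then delete from the operational side
        operational.execute("DELETE FROM sales WHERE sold_on < ?", (cutoff_date,))
    return len(rows)


# In-memory databases keep the sketch self-contained.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
)
warehouse.execute(
    "CREATE TABLE sales_history (id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
)
operational.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2019-03-01", 20.0), (2, "2021-07-15", 35.5), (3, "2024-01-10", 12.0)],
)
operational.commit()

# ISO-8601 date strings compare correctly as plain text.
archived = archive_old_sales(operational, warehouse, "2023-01-01")
print(archived)  # 2
```

After the call, the operational table holds only the recent transaction, while the two older ones remain available in the warehouse for mining.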
Unlike the modeling techniques used to design regular databases, a data warehouse is designed using dimensional modeling techniques. Data warehouse modeling is complex. It requires: 1) knowledge of the business processes, 2) an understanding of the system’s structural and behavioral conceptual model, and 3) familiarity with data warehousing techniques.
A BUSINESS ANALYSIS FRAMEWORK FOR DATA WAREHOUSE DESIGN
A data warehouse can enhance business productivity because it is able to quickly and efficiently gather information that accurately describes the organization. A data warehouse facilitates customer relationship management because it provides a consistent view of customers and items across all lines of business, all departments, and all markets. Finally, a data warehouse may bring about cost reduction by tracking trends, patterns, and exceptions over long periods.
The different views of data warehousing are:
- The top-down view allows the selection of the relevant information necessary for the data warehouse. This information matches current and future business needs.
- The data source view exposes the information being captured, stored, and managed by operational systems. This information may be documented at various levels of detail and accuracy, from individual data source tables to integrated data source tables.
- The data warehouse view includes fact tables and dimension tables. It represents the information that is stored inside the data warehouse, including precalculated totals and counts, as well as information regarding the source, date, and time of origin, added to provide historical context.
- The business query view is the perspective of data in the data warehouse from the end user’s viewpoint.
So, building and using a data warehouse is a complex task because it requires business skills, technology skills, and program management skills. Regarding business skills, building a data warehouse involves understanding how systems store and manage their data, how to build extractors that transfer data from the operational system to the data warehouse, and how to build warehouse refresh software that keeps the data warehouse reasonably up-to-date with the operational system’s data.
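The extractor and refresh roles described above can be illustrated with a small sketch. It assumes that every change in the operational system carries an increasing version number, so the warehouse can remember the last version it loaded (a "watermark") and pull only newer changes. The classes `OperationalSystem` and `Warehouse` and all of their methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OperationalSystem:
    """Stand-in for an operational source; each change gets an increasing version."""
    rows: list = field(default_factory=list)

    def write(self, key, value):
        version = len(self.rows) + 1
        self.rows.append({"key": key, "value": value, "version": version})

    def extract_since(self, version):
        """Extractor: return every change recorded after the given version."""
        return [dict(r) for r in self.rows if r["version"] > version]


@dataclass
class Warehouse:
    history: list = field(default_factory=list)  # append-only copy of changes
    last_version: int = 0  # watermark: highest source version already loaded

    def refresh(self, source):
        """Refresh: pull only changes newer than the watermark and append them."""
        changes = source.extract_since(self.last_version)
        for row in changes:
            self.history.append(row)
            self.last_version = max(self.last_version, row["version"])
        return len(changes)


source = OperationalSystem()
source.write("cust-1", "Berlin")
source.write("cust-2", "Oslo")

wh = Warehouse()
first = wh.refresh(source)   # loads both initial rows
source.write("cust-1", "Paris")
second = wh.refresh(source)  # loads only the new change
third = wh.refresh(source)   # nothing new to load
print(first, second, third)  # 2 1 0
```

Appending changes rather than overwriting them keeps earlier values, such as the customer's previous city, available for historical analysis.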
DATA WAREHOUSE DESIGN PROCESS
Here we discuss the various approaches to the data warehouse design process and the steps involved. A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both. The top-down approach starts with overall design and planning. It is useful in cases where the technology is mature and well known, and where the business problems that must be solved are clear and well understood. The bottom-up approach starts with experiments and prototypes. This is useful in the early stage of business modeling and technology development. It also allows an organization to move forward at considerably less expense and to evaluate the technological advantages before making significant commitments. In the combined approach, an organization can exploit the planned and strategic nature of the top-down approach while retaining the rapid implementation and opportunistic application of the bottom-up approach. From the software engineering point of view, the design and construction of a data warehouse proceeds through analysis, warehouse design, data integration and testing, and finally deployment of the data warehouse. Large software systems can be developed using one of two methodologies: the waterfall method and the spiral method.
The various steps involved in data warehouse design are:
- Choose a business process to model. If the business process is organizational and involves multiple complex object collections, a data warehouse model should be followed. However, if the process is departmental and focuses on the analysis of one kind of business process, a data mart model should be chosen.
- Choose the grain of the business process, which is the fundamental, atomic level of data to be represented in the fact table for this process.
- Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item, customer, supplier, warehouse, transaction type, and status.
- Choose the measures that will populate each fact table record. Typical measures are numeric additive quantities like dollars_sold and units_sold.
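The four design steps above can be sketched as a small star schema. The example assumes a sales process, a grain of one row per item, per customer, per day, three of the typical dimensions (time, item, customer), and the measures dollars_sold and units_sold named in the text. The table names and sample rows are hypothetical, and SQLite is used only to keep the sketch self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for each fact.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- Fact table. Grain: one row per item, per customer, per day.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    dollars_sold REAL,     -- additive measure
    units_sold INTEGER,    -- additive measure
    PRIMARY KEY (time_key, item_key, customer_key)
);

INSERT INTO dim_time VALUES (1, '2024-01-05', '2024-01', 2024), (2, '2024-02-10', '2024-02', 2024);
INSERT INTO dim_item VALUES (1, 'TV', 'electronics'), (2, 'Bread', 'grocery');
INSERT INTO dim_customer VALUES (1, 'Ana', 'Lisbon'), (2, 'Ben', 'Porto');
INSERT INTO fact_sales VALUES (1, 1, 1, 500.0, 1), (1, 2, 2, 4.0, 2), (2, 2, 1, 6.0, 3);
""")

# Because the measures are additive, they can be summed along any dimension.
totals = conn.execute("""
    SELECT i.category, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_key = f.item_key
    GROUP BY i.category
    ORDER BY i.category
""").fetchall()
print(totals)  # [('electronics', 500.0, 1), ('grocery', 10.0, 5)]
```

Choosing the grain first determines the primary key of the fact table, which is why it is fixed before the dimensions and measures are added.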
Because constructing a data warehouse is a difficult and long-term task, its implementation scope should be clearly defined. The goals of an initial data warehouse implementation should be specific, achievable, and measurable. This involves determining the time and budget allocations and the subset of the organization that is to be served. Once a data warehouse is designed and constructed, its initial deployment includes initial installation, roll-out planning, training, and orientation. Platform upgrades and maintenance must also be considered. Data warehouse administration includes data refreshment, data source synchronization, planning for disaster recovery, managing access control and security, managing data growth, managing database performance, and data warehouse enhancement and extension.
DATA WAREHOUSE USAGE FOR INFORMATION PROCESSING
The proposed metamodel of data warehouse operational processes is capable of modeling complex activities, their interrelationships, and the relationship of activities with data sources and execution details. Moreover, the metamodel complements the existing architecture and quality models in a coherent fashion, resulting in a full framework for quality-oriented data warehouse management that is capable of supporting the design, administration, and especially evolution of a data warehouse. Data warehouses and data marts are used in a wide range of applications. There are three kinds of data warehouse applications: information processing, analytical processing, and data mining.
- Information processing supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs. A current trend in data warehouse information processing is to construct low-cost web-based accessing tools that are then integrated with web browsers.
- Analytical processing supports basic OLAP operations, including slice-and-dice, drill-down, and roll-up. It generally operates on historical data in both summarized and detailed forms. The major strength of online analytical processing over information processing is the multidimensional analysis of data warehouse data.
- Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
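The OLAP operations named in the list above (roll-up, drill-down, and slice-and-dice) can be illustrated on a handful of fact rows in plain Python. This is a minimal sketch, not how an OLAP server is implemented; the data and the helper names `aggregate`, `slice_`, and `dice` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, city, item, units_sold)
facts = [
    (2023, "Q1", "Lisbon", "TV", 3),
    (2023, "Q2", "Lisbon", "TV", 2),
    (2023, "Q1", "Porto", "Radio", 5),
    (2024, "Q1", "Lisbon", "Radio", 4),
    (2024, "Q1", "Porto", "TV", 1),
]
COLUMNS = ("year", "quarter", "city", "item")


def aggregate(rows, dims):
    """Sum units_sold grouped by the named dimensions."""
    idx = [COLUMNS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = COLUMNS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [
        r for r in rows
        if all(r[COLUMNS.index(d)] in vals for d, vals in allowed.items())
    ]


by_year = aggregate(facts, ["year"])                     # roll-up to year level
by_year_quarter = aggregate(facts, ["year", "quarter"])  # drill-down to quarters
lisbon_by_item = aggregate(slice_(facts, "city", "Lisbon"), ["item"])
tv_2023 = dice(facts, year={2023}, item={"TV"})

print(by_year)          # {(2023,): 10, (2024,): 5}
print(by_year_quarter)  # {(2023, 'Q1'): 8, (2023, 'Q2'): 2, (2024, 'Q1'): 5}
print(lisbon_by_item)   # {('TV',): 5, ('Radio',): 4}
```

Roll-up and drill-down differ only in how many dimensions are kept in the grouping, which is why a single `aggregate` helper serves both.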
Nowadays we have an enormous volume of data, which leads to the necessity of using data warehousing and data mining. A data warehouse is used as a central store of a subject-oriented, integrated, time-variant, and non-volatile collection of data from different sources (operational databases). For faster performance, a data warehouse organizes data in a different architecture: a fact table and dimension tables.
In the area of integrating multiple, distributed, heterogeneous information sources, data warehousing is a viable and, in some cases, superior alternative to traditional research solutions. Traditional approaches request, process, and merge information from sources when queries are posed. In the data warehousing approach, information is requested, processed, and merged continuously, so the information is readily available for direct querying and analysis at the warehouse. Although the concept of data warehousing is already prominent in the database industry, a number of important open research problems need to be solved to realize the flexible, powerful, and efficient data warehousing systems of the future.
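The contrast between the two integration approaches can be sketched as follows. The sketch assumes that each source simply counts how often it is contacted. The class names `Source` and `EagerWarehouse` and the function `lazy_query` are hypothetical, and the explicit `refresh` call stands in for the continuous merging described above.

```python
class Source:
    """Hypothetical information source that counts how often it is queried."""

    def __init__(self, records):
        self.records = list(records)
        self.requests = 0

    def fetch(self):
        self.requests += 1
        return list(self.records)


def lazy_query(sources, predicate):
    """On-demand approach: contact every source each time a query is posed."""
    merged = [r for s in sources for r in s.fetch()]
    return [r for r in merged if predicate(r)]


class EagerWarehouse:
    """Warehousing approach: merge in advance, answer queries from the local copy."""

    def __init__(self, sources):
        self.sources = sources
        self.store = []

    def refresh(self):
        self.store = [r for s in self.sources for r in s.fetch()]

    def query(self, predicate):
        return [r for r in self.store if predicate(r)]


sources = [
    Source([{"city": "Oslo", "sales": 10}]),
    Source([{"city": "Rome", "sales": 7}]),
]


def is_big(record):
    return record["sales"] > 8


# Three on-demand queries contact both sources three times each.
for _ in range(3):
    lazy_result = lazy_query(sources, is_big)
lazy_requests = sum(s.requests for s in sources)

# One refresh contacts each source once; the three queries are then local.
warehouse = EagerWarehouse(sources)
warehouse.refresh()
for _ in range(3):
    eager_result = warehouse.query(is_big)
eager_requests = sum(s.requests for s in sources) - lazy_requests

print(lazy_requests, eager_requests)  # 6 2
```

Both approaches return the same answer here; the difference lies in when the sources are contacted, and the warehouse's answers are only as current as its last refresh.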