The conceptual Entity-Relationship (ER) is extensively used for database design in relational database environment, which emphasized on day-today operations. Multidimensional (MD) data modeling, on the other hand, is crucial in data warehouse design, which targeted for managerial decision support. It supports decision making by allowing users to drill-down for a more detailed information, roll-up to view summarized information, slice and dice a dimension for a selection of a specific item of interest and pivot to re-orientate the view of MD data. When designing a MD model regardless whether it is a star or snowflake schema, it involves the identification of a fact, dimensions and measure attributes. This paper will explore on how the Multidimensional model can be used as the yardstick of data warehouse design instead of ER Model.
Discover the world's research
- 20+ million members
- 135+ million publications
- 700k+ research projects
Join for free
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
DATA MODELING TECHNIQUES FOR DATA WAREHOUSE
MS. ALPA R. PATEL*; PRO F. (DR.) JAYESH M. PATEL**
*R. K. Achchhariwala College of Computer & Applied Sciences,
Vapi, Dist.-Valsad (South Gujarat),
Veer Narmad South Gujarat University, Surat.
**Gujarat Technological University,
Sankalchand Patel College of Engineering,
Sankalchand Patel Sahakar Vidhyadham,
Visnagar, Dist. Mehasana, North Gujarat.
ABSTRACT
The conceptual Entity-Relationship (ER) is extensively used for database design in relational
database environment, which emphasized on day-to-day operations. Multidimensional (MD)
data modeling, on the other hand, is crucial in data warehouse design, which targeted for
managerial decision support. It supports decision making by allowing users to drill-down for
a more detailed information, roll-up to view summarized information, slice and dice a
dimension for a selection of a specific item of interest and pivot to re-orientate the view of
MD data. When designing a MD model regardless whether it is a star or snowflake schema, it
involves the identification of a fact, dimensions and measure attributes.
This paper will explore on how the Multidimensional model can be used as the yardstick of
data warehouse design instead of ER Model.
KEYWORDS: Entity-Relationship Model, Multidimensional Model, Fact, Dimensions,
Attributes.
___________________________________________________________________________
INTRODUCTION
1 DATA MODELING TECHNIQUES
Two data modeling techniques that are relevant in a data warehousing environment are ER
modeling and Multidimensional modeling.
ER modeling produces a data model of the specific area of interest, using two basic concepts:
entities and the relationships between those entities. The ER model is an abstraction tool
because it can be used to understand and simplify the ambiguous data relationships in the
business world and complex systems environments.
Multidimensional modeling uses three basic concepts: measures, facts, and dimensions.
Multidimensional modeling is powerful in representing the requirements of the business user
in the context of database tables.
Both ER and Multidimensional modeling can be used to create an abstract model of aspecific
subject.
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
1.1 ER MODELING
An ER model is represented by an ER diagram, which uses three basic graphic
symbols to conceptualize the data: entity, relationship, and attribute.
1.1.1 ENTITY
An entity is defined to be a person, place, thing, or event of interest to the business or
the organization. An entity represents a class of objects, which are things in the real world
that can be observed and classified by their properties and characteristics.
1.1.2 RELATIONSHIP
A relationship is represented with lines drawn between entities. It depicts the structural
interaction and association among the entities in a model. A relationship is designated
grammatically by a verb, such as owns, belongs, and has. The relationship between two
entities can be defined in terms of the cardinality. This is the maximum number of instances
of one entity that are related to a single instance in another table and vice versa. The possible
cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M).
1.1.3 ATTRIBUTES
Attributes describe the characteristics of properties of the entities. For clarification, attribute
naming conventions are very important. An attribute name should be unique in an entity and
should be self-explanatory.
When an instance has no value for an attribute, the minimum cardinality of the attribute is
zero, which means either nullable or optional.
In ER modeling, if the maximum cardinality of an attribute is more than 1, the modeler will
try to normalize the entity and finally elevate the attribute to another entity. Therefore,
normally the maximum cardinality of an attribute is 1.
1.2 MULTIDIMENSIONAL MODELING
In some respects, Multidimensional modeling is simpler, more expressive, and easier to
understand than ER modeling. But, Multidimensional modeling is a relatively new concept
and not firmly defined yet in details, especially when compared to ER modeling techniques.
1.2.1 BASIC CONCEPTS
Multidimensional modeling is a technique for conceptualizing and visualizing data models as
a set of measures that are described by common aspects of the business. It is especially useful
for summarizing and rearranging the data and presenting views of the data to support data
analysis. Multidimensional modeling focuses on numeric data, such as values, counts,
weights, balances, and occurrences
Multidimensional modeling has several basic concepts:
Facts
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
Dimensions
Measures (variables)
1.2.2 FACT
A fact is a collection of related data items, consisting of measures and context data. Each fact
typically represents a business item, a business transaction, or an event that can be used in
analyzing the business or business processes.
In a data warehouse, facts are implemented in the core tables in which all of the numeric data
is stored.
1.2.3 DIMENSION
A dimension is a collection of members or units of the same type of views. In a diagram, a
dimension is usually represented by an axis. In a Multidimensional model, every data point in
the fact table is associated with one and only one member from each of the multiple
dimensions. That is, dimensions determine the contextual background for the facts. Many
analytical processes are used to quantify the impact of dimensions on the facts.
Dimensions are the parameters over which we want to perform Online Analytical Processing
(OLAP). For example, in a database for analyzing all sales of products, common dimensions
could be:
Time
Location/region
Customers
Salesperson
Scenarios such as actual, budgeted, or estimated numbers
DIMENSION MEMBERS: A dimension contains many dimension members. A dimension
member is a distinct name or identifier used to determine a data i
all months, quarters, and years make up a time dimension, and all cities, regions, and
countries make up a geography dimension.
DIMENSION HIERARCHIES: We can arrange the members of a dimension into one or
more hierarchies. Each hierarchy can also have multiple hierarchy levels. Every member of a
dimension does not locate on one hierarchy structure. A good example to consider is the time
dimension hierarchy as shown in Figure 1.
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
FIGURE 1: MULTIPLE HIERARCHIES IN A TIME DIMENSION
1.2.4 MEASURE
A measure is a numeric attribute of a fact, representing the performance or behavior of the
business relative to the dimensions. The actual numbers are called as variables. For example,
measures are the sales in money, the sales volume, the quantity supplied, the supply cost, the
transaction amount, and so forth. A measure is determined by combinations of the members
of the dimensions and is located on facts.
Considering Relational context, there are two basic models that are used in dimensional
modeling: (i) star model and (ii) snowflake model.
The star model is the basic structure for a dimensional model. It has one large central table
(fact table) and a set of smaller tables (dimensions) arranged in a radial pattern around the
central table. (We show an example in Figure 2). The snowflake model is the result of
decomposing one or more of the dimensions. The many-to-one relationships among sets of
attributes of a dimension can separate new dimension tables, forming a hierarchy. (Figure 3
shows an example). The decomposed snowflake structure visualizes the hierarchical structure
of dimensions very well.
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
FIGURE 2: STAR MODEL
FIGURE 3: SNOW FLAKE MODEL
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
3 MULTIDIMENSIONAL MODEL Vs RELATIONSHIP MODEL
ER is a logical design technique that seeks to remove the redundancy in data. This
coupled with normalization of data enables easy maintainability and improves data integrity
which is a necessity for transaction processing applications. End user comprehension and the
data retrieval are major show stoppers; as such a database is proliferated with dozens of
tables that are linked together by a bewildering spider web of joins. Use of the ER modeling
technique defeats the basic allure of data warehousing, namely intuitive and high-
performance retrieval of data.
MD is a logical design technique that seeks to present the data in a standard, intuitive
framework that allows for high-performance access. Every Multidimensional model is
composed of one table with a multipart key, called the fact table, and a set of smaller tables
called dimension tables. Each dimension table has a single-part primary key that corresponds
exactly to one of the components of the multipart key in the fact table. This characteristic
"star-like" structure is often called a star join.
Each dimensional table is logical and user identifiable and serves a business purpose
by serving as an object of interest to the user. It is also maintained by the ETL process of the
data ware housing application .Hence it is considered as an internal Logical file and included
in the data function count.
4 WHY ER IS NOT SUITABLE FOR DATA WAREHOUSES?
End user cannot understand or remember an ER Model. End User cannot navigate an ER
Model. There is no graphical user interface or GUI that takes a general ER diagram and
makes it usable by end users.
ER modeling is not optimized for complex, ad-hoc queries. They are optimized for
repetitive narrow queries
Use of ER modeling technique defeats this basic allure of data warehousing, namely
intuitive and high performance retrieval of data because it leads to highly normalized
relational tables.
5 CONCLUSION
In this paper, we examine that an E-R structured data warehouse, absent associative entities,
i.e. fact tables, is a not viable concept given recent developments in data warehousing.
A number of conclusions are supported by the arguments.
Not every E-R model can be represented as a set of star schemas containing
equivalent information
But every properly constructed E-R data warehousing model can be so represented
Many E-R data warehouse models are not properly constructed in that they don't
explicitly recognize many-many relations and the need to resolve them with
associative entities, i.e. fact tables.
ZENITH
International Journal of Multidisciplinary Research
Vol.2 Issue 2, February 2012, ISSN 2231 5780
To use data warehousing E-R models specifying atomic data dependency
relationships without fact tables is to ensure poor query response performance in large
databases, and therefore discourage, and often prevent, execution of a multi-stage
analysis process. In effect, it is to make the data warehouse no more than a big staging
area for data marts, with no independent analytical function of its own.
Given the development of ODSs and non-queryable centralized staging areas for
storing, extracted, cleansed, and transformed data and for gathering centralized
metadata.
We don't need another non-queryable staging area called a data warehouse. What we
do need, instead, is a dimensionally modeled data warehouse for enterprisewide DSS,
prepared to provide the best in query response performance and to support the most
advanced OLAP functionality we can devise.
REFERENCES
1) A Conceptual Model for Multidimensional Data By Anand S. Kamble, Department of
Information Technology, Government of India, New Delhi, India
2) An Overview of Data Warehouse Design Approaches and Techniques, Alejandro
Gutiérrez, Adriana Marotta Instituto de Computación, Facultad de Ingenieria,
Universidad de la República, Montevideo, Uruguay October 2000
3) A Data Model for Supporting On-Line Analytical
Fifth International Conference on Information and Knowledge
Management, 1996, pp. 81 88.
4) Conceptual Multidimensional Model, By Manpreet Singh, Parvinder Singh, and
Suman, World Academy of Science, Engineering and Technology 36 2007
5) Data Modeling Techniques for Data Warehousing By Chuck Ballard, Dirk Herreman,
Don Schau, Rhonda Bell, Eunsaeng Kim, Ann Valencic
6) Dimensional Modeling and E-R Modeling In The Data Warehouse By Joseph M.
Firestone, Ph.D. White Paper No. Eight, June 22, 1998
7) Dimensional Model Data Warehouse: An Overview,
Insurance Company, Novato, CA
8) Estimation Model for Data Warehousing Projects Presented In 2nd Annual
International Estimation Colloquium 2007,By Karthikeyan Sankaran & Sujatha
Sivaraman ,Hexaware Technologies
9) Kimball, Ralph, The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling, New York, NY: John Wiley and Sons, Inc., 2002. 436pp.
10) Len Silverston, W. H. Inmon, and Kent Graziano, The Data Model Resource Book
(New York, NY: John Wiley & Sons, Inc., 1997).
... Database management systems have become ubiquitous and are fundamental tools in the management of information. Therefore, data modeling is an integral part of database management systems [1,8]. ...
... The data model can be viewed as concepts that describe the structure of a databasestructure of how data is held. Database management systems can be based on a relational, non-relational, network, hierarchical data models amongst others [1], [8]. ...
The deployment and maintenance of enterprise networks form the bedrock of any organization, be it government, commercial, academic, and/or non-profit making. These networks host vast amounts of information, databases, in either temporary mode while in transit or permanent mode while stationary. The databases are managed by the information systems with appropriate functions that meet consumers' needs. Databases hold varying data – structured, semi-structured, or unstructured. Data is increasingly becoming a vital organizational asset and therefore plays a crucial role in making organizational decisions. With growth in the internet, digital data sources have become ubiquitous. In turn, this has seen the continued growth in the volume, variety, veracity, velocity, and value of data. Big data brings with its data complexities that have an eventual impact on the data modeling techniques. This paper presents a review of big data modeling techniques with a concentration of enterprise networks. We started by appreciating big data before embarking on modeling techniques for big data. Keywords Data, Model, Modeling techniques, Big data, Enterprise networks, Databases.
... The most important thing in the process of building a data warehouse is the modeling process [3]. Data Warehouse development success greatly depends on the integration ofassurance qualitydata to all phases of the development of Data Warehouse [15]. ...
... Database is a collection of data organization to serve many applications efficiently by centralizing data and controlling data redundancy [5]. In designing a database, one thing to note Designing Data Warehouse At the Inpatient and Outpatient (Case Study: Sanglah Hospital) Putu Veda Andreyana [1] , Putu Angelina Widya [2] , and Made Suartika [3] [1][2] Department of Electrical and Computer Engineering, Post Graduate Program, Udayana University ...
- Putu Veda Andreyana
- Putu Angelina Widya
- Made Suartika
The growth of patient data increasing the hospital resulted in even harder to compile data and analyze the data manually, so it takes a data warehouse that can perform this task automatically. The design of the data warehouse Sanglah hospital aims to build a data warehouse that can store data in structured and easier to analyze the data to make a decision. Design methods Fact Constellation Schema and method used is a Nine-step methodology consisting of nine stages, namely the Electoral Process, Selection of Grain, identification and adjustment of the dimensions, the Electoral Facts, Storage pre-calculation in the fact table, Ensuring the dimension tables, the Electoral duration database, Track changes of dimensions is slowly, prioritization and query model. Designing Data Warehouse Sanglah Hospital helpful enough for data processing in large enough quantities, so expect the needs and information about the patient can be met. Data warehouse Sanglah Hospital can be used to analyze patient data in order to get information on the number of patients of various dimensions
... At this paper, the data warehouse is designed for the healthcare centres [3,4]. Star is the simplest type of schema, as shown in Figure 1 which contains fact table (Fact Table OPHC) sits in the center and associated with other dimension tables like a star [5,6]. The significant volume of the data warehouses involves an increase of the response time that leads to appear the necessity for using optimization techniques to reduce. ...
The data warehouse size and the query complexity may cause unacceptable delay in decision support queries. A basic condition for the success of a data warehouse is the capability to supply decision makers with both precise information and best response time. For this purpose, the concept of indexed views is used. Indexed views help to speed-up query processing and reduce the response time for tracing queries, especially for queries about past histories.
... A data warehouse is a place to store data with a design that makes analysing data easier, and OLAP is a method to analyse data as well as to provide self-service business intelligence capabilities to decision makers [16]. Microsoft SQL Server -Analysis Services software (SSAS) was used to build the number of OLAP Cubes that meet the main decision makers' queries. ...
This paper presents an architecture for the data warehouse of outpatient healthcare (DWOP) as a data repository collects data from two different sources (Databases of outpatient healthcare and Excel files from hospitals) and provides storage, functionality and responsiveness to queries to meet decision makers requirements.Successfully supporting managerial decision-making is critically dependent upon the availability of integrated, high quality information organized and presented in a timely and easily understood manner. "On-Line Analytical Processing (OLAP) is utilized for decision support to get interesting information" from the data warehouse with a rapid execution time. OLAP is considered one of Business Intelligence tools.
... stages of data modeling Source:Patel and Patel (2012) ...
- Xiaobing Yu
Data warehousing refers to the process of organizing data from different sources to support the process of decision making in organizations as well as support data analytics. Data warehousing assimilates and systematizes data from across different departments in an organization to have a single analysis of the full information. This data is then used to make the decisions in an organization. However, data warehousing has been underestimated in education institutions. The reason is that they are non-profits organizations. With the increase in the number of educational institutions coming up, colleges and universities should consider the integration of such data-driven support systems to make better decisions and for more organization in the academic processes especially for those institutions with different branches. The intricacy in managing these institutions requires development on how consistent information can be relayed to the decision-makers of an institution. This paper explains the need for data warehouses in education institutions and how the teaching institutions can implement data warehousing to help in the management of the institutions. The paper also explores and reviews various methods that institutions can use to mine data. Various stages for promoting and actualizing the system have been shown.
... The most important thing in the process of building a data warehouse is the modeling process [1]. Because designed data model will be measured the extent to which the built-in data warehouse is able to produce the necessary managerial information. ...
- I Putu Ari Putra Wijaya
- Wahyudin Wahyudin
- Made Mataram
Department of Communications and Informatics, appointed as Executor of Handling Complaint Society in Denpasar City. Over time and more complex data complaints that come in so complicate the data analysis, to overcome the data it is necessary implementation of data warehouse. The method used is Snowflake Schema, this method is chosen because this method is the development of star schema where each dimension table can have sub-table dimension. It aims to minimize data overload. From the results of the implementation in get the conclusion that by implementing data warehouse can facilitate the user in view report in accordance with the desired from summary to detail.
... Dimensional data model is preferable to design a data warehouse for company. The dimensional model has good queries performance to support OLAP functions [3]. ...
- Fauziyah Fauziyah
Recently, managing data in electronic manufacturing company has become a challenge.This paper presents the design of a data warehouse based on user needs for electronic manufacturing company. The data warehouse is designed using Kimball's Method. The purpose of the proposed design is to help decision makers in performing data processing and data analysis over the data stored in the warehouse. The data warehouse design based on user needs which provides a data source to support corporate leaders for enhance the decision making process. Database Management System used is Oracle Database XE. This Study includes Integrity check process and User Acceptance Test.This study reflects that user find the data warehouse fulfill the user needs. Keywords– Data Warehouse, Kimball method, Dimensional Model, Integrity Check.
- Rajdeep Chowdhury
- Olive Roy
- Soupayan Datta
- Saswata Dasgupta
A data warehouse is a storehouse which comprises all specifics and statistics of the institution. With the contemporary leaning of hacking beside the initiation of fresh hacking proficiency, the safekeeping for data warehouse has transpired to be an influential portion. The projected work exemplifies how an encryption technique is germane in the internal structure of a data warehouse. Apiece instant, any user endeavor for entrance to the information, the data in encrypted form is furnished to the user. Only an authenticated user would be able to obtain the original form of the data, because a well-defined decryption algorithm is installed at the authentic user's end only. The formulation of the spotless Intelligent Sensor Algorithm and the proposed cryptographic modus operandi ascertain that there would be considerable attenuation of admittance time, with observance for the cosseted transference, enhanced infringement avoidance and output augmentation. The applicable employment of the proposed virtual data warehouse model with improved safekeeping might be in its noteworthy utility in an assortment of institution where accrual of cosseted information is of extreme extent. The assortment of associations is inclusive of edifying institutions, business communities, medicinal associations, classified enterprises and there on.
Estimation Model for Data Warehousing Projects Presented In 2nd Annual International Estimation Colloquium
Estimation Model for Data Warehousing Projects Presented In 2nd Annual International Estimation Colloquium 2007,By Karthikeyan Sankaran & Sujatha Sivaraman,Hexaware Technologies
Posted by: tamishatamishaengelharte0273027.blogspot.com
Source: https://www.researchgate.net/publication/331197296_DATA_MODELING_TECHNIQUES_FOR_DATA_WAREHOUSE