The conceptual Entity-Relationship (ER) is extensively used for database design in relational database environment, which emphasized on day-today operations. Multidimensional (MD) data modeling, on the other hand, is crucial in data warehouse design, which targeted for managerial decision support. It supports decision making by allowing users to drill-down for a more detailed information, roll-up to view summarized information, slice and dice a dimension for a selection of a specific item of interest and pivot to re-orientate the view of MD data. When designing a MD model regardless whether it is a star or snowflake schema, it involves the identification of a fact, dimensions and measure attributes. This paper will explore on how the Multidimensional model can be used as the yardstick of data warehouse design instead of ER Model.

ResearchGate Logo

Discover the world's research

  • 20+ million members
  • 135+ million publications
  • 700k+ research projects

Join for free

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

DATA MODELING TECHNIQUES FOR DATA WAREHOUSE

MS. ALPA R. PATEL*; PRO F. (DR.) JAYESH M. PATEL**

*R. K. Achchhariwala College of Computer & Applied Sciences,

Vapi, Dist.-Valsad (South Gujarat),

Veer Narmad South Gujarat University, Surat.

**Gujarat Technological University,

Sankalchand Patel College of Engineering,

Sankalchand Patel Sahakar Vidhyadham,

Visnagar, Dist. Mehasana, North Gujarat.

ABSTRACT

The conceptual Entity-Relationship (ER) is extensively used for database design in relational

database environment, which emphasized on day-to-day operations. Multidimensional (MD)

data modeling, on the other hand, is crucial in data warehouse design, which targeted for

managerial decision support. It supports decision making by allowing users to drill-down for

a more detailed information, roll-up to view summarized information, slice and dice a

dimension for a selection of a specific item of interest and pivot to re-orientate the view of

MD data. When designing a MD model regardless whether it is a star or snowflake schema, it

involves the identification of a fact, dimensions and measure attributes.

This paper will explore on how the Multidimensional model can be used as the yardstick of

data warehouse design instead of ER Model.

KEYWORDS: Entity-Relationship Model, Multidimensional Model, Fact, Dimensions,

Attributes.

___________________________________________________________________________

INTRODUCTION

1 DATA MODELING TECHNIQUES

Two data modeling techniques that are relevant in a data warehousing environment are ER

modeling and Multidimensional modeling.

ER modeling produces a data model of the specific area of interest, using two basic concepts:

entities and the relationships between those entities. The ER model is an abstraction tool

because it can be used to understand and simplify the ambiguous data relationships in the

business world and complex systems environments.

Multidimensional modeling uses three basic concepts: measures, facts, and dimensions.

Multidimensional modeling is powerful in representing the requirements of the business user

in the context of database tables.

Both ER and Multidimensional modeling can be used to create an abstract model of aspecific

subject.

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

1.1 ER MODELING

An ER model is represented by an ER diagram, which uses three basic graphic

symbols to conceptualize the data: entity, relationship, and attribute.

1.1.1 ENTITY

An entity is defined to be a person, place, thing, or event of interest to the business or

the organization. An entity represents a class of objects, which are things in the real world

that can be observed and classified by their properties and characteristics.

1.1.2 RELATIONSHIP

A relationship is represented with lines drawn between entities. It depicts the structural

interaction and association among the entities in a model. A relationship is designated

grammatically by a verb, such as owns, belongs, and has. The relationship between two

entities can be defined in terms of the cardinality. This is the maximum number of instances

of one entity that are related to a single instance in another table and vice versa. The possible

cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M).

1.1.3 ATTRIBUTES

Attributes describe the characteristics of properties of the entities. For clarification, attribute

naming conventions are very important. An attribute name should be unique in an entity and

should be self-explanatory.

When an instance has no value for an attribute, the minimum cardinality of the attribute is

zero, which means either nullable or optional.

In ER modeling, if the maximum cardinality of an attribute is more than 1, the modeler will

try to normalize the entity and finally elevate the attribute to another entity. Therefore,

normally the maximum cardinality of an attribute is 1.

1.2 MULTIDIMENSIONAL MODELING

In some respects, Multidimensional modeling is simpler, more expressive, and easier to

understand than ER modeling. But, Multidimensional modeling is a relatively new concept

and not firmly defined yet in details, especially when compared to ER modeling techniques.

1.2.1 BASIC CONCEPTS

Multidimensional modeling is a technique for conceptualizing and visualizing data models as

a set of measures that are described by common aspects of the business. It is especially useful

for summarizing and rearranging the data and presenting views of the data to support data

analysis. Multidimensional modeling focuses on numeric data, such as values, counts,

weights, balances, and occurrences

Multidimensional modeling has several basic concepts:

Facts

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

Dimensions

Measures (variables)

1.2.2 FACT

A fact is a collection of related data items, consisting of measures and context data. Each fact

typically represents a business item, a business transaction, or an event that can be used in

analyzing the business or business processes.

In a data warehouse, facts are implemented in the core tables in which all of the numeric data

is stored.

1.2.3 DIMENSION

A dimension is a collection of members or units of the same type of views. In a diagram, a

dimension is usually represented by an axis. In a Multidimensional model, every data point in

the fact table is associated with one and only one member from each of the multiple

dimensions. That is, dimensions determine the contextual background for the facts. Many

analytical processes are used to quantify the impact of dimensions on the facts.

Dimensions are the parameters over which we want to perform Online Analytical Processing

(OLAP). For example, in a database for analyzing all sales of products, common dimensions

could be:

Time

Location/region

Customers

Salesperson

Scenarios such as actual, budgeted, or estimated numbers

DIMENSION MEMBERS: A dimension contains many dimension members. A dimension

member is a distinct name or identifier used to determine a data i

all months, quarters, and years make up a time dimension, and all cities, regions, and

countries make up a geography dimension.

DIMENSION HIERARCHIES: We can arrange the members of a dimension into one or

more hierarchies. Each hierarchy can also have multiple hierarchy levels. Every member of a

dimension does not locate on one hierarchy structure. A good example to consider is the time

dimension hierarchy as shown in Figure 1.

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

FIGURE 1: MULTIPLE HIERARCHIES IN A TIME DIMENSION

1.2.4 MEASURE

A measure is a numeric attribute of a fact, representing the performance or behavior of the

business relative to the dimensions. The actual numbers are called as variables. For example,

measures are the sales in money, the sales volume, the quantity supplied, the supply cost, the

transaction amount, and so forth. A measure is determined by combinations of the members

of the dimensions and is located on facts.

Considering Relational context, there are two basic models that are used in dimensional

modeling: (i) star model and (ii) snowflake model.

The star model is the basic structure for a dimensional model. It has one large central table

(fact table) and a set of smaller tables (dimensions) arranged in a radial pattern around the

central table. (We show an example in Figure 2). The snowflake model is the result of

decomposing one or more of the dimensions. The many-to-one relationships among sets of

attributes of a dimension can separate new dimension tables, forming a hierarchy. (Figure 3

shows an example). The decomposed snowflake structure visualizes the hierarchical structure

of dimensions very well.

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

FIGURE 2: STAR MODEL

FIGURE 3: SNOW FLAKE MODEL

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

3 MULTIDIMENSIONAL MODEL Vs RELATIONSHIP MODEL

ER is a logical design technique that seeks to remove the redundancy in data. This

coupled with normalization of data enables easy maintainability and improves data integrity

which is a necessity for transaction processing applications. End user comprehension and the

data retrieval are major show stoppers; as such a database is proliferated with dozens of

tables that are linked together by a bewildering spider web of joins. Use of the ER modeling

technique defeats the basic allure of data warehousing, namely intuitive and high-

performance retrieval of data.

MD is a logical design technique that seeks to present the data in a standard, intuitive

framework that allows for high-performance access. Every Multidimensional model is

composed of one table with a multipart key, called the fact table, and a set of smaller tables

called dimension tables. Each dimension table has a single-part primary key that corresponds

exactly to one of the components of the multipart key in the fact table. This characteristic

"star-like" structure is often called a star join.

Each dimensional table is logical and user identifiable and serves a business purpose

by serving as an object of interest to the user. It is also maintained by the ETL process of the

data ware housing application .Hence it is considered as an internal Logical file and included

in the data function count.

4 WHY ER IS NOT SUITABLE FOR DATA WAREHOUSES?

End user cannot understand or remember an ER Model. End User cannot navigate an ER

Model. There is no graphical user interface or GUI that takes a general ER diagram and

makes it usable by end users.

ER modeling is not optimized for complex, ad-hoc queries. They are optimized for

repetitive narrow queries

Use of ER modeling technique defeats this basic allure of data warehousing, namely

intuitive and high performance retrieval of data because it leads to highly normalized

relational tables.

5 CONCLUSION

In this paper, we examine that an E-R structured data warehouse, absent associative entities,

i.e. fact tables, is a not viable concept given recent developments in data warehousing.

A number of conclusions are supported by the arguments.

Not every E-R model can be represented as a set of star schemas containing

equivalent information

But every properly constructed E-R data warehousing model can be so represented

Many E-R data warehouse models are not properly constructed in that they don't

explicitly recognize many-many relations and the need to resolve them with

associative entities, i.e. fact tables.

ZENITH

International Journal of Multidisciplinary Research

Vol.2 Issue 2, February 2012, ISSN 2231 5780

To use data warehousing E-R models specifying atomic data dependency

relationships without fact tables is to ensure poor query response performance in large

databases, and therefore discourage, and often prevent, execution of a multi-stage

analysis process. In effect, it is to make the data warehouse no more than a big staging

area for data marts, with no independent analytical function of its own.

Given the development of ODSs and non-queryable centralized staging areas for

storing, extracted, cleansed, and transformed data and for gathering centralized

metadata.

We don't need another non-queryable staging area called a data warehouse. What we

do need, instead, is a dimensionally modeled data warehouse for enterprisewide DSS,

prepared to provide the best in query response performance and to support the most

advanced OLAP functionality we can devise.

REFERENCES

1) A Conceptual Model for Multidimensional Data By Anand S. Kamble, Department of

Information Technology, Government of India, New Delhi, India

2) An Overview of Data Warehouse Design Approaches and Techniques, Alejandro

Gutiérrez, Adriana Marotta Instituto de Computación, Facultad de Ingenieria,

Universidad de la República, Montevideo, Uruguay October 2000

3) A Data Model for Supporting On-Line Analytical

Fifth International Conference on Information and Knowledge

Management, 1996, pp. 81 88.

4) Conceptual Multidimensional Model, By Manpreet Singh, Parvinder Singh, and

Suman, World Academy of Science, Engineering and Technology 36 2007

5) Data Modeling Techniques for Data Warehousing By Chuck Ballard, Dirk Herreman,

Don Schau, Rhonda Bell, Eunsaeng Kim, Ann Valencic

6) Dimensional Modeling and E-R Modeling In The Data Warehouse By Joseph M.

Firestone, Ph.D. White Paper No. Eight, June 22, 1998

7) Dimensional Model Data Warehouse: An Overview,

Insurance Company, Novato, CA

8) Estimation Model for Data Warehousing Projects Presented In 2nd Annual

International Estimation Colloquium 2007,By Karthikeyan Sankaran & Sujatha

Sivaraman ,Hexaware Technologies

9) Kimball, Ralph, The Data Warehouse Toolkit: The Complete Guide to Dimensional

Modeling, New York, NY: John Wiley and Sons, Inc., 2002. 436pp.

10) Len Silverston, W. H. Inmon, and Kent Graziano, The Data Model Resource Book

(New York, NY: John Wiley & Sons, Inc., 1997).

... Database management systems have become ubiquitous and are fundamental tools in the management of information. Therefore, data modeling is an integral part of database management systems [1,8]. ...

... The data model can be viewed as concepts that describe the structure of a databasestructure of how data is held. Database management systems can be based on a relational, non-relational, network, hierarchical data models amongst others [1], [8]. ...

The deployment and maintenance of enterprise networks form the bedrock of any organization, be it government, commercial, academic, and/or non-profit making. These networks host vast amounts of information, databases, in either temporary mode while in transit or permanent mode while stationary. The databases are managed by the information systems with appropriate functions that meet consumers' needs. Databases hold varying data – structured, semi-structured, or unstructured. Data is increasingly becoming a vital organizational asset and therefore plays a crucial role in making organizational decisions. With growth in the internet, digital data sources have become ubiquitous. In turn, this has seen the continued growth in the volume, variety, veracity, velocity, and value of data. Big data brings with its data complexities that have an eventual impact on the data modeling techniques. This paper presents a review of big data modeling techniques with a concentration of enterprise networks. We started by appreciating big data before embarking on modeling techniques for big data. Keywords Data, Model, Modeling techniques, Big data, Enterprise networks, Databases.

... The most important thing in the process of building a data warehouse is the modeling process [3]. Data Warehouse development success greatly depends on the integration ofassurance qualitydata to all phases of the development of Data Warehouse [15]. ...

... Database is a collection of data organization to serve many applications efficiently by centralizing data and controlling data redundancy [5]. In designing a database, one thing to note Designing Data Warehouse At the Inpatient and Outpatient (Case Study: Sanglah Hospital) Putu Veda Andreyana [1] , Putu Angelina Widya [2] , and Made Suartika [3] [1][2] Department of Electrical and Computer Engineering, Post Graduate Program, Udayana University ...

  • Putu Veda Andreyana
  • Putu Angelina Widya
  • Made Suartika

The growth of patient data increasing the hospital resulted in even harder to compile data and analyze the data manually, so it takes a data warehouse that can perform this task automatically. The design of the data warehouse Sanglah hospital aims to build a data warehouse that can store data in structured and easier to analyze the data to make a decision. Design methods Fact Constellation Schema and method used is a Nine-step methodology consisting of nine stages, namely the Electoral Process, Selection of Grain, identification and adjustment of the dimensions, the Electoral Facts, Storage pre-calculation in the fact table, Ensuring the dimension tables, the Electoral duration database, Track changes of dimensions is slowly, prioritization and query model. Designing Data Warehouse Sanglah Hospital helpful enough for data processing in large enough quantities, so expect the needs and information about the patient can be met. Data warehouse Sanglah Hospital can be used to analyze patient data in order to get information on the number of patients of various dimensions

... At this paper, the data warehouse is designed for the healthcare centres [3,4]. Star is the simplest type of schema, as shown in Figure 1 which contains fact table (Fact Table OPHC) sits in the center and associated with other dimension tables like a star [5,6]. The significant volume of the data warehouses involves an increase of the response time that leads to appear the necessity for using optimization techniques to reduce. ...

The data warehouse size and the query complexity may cause unacceptable delay in decision support queries. A basic condition for the success of a data warehouse is the capability to supply decision makers with both precise information and best response time. For this purpose, the concept of indexed views is used. Indexed views help to speed-up query processing and reduce the response time for tracing queries, especially for queries about past histories.

... A data warehouse is a place to store data with a design that makes analysing data easier, and OLAP is a method to analyse data as well as to provide self-service business intelligence capabilities to decision makers [16]. Microsoft SQL Server -Analysis Services software (SSAS) was used to build the number of OLAP Cubes that meet the main decision makers' queries. ...

This paper presents an architecture for the data warehouse of outpatient healthcare (DWOP) as a data repository collects data from two different sources (Databases of outpatient healthcare and Excel files from hospitals) and provides storage, functionality and responsiveness to queries to meet decision makers requirements.Successfully supporting managerial decision-making is critically dependent upon the availability of integrated, high quality information organized and presented in a timely and easily understood manner. "On-Line Analytical Processing (OLAP) is utilized for decision support to get interesting information" from the data warehouse with a rapid execution time. OLAP is considered one of Business Intelligence tools.

... stages of data modeling Source:Patel and Patel (2012) ...

  • Xiaobing Yu

Data warehousing refers to the process of organizing data from different sources to support the process of decision making in organizations as well as support data analytics. Data warehousing assimilates and systematizes data from across different departments in an organization to have a single analysis of the full information. This data is then used to make the decisions in an organization. However, data warehousing has been underestimated in education institutions. The reason is that they are non-profits organizations. With the increase in the number of educational institutions coming up, colleges and universities should consider the integration of such data-driven support systems to make better decisions and for more organization in the academic processes especially for those institutions with different branches. The intricacy in managing these institutions requires development on how consistent information can be relayed to the decision-makers of an institution. This paper explains the need for data warehouses in education institutions and how the teaching institutions can implement data warehousing to help in the management of the institutions. The paper also explores and reviews various methods that institutions can use to mine data. Various stages for promoting and actualizing the system have been shown.

... The most important thing in the process of building a data warehouse is the modeling process [1]. Because designed data model will be measured the extent to which the built-in data warehouse is able to produce the necessary managerial information. ...

  • I Putu Ari Putra Wijaya
  • Wahyudin Wahyudin Wahyudin Wahyudin
  • Made Mataram

Department of Communications and Informatics, appointed as Executor of Handling Complaint Society in Denpasar City. Over time and more complex data complaints that come in so complicate the data analysis, to overcome the data it is necessary implementation of data warehouse. The method used is Snowflake Schema, this method is chosen because this method is the development of star schema where each dimension table can have sub-table dimension. It aims to minimize data overload. From the results of the implementation in get the conclusion that by implementing data warehouse can facilitate the user in view report in accordance with the desired from summary to detail.

... Dimensional data model is preferable to design a data warehouse for company. The dimensional model has good queries performance to support OLAP functions [3]. ...

  • Fauziyah Fauziyah Fauziyah Fauziyah

Recently, managing data in electronic manufacturing company has become a challenge.This paper presents the design of a data warehouse based on user needs for electronic manufacturing company. The data warehouse is designed using Kimball's Method. The purpose of the proposed design is to help decision makers in performing data processing and data analysis over the data stored in the warehouse. The data warehouse design based on user needs which provides a data source to support corporate leaders for enhance the decision making process. Database Management System used is Oracle Database XE. This Study includes Integrity check process and User Acceptance Test.This study reflects that user find the data warehouse fulfill the user needs. Keywords– Data Warehouse, Kimball method, Dimensional Model, Integrity Check.

  • Rajdeep Chowdhury Rajdeep Chowdhury
  • Olive Roy
  • Soupayan Datta
  • Saswata Dasgupta

A data warehouse is a storehouse which comprises all specifics and statistics of the institution. With the contemporary leaning of hacking beside the initiation of fresh hacking proficiency, the safekeeping for data warehouse has transpired to be an influential portion. The projected work exemplifies how an encryption technique is germane in the internal structure of a data warehouse. Apiece instant, any user endeavor for entrance to the information, the data in encrypted form is furnished to the user. Only an authenticated user would be able to obtain the original form of the data, because a well-defined decryption algorithm is installed at the authentic user's end only. The formulation of the spotless Intelligent Sensor Algorithm and the proposed cryptographic modus operandi ascertain that there would be considerable attenuation of admittance time, with observance for the cosseted transference, enhanced infringement avoidance and output augmentation. The applicable employment of the proposed virtual data warehouse model with improved safekeeping might be in its noteworthy utility in an assortment of institution where accrual of cosseted information is of extreme extent. The assortment of associations is inclusive of edifying institutions, business communities, medicinal associations, classified enterprises and there on.

Estimation Model for Data Warehousing Projects Presented In 2nd Annual International Estimation Colloquium

Estimation Model for Data Warehousing Projects Presented In 2nd Annual International Estimation Colloquium 2007,By Karthikeyan Sankaran & Sujatha Sivaraman,Hexaware Technologies