Column= field
Row= record
1. The Mumbai university wants you to help design a star schema to record grades for course completed by students. There are four dimensional
tables namely course_section, professor, student, period with attributes as follow:
Course_section Attributes: Course_ID, Section_number, Course_name, Units, Room_id, Roomcapacity.
During a given semester the college offers an average of 500 course sections.
Professor Attributes: Prof_id, Prof_name, Title, Department_id, Department_name
Student Attributes: Student_id, Student_name, Major.
Each course section has an average of 60 students.
Period Attributes: Semester_id, Year.
The database will contain data for 30 months periods. The only fact that is to be recorded in the fact table is course grade.
Answer the following questions
a. Design the star schema for this problem.
b. Estimate the number of rows in the fact table, using the assumptions stated above and also estimate the total size of the fact table ( in
bytes) assuming that each filed has an average of 5 bytes.
c. Can you convert this star schema to a snowflake schema? Justify your answer and design a snowflake schema if it is possible.
or
10. The college wants to record the grades for the courses completed by students. There are four dimensions:
i) Course ii) Professor iii) Student iv) Period
The only fact that is to be recorded in the table is course-grade
i) Design star schema ii) Write DMQL for the above star schema.
1
2. Give information package for recording information requirements for "Hotel Occupancy" considering dimensions like Time, Hotel etc.
Design star schema from the information package.
Hotel Room Type Time
Hotel id Room id Time id
Branch Name Room type Year
Branch Code Room Size Quarter
Region Number of beds Month
Address Type of bed Date
City/state/zip Max occupants Day of week
Construction year Suite Day of month
Renovation Year Holiday flag
Facts:
a) Occupied Rooms b) Vacant Rooms c) Unavailable Rooms
d) No. of occupants e) Revenue
Draw the Star Schema
3. For a Supermarket Chain, consider the following dimensions, namely Product, store, time, promotion. The schema contains a central fact
tables sales facts with three measures unit_sales, dollars_sales and dollar_cost.
Design star schema and calculate the maximum number of base fact table records for the values given below.
2
4. All electronics company have department. Sales consider for dimensions namely time, item, branch & location. The schema contain central
fact tables sales with two measures dollars_sold and unit_sold. Design star schema, snowflake schema and fact constellation for same
5. A manufacturing company has a huge sales network. To control the sales it is divided in the regions. Each region has multiple zones. Each
zone has different cities. Each sales person is allocated different cities. The object is to track sales figure at different granularity levels of region.
Also to count no. of products sold. Create data warehouse schema to take into consideration of above granularity levels for region, sales person
and the quarterly, yearly and monthly sales.
Sales Fact Table
Facts: sales_amt, no_of_products_sold
Time dimension Time_id Month Quarter Year
Product dimension Prod_id Prod_name Product_brand Product_category
Location dimension Loc_id City Zone Region
Salesperson dimension Sales_id Name Address Dept
6) Suppose that a data warehouse consists of the three dimensions time, doctor and patient and the two measures count and charge, where charge
is the fee that a doctor charges a patient for a visit.
7) Suppose that a data warehouse contains three dimensions date, doctor and patient. There is only measure – charge where charge is the fee that
a doctor charges to a patient for a visit. Design a star schema for the data warehouse, assuming some concept hierarchy for each dimension.
3
Starting with the base cuboid [date, doctor, patient], which sequence of OLAP operations do you need to list the total fee collected by each doctor
in the year 2002?
8) Suppose that a data warehouse consists of the four dimensions, date, spectator, location, and game, and the two measures, count and charge,
where charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each
category having its own charge rate.
Draw a star schema diagram for the data warehouse.
9. A bank wants to develop a data warehouse for effective decision-making about their loan schemes. the bank provides loans to customers for
various purposes like House Building loan, car loan, education loan, personal loan etc. The whole country is categorized into a number of
regions, namely North, South, East, West. Each region consists of a set of states; loan is disbursed to customers at interest rates that changes from
time to time. Also, at any given point of time, the different types of loans have different rates. That data warehouse should record an entry for
each disbursement of loan to customer. With respect to the above business scenario
i) Design an information package diagram. Clearly explain all aspects of the diagram.
ii) Design a star schema for the data warehouse clearly identifying the fact table(s), dimensional table(s), their attributes and measures along with
the primary key and foreign key relationships.
iii) Write an SQL query by which you can display region-wise, bank-wise, year-wise total amount of loans disbursed from your schema.
iv) Starting with the base cuboid, if we want to see the amount of loan disbursed during the year 2000 for the state of Maharashtra, which
sequence of OLAP operations would you need to perform?
4
Bank Loan Fact Table
Facts: loan amt, payment amt
Time dimension Time_id Day Date Month Quarter Year
Customer dimension Cust_id Acc_number Acc_type Age Address
Branch dimension Branch_id Branch_name Branch_add
Location dimension Loc_id Name City State Zone Region
Loan dimension Loan_id Loan types Loan duration Interest rate Loan desc
For snowflake: Loan_type
5
10. An insurance company, with branches all over the country, wants to develop a data warehouse for effective decision-making about their
insurance policies. There are a number of different types of insurance like Auto insurance, Home insurance, Industrial insurance, etc. The
entire country is categorized into four regions, namely, North, South, East and West. Each region consists of a set of states. There may be
different types of customers like individuals, institution, industry, etc. The data warehouse should record an entry for each policy issued to
each customer along with the premium paid.
Create an Information Package diagram and snowflake schema
6
11. Consider the following business scenario. A telecom company plans to maintain a CRM data warehouse. There are 10 million customers of
the company. Besides the usual attributes, the company wants to maintain additional demographic information like literacy percentage,
male/female ratio, average life expectancy and average income of the people belonging to the state to which each customer belongs. The
company also wants to maintain information about the age group, income level and marital status of its customers. They also need to run
queries like the number of married and unmarried customers they have at any point in time.
a. Design an efficient data warehouse schema that satisfies the above business scenario. Clearly identify the fact table(s), dimension table(s),
primary key(s) and foreign key(s).
b. Write an SQL statement that generates the number of married and unmarried customers that the company has today.
7
12. A chain of departmental stores called “India-Mart” having operations only in India, plans to develop a data warehouse for effective decision-
making about their sales and different promotion schemes. India-Mart puts some of their products on promotional sales from time to time.
There may be a large number of different types of promotions like coupon sales, end-of-the-aisle display, buy-two-get-one-free, etc.
India-Mart would like to analyze how item sale is affected by the promotions at each store, in each state and across the entire country.
With respect to the above business scenario, answer the following questions.
a. Design a star schema for the data warehouse clearly identifying the fact table(s), dimension table(s), their attributes and measures along with
the primary key and foreign key relationships.
b. Write an SQL query by which you can display year-wise, promotion-wise, product-wise total sales in the entire country from your schema.
c. Draw a cuboid that would display the result of the query specified in Q. b above.
d. From the cuboid of Q. c above, if we want to find the total amount of promotional sales made during the years 2002 and 2003 for the states of
Karnataka and Maharashtra, which sequence of OLAP operations would you need to perform?
e. Draw possible schema hierarchies for each dimension that you have designed.
f. Based on the schema hierarchies drawn in Q. e above, determine the total number of cuboids, considering all the aggregation levels.
8
13. A university plans to build a data warehouse that would help them in analyzing the performance of the students in various courses in
different academic sessions. They want to analyze if there is any relation between the average grade of a course and the number of students
attending it. They would also like to know if there were some courses offered but did not have any students registered for them. Relative
performance among boys and girls and average grades of students from various states and cities of the country for each course must be
analyzed for each course and also overall CGPA.
(a) Design a star schema for such a data warehouse clearly identifying the fact table(s) and dimension table(s), their primary key(s) and foreign
key(s). Your schema should at least be able to satisfy the above mentioned analysis requirements. You may consider other suitable attributes for
the dimension table(s).
(b) Write an SQL query that runs on your schema and returns the average CGPA of boys from the state of Karnataka for each spring semester
during the years 2002-2005.
9
14)
15) Draw a star schema for Library Management
Book Fact Table
Facts: stud_book_cnt, book_id_cnt
Time dimension Time_id Date Month Quarter Semester Year
Student dimension Stud_id Stud_name Stud_address Gender Age dept
Book dimension Book_id Book_name Book_author Book_IBNNO Publish Type
Issue dimension Issue_id Start_date End_date
16) A hospital cum medical research institute is carrying out a study on the nature of different types of fevers. In order to track every patient as
he/she keeps coming back to the hospital, a unique id is maintained. For each patient, they keep track of the body temperature at every hour of
the day as long as the patient is admitted in the hospital. They also maintain data about the different types of medicine being given to the
patient. Patients may be given more than one medicine in a day. Every medicine is administered as many times in a day as the doctor has
prescribed. Since there is history of different types of fevers occurring in various districts, states and regions in the country, the hospital research
team wants to maintain such residence details of each patient. One of the goals of the research is to determine if there is any relation between the
age and gender of the patients with their body temperature when various medicines are administered. Another goal is to determine if there is a
10
relation between the % of population who are farmers, office goers or teachers in the patient’s state with the body temperature of the patients
when various medicines are administered.
a. Design a suitable schema for the hospital cum medical research institute, clearly identifying the Fact table(s), Dimension Tables(s), the Facts,
the Dimensions, Primary Keys and Foreign Keys of all the tables. Your schema should at least be able to satisfy the above mentioned research
requirements. You may consider other suitable attributes for the dimension table(s).
b. Classify the fact(s) in your fact table(s) as additive, non-additive and semi-additive.
c. Write an SQL query that runs on your schema and returns today’s average, maximum
and minimum body temperature for each married male patient.
d. Draw a cuboid to represent the result of your query.
11
17) An insurance company requires the data warehouse design for accidents analysis of its customers. In particular, the company requires to
evaluate the type of accidents related to customers and the type of policies.
Goal:
Evaluate the history of accidents w.r.t. the policies and the customers
Evaluate the history of policies w.r.t. the customers by considering the risk type and the policy amount
12
18) A very large tele-communications company called “Cell9”, providing cellular phone services to a number of states in various regions of the
country, plans to build a data warehouse for decision support. They have millions of subscribers in the country. They want to track the duration
(in minutes) as well as the prevailing rate (per minute) of each phone call made by its subscribers. They also want to analyze if there is any link
between the total amount of time spent in talking on cellphones by a subscriber and the number of graduates in the state or the number of married
persons in the state or the male-female ratio of the state to which the subscriber belongs. Further, they want to analyse the relation between the
age, salary and marital status of the customers to their total bill amount per day/month/year. One other important requirement is to make queries
like determining the current total number of customers in the various age groups for each state having certain ranges of male-female ratio.
(a) Design a suitable relational database schema for such a data warehouse, clearly identifying the fact table(s), the facts in the fact table(s), the
dimension table(s), their primary key(s) and foreign key(s). Your schema should at least be able to satisfy the above mentioned analysis
requirements. You may consider other suitable attributes for the dimension table(s).
13
(b) Classify the facts in your fact table(s) as additive, non-additive and semi-additive.
(c) Draw possible concept hierarchies for each dimension that you have designed, identifying whether these are schema hierarchies or set
grouping hierarchies.
(d) Write an SQL query that runs on your schema and returns the region-wise yearly average bill amounts of married and unmarried customers.
(e) Draw a cuboid to represent the result of your query.
(f) From this cuboid, which sequence of OLAP operations would you perform to get the average monthly bill amounts of all the customers for
the states of Bihar and West Bengal?
g) Write an SQL query to return the current total number of customers in the various age groups for each state with male-female ratio between
0.9 and 1.1.
(h) For any one fact table (You may have only one, depending on your design), and any one attribute of any one dimension table, draw the
bitmap index table(s) and join index table(s). Before drawing the index tables, first mention the representative rows in the tables
b. Write an SQL statement that generates the number of married and unmarried customers that the company has today.
14
19) Design data warehouse for following company
15
20)
16
17