The "Slowly
Changing Dimension" problem is a common one particular to data
warehousing. In a nutshell, this applies to cases where the attribute for a
record varies over time. We give an example below:
- Vinay
is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the
original entry in the customer lookup table has the following record:
Customer
Key
|
Name
|
State
|
1001
|
Vinay
|
Illinois
|
· At
a later date, she moved to Los Angeles, California on January, 2003. How should
ABC Inc. now modify its customer table to reflect this change? This is the
"Slowly Changing Dimension" problem.
·
There
are in general three ways to solve this type of problem, and they are
categorized as follows:
Type
1
- The new record replaces the original record. No trace of the old record
exists.
Type
2
- A new record is added into the customer dimension table. Therefore, the
customer is treated essentially as two people.
Type
3
- The original record is modified to reflect the change.
We
next take a look at each of the scenarios and how the data model and the data
looks like for each of them. Finally, we compare and contrast among the three
alternatives.
Type 1 Slowly
Changing Dimension
· In
Type 1 Slowly Changing Dimension, the new information simply overwrites the
original information. In other words, no history is kept.
·
In
our example, recall we originally have the following table:
Customer
Key
|
Name
|
State
|
1001
|
Vinay
|
Illinois
|
·
After
Vinay moved from Illinois to California, the new information replaces the new
record, and we have the following table:
Customer
Key
|
Name
|
State
|
1001
|
Vinay
|
California
|
Advantages:
· This
is the easiest way to handle the Slowly Changing Dimension problem, since there
is no need to keep track of the old information.
Disadvantages:
· All
history is lost. By applying this methodology, it is not possible to trace back
in history. For example, in this case, the company would not be able to know
that Vinay lived in Illinois before.
·
Usage:
·
About
50% of the time.
·
When to use Type 1:
·
Type
1 slowly changing dimension should be used when it is not necessary for the
data warehouse to keep track of historical changes.
Type 2 Slowly
Changing Dimension
In
Type 2 Slowly Changing Dimension, a new record is added to the table to
represent the new information. Therefore, both the original and the new record
will be present. The new record gets its own primary key.
·
In
our example, recall we originally have the following table:
Customer
Key
|
Name
|
State
|
1001
|
Vinay
|
Illinois
|
· After
Vinay moved from Illinois to California, we add the new information as a new
row into the table:
Customer
Key
|
Name
|
State
|
1001
|
Vinay
|
Illinois
|
1005
|
Vinay
|
California
|
Advantages:
- This allows us to
accurately keep all historical information.
Disadvantages:
- This
will cause the size of the table to grow fast. In cases where the number of
rows for the table is very high to start with, storage and performance can
become a concern.
- This
necessarily complicates the ETL process.
- Usage:
- About
50% of the time.
- When to use Type 2:
- Type
2 slowly changing dimension should be used when it is necessary for the data
warehouse to track historical changes.
Type
3 Slowly Changing Dimension
In Type 3 Slowly Changing Dimension,
there will be two columns to indicate the particular attribute of interest, one
indicating the original value, and one indicating the current value. There will
also be a column that indicates when the current value becomes active.
In our example, recall we originally
have the following table:
Customer
Key
|
Name
|
State
|
1001
|
Vinay
|
Illinois
|
To accommodate Type 3 Slowly Changing
Dimension, we will now have the following columns:
·
Customer
Key
·
Name
·
Original
State
·
Current
State
·
Effective
Date
After Vinay moved from Illinois to
California, the original information gets updated, and we have the following
table (assuming the effective date of change is January 15, 2003):
Customer
Key
|
Name
|
Original
State
|
Current
State
|
Effective
Date
|
1001
|
Vinay
|
Illinois
|
California
|
15-JAN-2003
|
Advantages:
- This does not increase the size of
the table, since new information is updated.
- This allows us to keep some part of
history.
Disadvantages:
- Type 3 will not be able to keep all
history where an attribute is changed more than once. For example, if Vinay
later moves to Texas on December 15, 2003, the California information will be
lost.
Usage:
- Type 3 is rarely used in actual
practice.
When to use Type 3:
Type III slowly changing dimension
should only be used when it is necessary for the data warehouse to track
historical changes, and when such changes will only occur for a finite number
of time.