Granularity
The first step in designing a fact
table is to determine the granularity of the fact table. By granularity,
we mean the lowest level of information that will be stored in the fact table.
This constitutes two steps:
- Determine which dimensions will be included.
- Determine where along the hierarchy of each dimension
the information will be kept.
The determining factors usually goes
back to the requirements.
Which Dimensions To Include
Determining which dimensions to
include is usually a straightforward process, because business processes will
often dictate clearly what are the relevant dimensions.
For example, in an off-line retail
world, the dimensions for a sales fact table are usually time, geography, and
product. This list, however, is by no means a complete list for all off-line
retailers. A supermarket with a Rewards Card program, where customers provide
some personal information in exchange for a rewards card, and the supermarket
would offer lower prices for certain items for customers who present a rewards
card at checkout, will also have the ability to track the customer dimension.
Whether the data warehousing system includes the customer dimension will then
be a decision that needs to be made.
What Level Within Each Dimension To
Include
Determining which part of hierarchy
the information is stored along each dimension is not an exact science. This is
where user requirement (both stated and possibly future) plays a major role.
In the above example, will the
supermarket wanting to do analysis along at the hourly level? (i.e.,
looking at how certain products may sell by different hours of the day.) If so,
it makes sense to use 'hour' as the lowest level of granularity in the time
dimension. If daily analysis is sufficient, then 'day' can be used as the
lowest level of granularity. Since the lower the level of detail, the larger
the data amount in the fact table, the granularity exercise is in essence
figuring out the sweet spot in the tradeoff between detailed level of analysis
and data storage.
Note that sometimes the users will
not specify certain requirements, but based on the industry knowledge, the data
warehousing team may foresee that certain requirements will be forthcoming that
may result in the need of additional details. In such cases, it is prudent for
the data warehousing team to design the fact table such that lower-level
information is included. This will avoid possibly needing to re-design the fact
table in the future. On the other hand, trying to anticipate all future
requirements is an impossible and hence futile exercise, and the data
warehousing team needs to fight the urge of the "dumping the lowest level
of detail into the data warehouse" symptom, and only includes what is
practically needed. Sometimes this can be more of an art than science, and
prior experience will become invaluable here.