Wednesday, March 21, 2012

Is this design right?

I am modeling costs and revenues for Sales. I am having doubts about my approach.

SCENARIO:

We buy in most of our products and sell them on. Some products we make ourselves. Bought-in products can be sourced as items from various suppliers, with each supplier item being uniquely identified.

When analysing purchase costs, users want to drill down efficiently from Product Categories, through Products and down to the costs for the individual Supplier Items that we resell. A supplier item can be used for one sales product only.

Revenues can only ever be analysed at Category and Product level.

SOLUTION:

Tables as follows:

DIM_SupplierItem - contains all supplier items, together with product, category, and supplier groupings enabling hierarchies: Category-Product-SupplierItem and Supplier-SupplierItem.

DIM_Product - contains all products, also with category groupings, enabling the Category-Product hierarchy.

FACT_Costs - contains ProductKey, SupplierItemKey, Cost

FACT_Sales - contains ProductKey, Qty, Price

Prior to being loaded into to the warehouse, data is staged into a single product table, ensuring one source is used to populate the producy and category information included in both dimensions using the same keys and descriptions.

RATIONALE:

Providing a natural hierarchy within the Supplier Item dimension allows for simple, efficient drill down on costs. Ideally I would like to join the supplier item dimension to Sales measure group at the Product grain and do away with the Product dimension. But how can I? Because not all products have a supplier item, I would have sales in my Fact table with no corresponding product in the Supplier Item dimension. Only DIM_Product is guaranteed to include all products.

By including both a product key and a supplier item key on the Costs fact, I am allowing users to drill down to supplier items through products, whilst also supporting a conformed product dimension that allows users to compare sales and costs for products side-by-side when constraining using DIM_Product.

The conclusion is, I can't think of a better solution, but it feels wrong duplicating a subset of the categories and Products in DIM_Product into DIM_SupplierItem.

Any thoughts?

I think you are on the right track, but I'd suggest one change to your design...

For your DIM_SupplierItem dimension table, consider the DIM_Product table to be a part of a snowflaked design. Thus, instead of duplicating product and category information in the DIM_SupplierItem table, just include a foreign key to the DIM_Product dimension table. When you construct your SupplierItem dimension in SSAS, just use both tables as the source for the dimension and construct the attributes and hierarchies as you would if the product and category information was all contained within a single table. Then, just create your Product dimension using the DIM_Product table as you are already planning on doing.

When you create your cube, you should then be able to add your Costs measure group and relate it to SupplierItem and Product dimensions as planned. And add your Sales measure group and relate it to the Product dimension as planned.

This will keep you from duplicating the product and category information across two different tables in your star-schema database. It will, however, still duplicate the data within the two different dimension structures in SSAS, but that is typically not considered as much of an issue as duplicating the data in the star-schema is...

HTH,

Dave Fackler

1 comment:

Anonymous said...

Hi, cool post. I have been thinking about this topic,so thanks for sharing. I will probably be subscribing to your blog. Keep up great writing!!!
1999 Chrysler LHS AC Compressor

Post a Comment