You often need to add a date dimension table to your data mart schema and populate it with appropriate data. Below is a summary of some of the current options that are available for performing this task.

Options on the table

Kettle job

You can find multiple ready to use Kettle transformations for creating date dimensions. Examples include adding detailed properties such as fixed and variable date holidays or custom sport events. Using Kettle is a powerful option if you already use it for other data transformations for populating your data warehouse’s data marts. However, when you just want to create a dataset for one-time analysis it may be an overkill.

Computed date dimensions

If you cannot change the database schema and already have a date column in the table you may consider simply NOT . You can use the degenerate dimension feature of Mondrian. You do it by defining levels of dimension hierarchy as sql functions of the date column i.e.
EXTRACT(timestamp, YEAR)
This approach is highly discouraged by the lead developer of Mondrian, Julian Hyde, who advises that you create a separate traditional date dimension table. Besides possible performance issues, one clear draw-back of this approach is that your date dimension will not be continuous. If you are missing data for particular time periods, those periods won’t show up in your results. Using a fully-populated date dimension will ensure you will be able to retrieve continuous results when date is one of the axis of your results (or one of the filters). However, settling for degenerate dimensions is often a quick, “agile” option for getting started with an analysis.

Built in function of LucidDB

We have been using LucidDB for a couple years to speed-up queries for some of our Business Intelligence deployments. LucidDB comes with a very handy function to generate date dimensions. You can generate date dimension view with one simple call:
create view dim_time as select * from table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3));
(for production you may want to generate table instead of view)

Up-coming built-in function of Mondrian (announced for Mondrian 4)

Julian Hyde has proposed an auto-generated date dimension feature for Mondrian 4. You will be able to define date dimension directly in your schema file and Mondrian will generate and populate the table for you. Sounds good… if you don’t mind Mondrian itself updating your ddl and data. Feature preview is available in lagunitas branch and we cannot wait to see it in stable release.

Your own custom script to generate SQL

When performing ETL tasks to update your data marts, you always have the option to resorting to your favorite scripting language like Groovy, Ruby or Python to generate sql inserts which would populate the date dimensions table. Some ideas from Kettle (Pentaho Data Integration) transformations described earlier could be easily reused.

SQL procedure

Last but not least – you can crate dimensions with SQL procedure. Again, Stack Overflow search returned useful examples of MySQL scripts. We found this solution very flexible and convenient – quick, no need for additional tools, easy customization. Of course procedures need to be modified for different db engines. I will show you how to use it in the next paragraph.

Generating date dimension with single SQL procedure call

To generate date dimension with SQL procedure we started with procedure posted by Roland Bouman at Stack Overflow thread and extended it a bit. Let’s assume we have a basic ‘transactions_cube’ table with ‘created_at’ date column. Date dimensions - plain table First we need to add Roland’s sql procedure to create and update date dimensions. And now we add our extension that will:
  • calculate proper date range based on column value
  • run procedure to create dimension
  • create join column and foreign key
  • update join column values
For your convenience we prepared a script containing all required procedures:
# by Roland Bouman found at: http://stackoverflow.com/a/2149729
# extended by Freeport Metrics http://www.freeportmetrics.com/ - Jakub Piasecki jakub@freeportmetrics.com 
delimiter //

DROP PROCEDURE IF EXISTS p_load_dim_date
//

CREATE PROCEDURE p_load_dim_date (
    p_from_date DATE
,   p_to_date   DATE
)
BEGIN
    DECLARE v_date DATE DEFAULT p_from_date;
    DECLARE v_month tinyint;
    CREATE TABLE IF NOT EXISTS dim_date (
        date_key               int          primary key
    ,   date_value             date
    ,   date_iso               char(10)
    ,   year                   smallint
    ,   quarter                tinyint
    ,   quarter_name           char(2)
    ,   month                  tinyint
    ,   month_name             varchar(10)
    ,   month_abbreviation     varchar(10)
    ,   week                   char(2)
    ,   day_of_month           tinyint
    ,   day_of_year            smallint
    ,   day_of_week            smallint
    ,   day_name               varchar(10)
    ,   day_abbreviation       varchar(10)
    ,   is_weekend             tinyint
    ,   is_weekday             tinyint
    ,   is_today               tinyint
    ,   is_yesterday           tinyint
    ,   is_this_week           tinyint
    ,   is_last_week           tinyint
    ,   is_this_month          tinyint
    ,   is_last_month          tinyint
    ,   is_this_year           tinyint
    ,   is_last_year           tinyint
    );
    WHILE v_date < p_to_date DO
        SET v_month := month(v_date);
        INSERT INTO dim_date(
            date_key
        ,   date_value
        ,   date_iso
        ,   year
        ,   quarter
        ,   quarter_name
        ,   month
        ,   month_name
        ,   month_abbreviation
        ,   week
        ,   day_of_month
        ,   day_of_year
        ,   day_of_week
        ,   day_name
        ,   day_abbreviation
        ,   is_weekend
        ,   is_weekday
        ) VALUES (
            v_date + 0
        ,   v_date
        ,   DATE_FORMAT(v_date, '%y-%c-%d')
        ,   year(v_date)
        ,   ((v_month - 1) DIV 3) + 1
        ,   CONCAT('Q', ((v_month - 1) DIV 3) + 1)
        ,   v_month
        ,   DATE_FORMAT(v_date, '%M')
        ,   DATE_FORMAT(v_date, '%b')
        ,   DATE_FORMAT(v_date, '%u')
        ,   DATE_FORMAT(v_date, '%d')
        ,   DATE_FORMAT(v_date, '%j')
        ,   DATE_FORMAT(v_date, '%w') + 1
        ,   DATE_FORMAT(v_date, '%W')
        ,   DATE_FORMAT(v_date, '%a')
        ,   IF(DATE_FORMAT(v_date, '%w') IN (0,6), 1, 0)
        ,   IF(DATE_FORMAT(v_date, '%w') IN (0,6), 0, 1)
        );
        SET v_date := v_date + INTERVAL 1 DAY;
    END WHILE;
    CALL p_update_dim_date();
END;
//

DROP PROCEDURE IF EXISTS p_update_dim_date;
//

CREATE PROCEDURE p_update_dim_date()
    UPDATE dim_date
    SET    is_today         = IF(date_value = current_date, 1, 0)
    ,      is_yesterday     = IF(date_value = current_date - INTERVAL 1 DAY, 1, 0)
    ,      is_this_week     = IF(year = year(current_date) AND week = DATE_FORMAT(current_date, '%u'), 1, 0)
    ,      is_last_week     = IF(year = year(current_date - INTERVAL 7 DAY) AND week = DATE_FORMAT(current_date - INTERVAL 7 DAY, '%u'), 1, 0)
    ,      is_this_month    = IF(year = year(current_date) AND month = month(current_date), 1, 0)
    ,      is_last_month    = IF(year = year(current_date - INTERVAL 1 MONTH) AND month = month(current_date - INTERVAL 1 MONTH), 1, 0)
    ,      is_this_year     = IF(year = year(current_date), 1, 0)
    ,      is_last_year     = IF(year = year(current_date - INTERVAL 1 YEAR), 1, 0)
    WHERE  is_today
    OR     is_yesterday
    OR     is_this_week
    OR     is_last_week
    OR     is_this_month
    OR     is_last_month
    OR     is_this_year
    OR     is_last_year
    OR     IF(date_value = current_date, 1, 0)
    OR     IF(date_value = current_date - INTERVAL 1 DAY, 1, 0)
    OR     IF(year = year(current_date) AND week = DATE_FORMAT(current_date, '%u'), 1, 0)
    OR     IF(year = year(current_date - INTERVAL 7 DAY) AND week = DATE_FORMAT(current_date - INTERVAL 7 DAY, '%u'), 1, 0)
    OR     IF(year = year(current_date) AND month = month(current_date), 1, 0)
    OR     IF(year = year(current_date - INTERVAL 1 MONTH) AND month = month(current_date - INTERVAL 1 MONTH), 1, 0)
    OR     IF(year = year(current_date), 1, 0)
    OR     IF(year = year(current_date - INTERVAL 1 YEAR), 1, 0)
    ;
//

DROP PROCEDURE IF EXISTS p_load_and_join_date_dimension
//

CREATE PROCEDURE p_load_and_join_date_dimension (
    cube_table varchar(255), 
    date_column varchar(255)
)
BEGIN   


    # create date dimension based on min and max date
    set @create_date_dim_stmt = concat('call p_load_dim_date((select min(',date_column,') from ',cube_table,'),(select date_add(max(',date_column,'), interval 1 day) from ', cube_table,'))');
    prepare stmt from @create_date_dim_stmt;
    execute stmt;
    deallocate prepare stmt;

    # create columns
    set @create_column_stmt = concat('alter table ',cube_table,' add column ', date_column,'_date_dim int');
    prepare stmt from @create_column_stmt;
    execute stmt;
    deallocate prepare stmt;

    # add foreign key
    set @create_fk_stmt = concat('alter table ',cube_table,' add constraint fk_', date_column,'_date_dim foreign key (', date_column,'_date_dim) references dim_date(date_key)');
    prepare stmt from @create_fk_stmt;
    execute stmt;
    deallocate prepare stmt;



    # connect table with date dim   
    set @create_connect_stmt = concat('update ',cube_table,' set ', date_column,'_date_dim=date(', date_column,')+0;');
    prepare stmt from @create_connect_stmt;
    execute stmt;
    deallocate prepare stmt;

END;
//

delimiter ;
(EDIT: 2014-04-28: added faster version of p_load_and_join_date_dimension) Now you need to call procedure with name of the table and date column. As simple as that.
call p_load_and_join_date_dimension('transactions_cube', 'created_at');
Our table has been connected with generated date dimension through newly created ‘created_at_date_dim’ column. Date dimensions - table with dimension All you need to do is to add time/date dimensions to your Mondrian schema definition
<Dimension name="Time" type="TimeDimension">
    <Hierarchy hasAll="true" primaryKey="date_key">
        <Table name="dim_date"/>
        <Level name="Year" column="year" uniqueMembers="true" levelType="TimeYears" type="Numeric"/>
        <Level name="Month" column="month" uniqueMembers="false" ordinalColumn="month" nameColumn="month_name" levelType="TimeMonths" type="Numeric"/>
        <Level name="Day" column="day_of_month" uniqueMembers="false" ordinalColumn="day_of_month" levelType="TimeDays" type="Numeric"/>
    </Hierarchy>
</Dimension>        
...        
<Cube name="Transactions">
    <Table name="transactions_cube"/>
    <DimensionUsage name="Time" source="Time" foreignKey="created_at_date_dim"/>
     ..
</Cube>

Summary

That’s all. You can analyze your data with your new date dimensions schema in the Business Intelligence tool of your choice – depends on your organization’s audience and needs. We often recommend Rubbi 🙂 Obviously, when you decide on your organization’s ETL toolset, you should look for a solution that is going to be maintainable and use as few tools as possible. Settle on using Kettle only, or a set of scripts written in one language, or just use stored procedures after loading the data into your database. Mixing all these techniques could be a huge headache to maintain down the road. The appropriate toolset often depends on the volume of data, the regularity of updates, and your team’s skill set!
blog comments powered by Disqus