Monday, January 21, 2013

Parent/child dimensions in Jasper Analysis

Here I want to share my experience of using a dimension based on tree-like table. As database engine I used Oracle.

As known tree-like table is a table that has pair of columns, usually called as ID and PARENT_ID. Values of PARENT_ID column in children rows reference to values of ID column of their parents. Usually tree's element without any parent is called root, an element that has children is called node and an element without any children is called leaf.

As example I created tree-like table ITEM
CREATE TABLE item (
  id        NUMBER NOT NULL
, parent_id NUMBER
, name      VARCHAR2(32)
, constraint item_pk primary key (id)
, constraint item_parent_fk foreign key (parent_id) references item(id)
)

and filled it by the following data
BEGIN
  INSERT INTO item (id, parent_id, name) VALUES (1, NULL, 'Food');
  INSERT INTO item (id, parent_id, name) VALUES (2, 1, 'Vegetables');
  INSERT INTO item (id, parent_id, name) VALUES (3, 1, 'Fruits');
  INSERT INTO item (id, parent_id, name) VALUES (4, 1, 'Meat');
  INSERT INTO item (id, parent_id, name) VALUES (5, 2, 'Potato');
  INSERT INTO item (id, parent_id, name) VALUES (6, 2, 'Cabbage');
  INSERT INTO item (id, parent_id, name) VALUES (7, 2, 'Beet');
  INSERT INTO item (id, parent_id, name) VALUES (8, 3, 'Apple');
  INSERT INTO item (id, parent_id, name) VALUES (9, 3, 'Cherry');
  INSERT INTO item (id, parent_id, name) VALUES (10, 3, 'Orange');
  INSERT INTO item (id, parent_id, name) VALUES (11, 4, 'Pork');
  INSERT INTO item (id, parent_id, name) VALUES (12, 4, 'Beef');
  INSERT INTO item (id, parent_id, name) VALUES (13, 4, 'Fowl');
  INSERT INTO item (id, parent_id, name) VALUES (14, 13, 'Chicken');
  INSERT INTO item (id, parent_id, name) VALUES (15, 13, 'Turkey');
  INSERT INTO item (id, parent_id, name) VALUES (16, 13, 'Duck''s flesh');
  INSERT INTO item (id, parent_id, name) VALUES (17, NULL, 'Entertainment');
  INSERT INTO item (id, parent_id, name) VALUES (18, 17, 'Cinema');
  INSERT INTO item (id, parent_id, name) VALUES (19, 17, 'Attraction');
  INSERT INTO item (id, parent_id, name) VALUES (20, NULL, 'Payments');
  INSERT INTO item (id, parent_id, name) VALUES (21, 20, 'Municipal Services');
  INSERT INTO item (id, parent_id, name) VALUES (22, 20, 'Other Services');
  INSERT INTO item (id, parent_id, name) VALUES (23, 21, 'Water');
  INSERT INTO item (id, parent_id, name) VALUES (24, 21, 'Electric power');
  INSERT INTO item (id, parent_id, name) VALUES (25, 21, 'Gas');
  INSERT INTO item (id, parent_id, name) VALUES (26, 22, 'Internet');
  INSERT INTO item (id, parent_id, name) VALUES (27, 22, 'TV');
 
  COMMIT;
END;

These are made-up items of home budget. As a tree it's look like this
    Food
    Vegetables
      Potato
      Cabbage
      Beet
    Fruits
      Apple
      Cherry
      Orange
    Meat
      Pork
      Beef
      Fowl
        Chicken
        Turkey
        Duck's flesh
  Entertainment
    Cinema
    Attraction
  Payments
    Municipal Services
      Water
      Electric power
      Gas
    Other Services
      Internet
      TV

I want to use ITEM table as dimension in OLAP view. So one more thing I have to do it's to create fact table.

I created ITEM_EXPENCES table with expences by items of home budget for some period
CREATE TABLE item_expense (
  item_id  NUMBER NOT NULL
, amount   NUMBER(18,2) NOT NULL
, constraint item_expences_pk primary key (item_id)
, constraint item_fk foreign key (item_id) references item(id)
)

and filled it by the following data
BEGIN
  INSERT INTO item_expense (item_id, amount) VALUES (5, 15.00);
  INSERT INTO item_expense (item_id, amount) VALUES (6, 36.37);
  INSERT INTO item_expense (item_id, amount) VALUES (7, 9.04);
  INSERT INTO item_expense (item_id, amount) VALUES (8, 70.00);
  INSERT INTO item_expense (item_id, amount) VALUES (9, 58.30);
  INSERT INTO item_expense (item_id, amount) VALUES (10, 25.00);
  INSERT INTO item_expense (item_id, amount) VALUES (11, 96.33);
  INSERT INTO item_expense (item_id, amount) VALUES (12, 72.07);
  INSERT INTO item_expense (item_id, amount) VALUES (13, 48.00);
  INSERT INTO item_expense (item_id, amount) VALUES (14, 56.90);
  INSERT INTO item_expense (item_id, amount) VALUES (18, 264.70);
  INSERT INTO item_expense (item_id, amount) VALUES (19, 105.00);
  INSERT INTO item_expense (item_id, amount) VALUES (23, 500.00);
  INSERT INTO item_expense (item_id, amount) VALUES (24, 600.00);
  INSERT INTO item_expense (item_id, amount) VALUES (25, 150.80);
  INSERT INTO item_expense (item_id, amount) VALUES (26, 200.00);
  INSERT INTO item_expense (item_id, amount) VALUES (27, 100.00);
 
  COMMIT;
END;

I inserted in data for tree's leaves only. Amount of nodes will be calculated automatically in OLAP view.

Then when all test data had been prepared it's time to design schema file.

Note

I assume you can create JDBC Data Source, Schema File, Mondrian Connection and OLAP View in JasperReports Server to make the example work.

Jasper Analysis (Pentaho Mondrian behind the scenes) declares just simple way to configure parent/child dimension.

All it needs to define name of column with parents' values in parentColumn attribute of Level schema's element. And uniqueMembers attribute has to be set to "true" - it's required for parent/child level.

<?xml version="1.0" encoding = "UTF-8"?>
 
<Schema name="item_schema">
  <Cube name="Item Expenses">
    <Table name="ITEM_EXPENSE" alias="ie" />
 
    <Dimension name="Item" foreignKey="ITEM_ID" >
      <Hierarchy hasAll="true" allMemberCaption="Items" primaryKey="ID" >
        <Table name="ITEM" alias="i" />
        <Level name="Item" uniqueMembers="true" column="ID" nameColumn="NAME" type="Numeric" parentColumn="PARENT_ID" />
      </Hierarchy>
    </Dimension>
 
    <Measure name="Expenses" column="AMOUNT" aggregator="sum" formatString="#,###.00" />
 
  </Cube> 
</Schema>

Also if you use another value than NULL in PARENT_ID for root elements you have to define nullParentValue attribute of Level schema's element.

For example, if value is zero
<Level name="Item" uniqueMembers="true" column="ID" nameColumn="NAME" type="Numeric" parentColumn="PARENT_ID" nullParentValue="0" />

Unfortunately it does not work, despite of this is declared in Mondrian documentation. If I run the OLAP view based on the schema above I will get the following error
java.lang.ClassCastException: mondrian.rolap.SqlMemberSource$RolapParentChildMemberNoClosure cannot be cast to mondrian.rolap.RolapCalculatedMember at mondrian.rolap.RolapCell.replaceTrivialCalcMember(RolapCell.java:231)...
It seems like this unresolved bug.

I think it's because the Mondrian developers recommend use separate closure table to organize parent/child dimension. Closure table let increase the performance and decide some other minor issues.

Ok, what is closure table? It has to contain all possible combinations of ID and PARENT_ID columns' values for each element of tree, from the current element to respective root element.

Let's see on the example. I take 'Fowl' element. The closure table should contain the following pairs
Fowl - Fowl
Fowl - Meat
Fowl - Food
Or, for 'Gas' element, the pairs should be
Gas - Gas
Gas - Municipal Services
Gas - Payments

So I created ITEM_CLOSURE table
CREATE TABLE item_closure (
  id    NUMBER
, parent_id  NUMBER NOT NULL
, constraint item_closure UNIQUE (id, parent_id) using INDEX
);
 
CREATE INDEX id_idx ON item_closure (id);

I also created two indexes as recommended for faster access to closure table's pairs.

Population of ITEM_CLOSURE with the help Oracle SQL features for building hierarchical queries is simple enough
INSERT INTO item_closure 
SELECT i.id
     , connect_by_root i.id AS parent_id
  FROM item i
 CONNECT BY PRIOR i.id = i.parent_id

Note

As shown in Mondrian documentation closure table has one more optional column - distance. It helps to populate the table using stored procedure as described for MySQL in the next section of documentation. Using Oracle we can omit this column.

Finally I included link to created closure table into schema file
<?xml version="1.0" encoding = "UTF-8"?>
 
<Schema name="item_schema">
  <Cube name="Item Expenses">
    <Table name="ITEM_EXPENSE" alias="ie" />
 
    <Dimension name="Item" foreignKey="ITEM_ID" >
      <Hierarchy hasAll="true" allMemberCaption="Items" primaryKey="ID" >
        <Table name="ITEM" alias="i" />
        <Level name="Item" uniqueMembers="true" column="ID" nameColumn="NAME" type="Numeric" parentColumn="PARENT_ID" >
          <Closure parentColumn="PARENT_ID" childColumn="ID">
            <Table name="ITEM_CLOSURE" />    
          </Closure>  
        </Level>
      </Hierarchy>
    </Dimension>
 
    <Measure name="Expenses" column="AMOUNT" aggregator="sum" formatString="#,###.00" />
 
  </Cube> 
</Schema>

Note

Don't declare alias attribute for Table element inside Closure element. I did it habitually and got strange exception that ITEM_CLOSURE table can not be found.

In Closure schema's element you have to define name of columns with parent and child id's values in parentColumn and childColumn attributes respectively. And all is in readiness. The view looks as pictured
Tree-like OLAP view

As for me it's a pity that Closure element supports Table element only inside itself. I expected to use View element too. Just to have a query instead of regular table. But now if any data has been changed we should repopulate closure table.

All have been tested on JasperServer 4.7.0, Oracle Database 11g Release 11.2.0.1.0

No comments:

Post a Comment