Architecture and Functionality
Pentaho and OBIEE High-Level Architectures




Architecture Comparison


The high-level architectures of the Pentaho and OBIEE BI stacks are shown in Figure 1 below:

Figure 1: Pentaho-OBIEE High-Level Architectures


In terms of overall functionality, measured by the number of BI stack components supported, Pentaho comes in well ahead of OBIEE. As we’ve seen in the preceding article on TCO, Pentaho costs “a great deal less” than OBIEE; it is also the case that you get “a great deal more for a great deal less”.



Bundled ETL – No 3rd Party Licence Costs


Pentaho comes complete with a substantial ETL offering, whereas OBIEE does not (OBIEE sites will typically be using Informatica for ETL, or, perhaps, ODI). In terms of the functionality it affords, the ETL component accounts for one third to one half of the total BI stack architecture.


By bundling an ETL component, Pentaho not only reduces the licence cost still further, but, more importantly, it is able to offer a far more integrated BI suite, one that supports very rapid prototyping and delivery into production, via mechanisms such as the Pentaho Data Service and Pentaho Agile BI.



In-Memory OLAP Engine – High Performance


While Pentaho shares with OBIEE a Mapping Engine (RPD / BI Server), it has in addition an in-memory OLAP engine. This engine comprises Java-based MDX processing software, backed by an in-memory data grid that can span a server cluster. While the data grid must be loaded initially from the underlying relational database, once OLAP segments have been loaded they can be combined, aggregated, and served up more quickly than by using OBIEE’s disk-based cache (the average access time to the data grid is about one millisecond). This in-memory data grid offers substantial additional functionality “out-of-the-box” (the nearest equivalent in terms of an Oracle product would be the Hyperion software embedded within Oracle Exalytics).
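The way cached segments can be "combined, aggregated, and served up" without going back to the database can be illustrated with a minimal sketch. This is plain Python, not the Pentaho engine or its API: the names `segment_cache`, `load_segment`, and `roll_up` are hypothetical, the sample data is invented, and the roll-up shown works only for additive measures such as sums.

```python
from collections import defaultdict

# Hypothetical in-memory segment cache (an illustration, not a Pentaho API).
# Key: (measure, grouping columns). Value: {coordinate tuple: cell value}.
segment_cache = {}

def load_segment(measure, dims, rows):
    """Aggregate raw fact rows once and keep the resulting segment in memory."""
    cells = defaultdict(float)
    for row in rows:
        cells[tuple(row[d] for d in dims)] += row[measure]
    segment_cache[(measure, tuple(dims))] = dict(cells)

def roll_up(measure, dims, keep):
    """Answer a coarser query by summing a cached segment over dropped dimensions."""
    segment = segment_cache[(measure, tuple(dims))]
    positions = [dims.index(d) for d in keep]
    result = defaultdict(float)
    for coord, value in segment.items():
        result[tuple(coord[i] for i in positions)] += value
    return dict(result)

# Invented sample facts standing in for the underlying relational database.
facts = [
    {"year": 2023, "region": "EMEA", "sales": 100.0},
    {"year": 2023, "region": "APAC", "sales": 50.0},
    {"year": 2024, "region": "EMEA", "sales": 70.0},
]

load_segment("sales", ["year", "region"], facts)         # initial load
by_year = roll_up("sales", ["year", "region"], ["year"])  # served from memory
```

Once the (year, region) segment is loaded, the per-year totals are derived from it directly; the fact rows are not read a second time.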



Metadata Model Segmentation – Risk Reduction


The next major difference in architecture comes down to metamodels. OBIEE is blighted by its reliance on a single metamodel, the RPD. This dependency often leads to a monolithic RPD containing thousands of business items, and to a consequent reluctance on the part of IT to make changes in case existing functionality is compromised. Pentaho supports multiple independent metamodels, each of which can be active in memory at the same time, so new functionality can be added without touching existing models. Indeed, power users can create their own wizard-driven metamodels in production with the same ease with which they can create ad hoc reports, offloading the burden from IT when it comes to the simpler data discovery tasks, such as analysing Excel data.
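The isolation argument can be made concrete with a small sketch. It assumes a hypothetical registry in which each metamodel is a separate mapping from business terms to physical columns; the names `metamodels`, `register`, and `resolve`, and the table names, are illustrative and are not part of either product.

```python
# Hypothetical registry of independent metamodels (not a Pentaho API):
# model name -> {business term: physical column}.
metamodels = {}

def register(name, mapping):
    """Add a new metamodel; refuses to overwrite one that already exists."""
    if name in metamodels:
        raise ValueError(f"metamodel {name!r} already registered")
    metamodels[name] = dict(mapping)

def resolve(model, term):
    """Translate a business term into the physical column for one metamodel."""
    return metamodels[model][term]

register("sales", {"Revenue": "fact_sales.amount"})
before = dict(metamodels["sales"])

# A power user adds a separate model, for example over uploaded Excel data.
register("excel_budget", {"Budget": "budget_upload.amount"})

# The existing model is untouched by the addition.
unchanged = metamodels["sales"] == before
```

With a single shared model, the second `register` call would instead be an edit to the one structure every existing report depends on.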



Reporting off ETL Transformations – Fast Delivery


The final major difference in architecture results from Pentaho’s facility to use the individual steps in an ETL transformation as “virtual tables”, with the data being cached in memory to improve performance. A report can use these “virtual tables” directly, in the same way as any other database table, and the transformation is executed dynamically when the report is run. This facility ensures that:


*  An additional tactical reporting solution, such as QlikView or SAS, is no longer required, and that its Pentaho replacement is both more functional and more performant;


*  Report functionality can be delivered very rapidly into production (no need to build datamarts or metamodels);


*  This alternative tactical Pentaho reporting solution can be enhanced to form a strategic Pentaho reporting solution, without starting again from scratch (add datamart tables and a metamodel; reuse the ETL and the reports);


*  The limited data federation functionality present in the OBIEE RPD / BI Server is replaced by the much more functional and performant Pentaho enterprise ETL engine;


*  Reporting performance is improved as transformed data can be held in the ETL engine’s memory, resulting in an additional in-memory cache;


*  The ETL server cluster becomes a more cost-effective asset as it is now fully utilized during the day – for federating datamart data – as well as during the night – for data extraction from transactional datasources.
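The “virtual table” idea described above can be sketched in a few lines. This is an analogy in plain Python rather than Pentaho code: the step functions, the `VIRTUAL_TABLES` catalogue, and the sample rows are all hypothetical, and `functools.lru_cache` stands in for the in-memory cache held by the ETL engine.

```python
from functools import lru_cache

# Invented source rows standing in for a transactional datasource.
ORDERS = [
    {"customer": "acme", "amount": "120.50"},
    {"customer": "acme", "amount": "30.00"},
    {"customer": "globex", "amount": "75.25"},
]

run_count = {"parse": 0}  # records how often the step really executes

def read_orders():
    """Step 1: extract rows from the source."""
    return ORDERS

@lru_cache(maxsize=None)
def parsed_orders():
    """Step 2: cast amounts to float; the result is cached in memory."""
    run_count["parse"] += 1
    return tuple((r["customer"], float(r["amount"])) for r in read_orders())

# Hypothetical catalogue exposing transformation steps by table name.
VIRTUAL_TABLES = {"parsed_orders": parsed_orders}

def report_total_by_customer(table_name):
    """A report that reads a step's output as if it were a database table."""
    totals = {}
    for customer, amount in VIRTUAL_TABLES[table_name]():
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals

first = report_total_by_customer("parsed_orders")   # runs the transformation
second = report_total_by_customer("parsed_orders")  # served from the cache
```

The report never refers to a datamart or a metamodel, only to the step name, and the second run reuses the cached step output instead of executing the transformation again.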