BI Suite Performance

Why are we waiting?

 

 

 

Introduction

 

Little quantitative information is available on Pentaho performance. For more information, see the article Pentaho Performance.

 

 

Reporting

 

For reporting, Pentaho uses the same metadata mapping engine approach as OBIEE and, in addition, a ROLAP engine (Mondrian) backed by an in-memory data cache:

 

*  Pentaho’s in-memory data cache should significantly outperform OBIEE’s disk-based cache, giving sub-second response times for queries that can be satisfied from the cache; for queries that cannot, response times should be comparable to those currently obtained with OBIEE.

 

*  Pentaho’s metadata mapping engine should offer similar performance to OBIEE, given a typical star schema and the same backend database. Unlike OBIEE, however, Pentaho can use the memory in its ETL servers as an additional data cache when data is sourced from an ETL transformation, which would improve performance significantly (during the day, the ETL servers can be pointed at the datamarts used for reporting).
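The difference between a query satisfied from the cache and one that falls through to the backend database can be sketched as follows. This is a generic illustration of an in-memory result cache, not Mondrian's or OBIEE's actual implementation; the names `QueryCache` and `fake_database` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]


class QueryCache:
    """Toy in-memory result cache keyed by query text (illustration only)."""

    def __init__(self, run_on_database: Callable[[str], List[Row]]) -> None:
        self._run_on_database = run_on_database
        self._results: Dict[str, List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def execute(self, query: str) -> List[Row]:
        # Cache hit: answer straight from memory, no database round trip.
        if query in self._results:
            self.hits += 1
            return self._results[query]
        # Cache miss: fall back to the database, then remember the result.
        self.misses += 1
        rows = self._run_on_database(query)
        self._results[query] = rows
        return rows


def fake_database(query: str) -> List[Row]:
    # Hypothetical stand-in for the backend database.
    return [("total_sales", 42)]


cache = QueryCache(fake_database)
first = cache.execute("SELECT region, SUM(sales) FROM fact_sales GROUP BY region")
second = cache.execute("SELECT region, SUM(sales) FROM fact_sales GROUP BY region")
print(cache.hits, cache.misses)
```

Only the first call pays the database cost; the second is served from memory, which is where any sub-second response time would come from.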

 

 

ETL

 

Unlike OBIEE, Pentaho offers a complete BI stack that includes an ETL engine:

 

*  In a clustered implementation, there is good evidence (based on input data volumes of up to 300 GB and clusters of up to 40 nodes on Amazon EC2 machines) that Pentaho’s ETL will scale linearly with data volume and near-linearly with the number of server nodes. Pentaho asserts that it also scales with the total number of cores in a server cluster, not just the total number of server nodes.

 

*  In an ETL performance benchmark, The Power of Pentaho and Hadoop in Action, conducted on a 129-node Cloudera Hadoop cluster deployed on Amazon EC2 machines, Pentaho demonstrated a near-constant processing rate of about one million rows per second across four data volumes ranging from about 0.5 to 4.0 TB (about 3 to 24 billion rows).

 

*  Pentaho’s ETL is about 95% I/O bound for substantial input data volumes, so it is likely to have performance characteristics similar to those of other ETL vendors’ tools given comparable hardware.
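To put the benchmark figures above in more familiar terms, the following sketch works through the arithmetic. It assumes 1 TB = 10^12 bytes, treats the quoted "about" figures as exact, and interprets "95% I/O bound" as 95% of elapsed time being spent on I/O; none of these assumptions comes from the benchmark itself, and the function names are illustrative.

```python
def elapsed_hours(rows: float, rows_per_second: float) -> float:
    """Elapsed time in hours at a constant processing rate."""
    return rows / rows_per_second / 3600.0


def bytes_per_row(terabytes: float, rows: float) -> float:
    """Average row size, assuming 1 TB = 10**12 bytes."""
    return terabytes * 1e12 / rows


def max_speedup_if_io_bound(io_fraction: float) -> float:
    """Amdahl-style ceiling: the best overall speedup available if the
    non-I/O share of the work became free and I/O time stayed unchanged."""
    return 1.0 / io_fraction


RATE = 1e6  # about one million rows per second

for terabytes, rows in [(0.5, 3e9), (4.0, 24e9)]:
    print(
        f"{terabytes} TB: {elapsed_hours(rows, RATE):.2f} h, "
        f"{bytes_per_row(terabytes, rows):.0f} bytes/row"
    )

print(f"speedup ceiling at 95% I/O: {max_speedup_if_io_bound(0.95):.3f}x")
```

On these assumptions the smallest run takes a little under an hour and the largest a little under seven hours. The speedup ceiling of roughly 1.05x illustrates why a heavily I/O-bound engine leaves little room for one vendor's software to outpace another's on the same hardware.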

 

 

Personalised Benchmark

 

In the absence of performance benchmarks on a variety of representative hardware configurations, organisations considering adopting Pentaho would be well advised to develop their own. Pentaho may be able to assist:

 

*  Contact us and we will set-up a personalized demo based on your unique use case showcasing our powerful data integration tools and rich analytics.

 

though whether this offer would extend to using “representative” hardware is another matter.
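
An organisation developing its own benchmark might start from a small timing harness such as the one below. It is a minimal sketch, not a Pentaho tool: `sample_job` is a hypothetical stand-in for a real ETL transformation or report query, and the harness simply reports the median throughput over several runs.

```python
import statistics
import time
from typing import Callable, Iterable, List


def measure_rows_per_second(
    job: Callable[[Iterable[int]], int],
    row_count: int,
    repeats: int = 3,
) -> float:
    """Run `job` several times and return the median rows-per-second rate."""
    rates: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        processed = job(range(row_count))
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(processed / elapsed)
    return statistics.median(rates)


def sample_job(rows: Iterable[int]) -> int:
    # Hypothetical workload: replace with a call that runs the real job
    # on representative data and returns the number of rows processed.
    count = 0
    for value in rows:
        _ = value * 2
        count += 1
    return count


if __name__ == "__main__":
    rate = measure_rows_per_second(sample_job, row_count=1_000_000)
    print(f"median throughput: {rate:,.0f} rows/s")
```

Repeating the measurement at several data volumes, on hardware representative of production, would show whether throughput stays constant as volumes grow.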