Product Extensibility
When "out-of-the-box" functionality won't cut it!




Custom Extensions


For larger organisations, one of the key concerns when it comes to product procurement is the danger of becoming “boxed in”: a new business requirement arises and it cannot be implemented using the functionality available in the selected BI suite. For organisations with thousands of users, even minor changes in functionality can often have a significant impact on productivity or competitiveness, and so a facility to customize and extend a BI suite is a very important requirement.


Unfortunately, one of the characteristics of the proprietary BI suites offered by the Big-4 is that there are very limited options when it comes to product extensibility. The marketing perspective of these vendors is that their products already include such a wide range of functionality that everything a client might need is available “out-of-the-box”.


However, these assertions of functional adequacy are far from true. For example, over the last decade, we’ve found that about 50% of organisations using OBIEE have requirements for functionality that cannot be implemented using the existing framework: features such as the complex conditional display of report columns based on prompt choices or the implementation of QlikView-style displays. At worst, where the BI suite component uses compiled code, no extension is possible. At best, where the BI suite component is browser-based, the only way to implement custom features is to reverse-engineer the HTML DOM and inject custom JavaScript that runs after the web page is rendered. This approach is time-consuming and requires a considerable level of skill, as the DOM is undocumented and there are no guarantees that its structure won’t change with the next product release, which may cause those carefully crafted custom extensions to fail.



The ISV Marketplace


Oracle, and the other Big-4 vendors, license products for standalone use by their clients. However, Pentaho and the other open-source vendors fall into a very different category when it comes to product extensibility; for these vendors the ISV marketplace is very important. Many ISVs want to embed analytics – everything from dashboards to ETL – within their applications, as adding this functionality is a key differentiator that offers a competitive advantage:


*  Embedding Analytics for the ISV: Supercharging Applications with BI


particularly when the embedded functionality can be customized to meet specific client needs. Pentaho specializes in this marketplace, and some of its implementations are close to substantial, end-to-end BI stacks:


*  ABN-Amro

*  Halliburton

*  Ruckus Wireless


In the following sections, we’ll look at some specific examples of where Pentaho shines when it comes to extensibility.



Pentaho Reporting


One of the key requirements when it comes to offering embedded analytics is a good set of APIs that allow individual reporting features to be accessed from within a client’s application. For example, one of the key issues with OBIEE HTML DOM manipulation is the absence of the predefined IDs that are needed to manipulate individual DOM elements (forcing DOM searches by class to try to identify the element from its attributes). Pentaho takes a very different approach:


*  To allow for customization of the Analyzer report view, “divs” have been added to certain attach points in the application. This allows users who embed Analyzer to have the ability to programmatically customize the application. Each “div” has its own hard-coded ID for retrieval using any means of DOM manipulation.
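
The practical difference between the two lookup styles can be sketched in a few lines. The example below is only an illustration: it uses the JDK's XML parser and XPath on a small XHTML fragment rather than a browser DOM, and the ID `customReportToolbar` and the class name `panel` are hypothetical, not taken from Pentaho's or OBIEE's markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttachPointLookup {

    // Hypothetical page fragment: one div carries a fixed ID, all three share a class.
    static final String PAGE =
          "<html><body>"
        + "<div class='panel'>Filters</div>"
        + "<div class='panel' id='customReportToolbar'>Toolbar</div>"
        + "<div class='panel'>Report</div>"
        + "</body></html>";

    // Parses the fragment into a DOM document.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    // Returns every node matched by the XPath expression.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        // Lookup by ID identifies exactly one element.
        int byId = select(doc, "//div[@id='customReportToolbar']").getLength();
        // Lookup by class returns several candidates that must be told apart by other means.
        int byClass = select(doc, "//div[@class='panel']").getLength();
        System.out.println("by id: " + byId + ", by class: " + byClass);
    }
}
```

With a fixed ID the custom code has a single, unambiguous target; with only a shared class it has to disambiguate the matches using position or content, which is exactly the part that tends to break when the markup changes.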


If extensibility is important to you, then it’s worth getting one of your web developers to scan Pentaho’s API documentation:


*  API Documentation for OEMs


and take a view as to whether it contains the API calls likely to be needed for the extensions you have in mind.


The API contains calls to “get” and “set” existing properties from JavaScript or via URL calls, calls to functions that manipulate reports, and calls to set up listeners for various GUI events; for example:


*  Functions can be called to refresh a report, clear the cache, or save a report.


*  All the standard GUI events can be trapped and custom functionality can be executed on events such as clicks, double clicks, lassos, drags, and drops.


*  Cell values and filters can be read and overwritten, and hyperlinks on specific measures can be redefined dynamically.


For example, a highly interactive OBIEE custom display, such as:


*  Contextual Percentage Difference Report


is implemented by setting up a listener to determine when the mouse moves into a new cell, by reading all pivot table values, and by setting different cell background colours based on the contextual percentage differences between cell values. The implementation at the link above required the reverse-engineering of the undocumented OBIEE HTML DOM. With Pentaho, however, API calls are defined for each of these key steps, so an equivalent implementation would avoid the time-intensive reverse engineering of the DOM, and the custom code should remain stable and continue to function following future Pentaho releases.
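
The calculation at the heart of such a display is simple, whichever API supplies the cell values. The following is a minimal sketch in plain Java, not the code behind the linked report: the class name, the 10% thresholds, and the colour names are all assumptions made for illustration.

```java
import java.util.Arrays;

class ContextualDifference {

    // Percentage difference of a cell value relative to the reference (hovered) cell.
    static double percentDifference(double value, double reference) {
        if (reference == 0.0) {
            throw new IllegalArgumentException("reference cell must be non-zero");
        }
        return (value - reference) / Math.abs(reference) * 100.0;
    }

    // Hypothetical colour bands; a real report would choose its own palette and thresholds.
    static String colourFor(double percent) {
        if (percent >= 10.0) {
            return "green";
        }
        if (percent <= -10.0) {
            return "red";
        }
        return "white";
    }

    // Colours every cell of the pivot table relative to the cell at (row, col).
    static String[][] colourTable(double[][] values, int row, int col) {
        double reference = values[row][col];
        String[][] colours = new String[values.length][];
        for (int r = 0; r < values.length; r++) {
            colours[r] = new String[values[r].length];
            for (int c = 0; c < values[r].length; c++) {
                colours[r][c] = colourFor(percentDifference(values[r][c], reference));
            }
        }
        return colours;
    }

    public static void main(String[] args) {
        double[][] sales = { { 100.0, 120.0 }, { 85.0, 104.0 } };
        // Pretend the mouse has just moved over the top-left cell.
        System.out.println(Arrays.deepToString(colourTable(sales, 0, 0)));
        // prints [[white, green], [red, white]]
    }
}
```

In a real extension, the listener callback would pass in the hovered cell's coordinates and the result would be written back as cell background colours; only those two integration points depend on the BI suite.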


While the API documentation is detailed, there is no high-level overview documentation with fully worked examples. So, even a developer familiar with both website development and the Pentaho “out-of-the-box” functionality would have to proceed to some extent on a trial-and-error basis. While, in general, Pentaho’s documentation is rather variable in quality, this lack of “easy-to-use” API documentation might not be entirely unintentional in this case, as providing consultancy to clients who are building extensions is a useful source of additional revenue.


So, while you might need to “cough up” some money for consultancy, with Pentaho you can at least have a high level of confidence that any custom reporting functionality you might wish to add can be implemented for a modest amount of development effort.



Pentaho Dashboards


In 2013, Pentaho acquired a specialist user-interface company, Webdetails. Webdetails creates and maintains a set of auxiliary tools, CTools, that can be used to extend various aspects of Pentaho’s core functionality:


*  Webdetails CTools


This functionality is not enabled by default, and comes with very little by way of documentation (though tutorials can be purchased from Webdetails for a very modest cost).


CTools contains four dashboarding tools that allow a great deal of fine-grained control over dashboard behaviour (using HTML, CSS, AJAX, JavaScript, and jQuery), the sort of functionality that you would expect to find in any web development toolkit. Apart from embedding Pentaho reports and charts, CTools can be used to source dashboard data from the standard Pentaho metadata sources, from databases, XML files, and from Pentaho ETL transformations. Access restrictions are enforced using the standard Pentaho security model. Advanced features include the storage of parameter states between user sessions, allowing users to pick up where they left off when creating a new session (similar to the user-specific functionality found in OBIEE Answers & Dashboards).


CTools functionality allows the creation of very sophisticated dashboards, well beyond anything that would be possible within the OBIEE framework, though at the cost of needing developers with low-level web-development skills (note that JSP programming skills are not required). Webdetails offers a set of demo dashboards containing a wide range of functionality, “OpenDemos”, that web developers can examine and customise to reduce the learning curve.


For some larger organisations, where a fine degree of control over functionality and presentation is a requirement, CTools is ideal, and offers the functionality that OBIEE lacks; the following sample dashboards illustrate the very clean layouts that can be produced using CTools:


*  Theme Park

*  Retail. Co.

*  T-Wars

*  Automotive and Co.

*  Number One


These examples illustrate features that we’re familiar with from OBIEE: the LOV-style prompts, the highlighting of information on data-point mouse-over, microcharts, and the conditional update of one chart when data is selected in another. However, the layout and presentation are far superior to those offered by OBIEE – the quality is more executive board than factory floor.


For the wide range of chart components available in the CTools Charting Library, CCC, see:


*  Charting Library


Note: click in the LOV towards the top of the page to display the chart variants available within the 13 different categories. Mouse-over the charts to see the pop-up displays of chart data.


The library uses a modified version of Protovis, which is written in JavaScript, and doesn’t require the installation of a browser plugin.


Another CTools extension, CST, allows a different set of dashboard tabs to be loaded at startup for each user or user role.



Pentaho ETL


As with reporting, Pentaho’s ETL is modular in structure, which makes it very easy to extend the “out-of-the-box” functionality.




The great advantage of Pentaho ETL (PDI) is that it is comparatively easy to create custom plugins. To assess the effort involved, get one of your Java developers to review the documentation at:


*  Sample Step Plugin


This example appends a fixed string to the incoming row, and can usefully be used as a template for simple custom transformation steps. Note that the instructions given for downloading the package are unclear and appear to refer you to the Pentaho Support Portal; however, the Java package can be obtained from Github:


*  pdi-sdk-plugins


For convenience, we’ve included the Java code in the appendix. Note that the custom functionality is implemented using a single line:


*  Object[] outputRow =
      RowDataUtil.addValueData (
            r, data.outputRowMeta.size() - 1, "Hello World!"
      );


with the rest of the code being boilerplate.
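
To make clear what that line does to a row, here is a plain-Java stand-in that runs without the PDI libraries. It is a sketch of the behaviour described in the sample's comments (place the value at the last output position, resizing the row array if necessary), not Pentaho's implementation; the class `AppendFieldSketch` and the helper `appendValue` are hypothetical names.

```java
import java.util.Arrays;

class AppendFieldSketch {

    // Writes value at position index, growing the row array first if it is too short.
    static Object[] appendValue(Object[] row, int index, Object value) {
        Object[] result = row.length > index ? row : Arrays.copyOf(row, index + 1);
        result[index] = value;
        return result;
    }

    public static void main(String[] args) {
        Object[] incoming = { "ACME", 42 };          // a row with two input fields
        int outputFieldCount = incoming.length + 1;  // the output row gains one field
        Object[] outgoing = appendValue(incoming, outputFieldCount - 1, "Hello World!");
        System.out.println(Arrays.toString(outgoing));
        // prints [ACME, 42, Hello World!]
    }
}
```

The index `outputFieldCount - 1` mirrors `data.outputRowMeta.size() - 1` in the sample: the new field is always the last one in the output row structure.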


Custom Steps


However, in keeping with Pentaho’s philosophy of agile development, it’s not even necessary to create and deploy a plugin to add custom functionality. A far simpler approach is to use the ETL design tool, Spoon, to embed custom Java code directly using a “User Defined Java Class” step (most of the boilerplate, such as adding the imports, is done automatically, allowing the developer to focus on the custom functionality). The Java code added to the step is compiled automatically at runtime using the Janino project libraries.
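
Whether the logic is packaged as a plugin or embedded in a step, it follows the row-processing contract described in the appendix: the engine calls processRow() repeatedly, and the method returns false once there are no more input rows. The sketch below simulates that loop in plain Java so that it runs without PDI. The class `RowLoopSketch` and its queue and list are hypothetical stand-ins for the engine's getRow() and putRow(), not Kettle classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class RowLoopSketch {

    private final Deque<Object[]> input = new ArrayDeque<>();
    private final List<Object[]> output = new ArrayList<>();

    RowLoopSketch(List<Object[]> rows) {
        input.addAll(rows);
    }

    // Stand-in for getRow(): the next input row, or null when the input is exhausted.
    private Object[] getRow() {
        return input.poll();
    }

    // Stand-in for putRow(): passes a row to the output stream.
    private void putRow(Object[] row) {
        output.add(row);
    }

    // The custom logic: called repeatedly for as long as it returns true.
    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {
            return false; // no more rows: tell the caller to stop
        }
        Object[] out = Arrays.copyOf(r, r.length + 1);
        out[r.length] = "Hello World!";
        putRow(out);
        return true;
    }

    // Plays the role of the engine: drives processRow() until it reports completion.
    static List<Object[]> run(List<Object[]> rows) {
        RowLoopSketch step = new RowLoopSketch(rows);
        while (step.processRow()) {
            // keep going until processRow() returns false
        }
        return step.output;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { "a" });
        rows.add(new Object[] { "b", 2 });
        for (Object[] row : run(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Only the body of processRow() is custom; everything else is the kind of scaffolding that Spoon or the plugin framework supplies.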


Note that other custom coding can be created within Spoon using the “Modified JavaScript Value” step.


As is always the case with Pentaho, the documentation for extending the ETL is somewhat fragmented and disorganised, without a good, high-level overview. But a Java developer should be able to implement most requirements on a trial-and-error basis, backed up by the occasional Google search to find a worked example.



Appendix – Pentaho ETL – Sample Plugin

        /*! ******************************************************************************
         * Pentaho Data Integration
         * Copyright (C) 2002-2013 by Pentaho :
         * Licensed under the Apache License, Version 2.0 (the "License");
         * you may not use this file except in compliance with
         * the License. You may obtain a copy of the License at
         * Unless required by applicable law or agreed to in writing, software
         * distributed under the License is distributed on an "AS IS" BASIS,
         * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
         * See the License for the specific language governing permissions and
         * limitations under the License.
         */

        package org.pentaho.di.sdk.samples.steps.demo;

        import org.pentaho.di.core.exception.KettleException;
        import org.pentaho.di.core.row.RowDataUtil;
        import org.pentaho.di.core.row.RowMetaInterface;
        import org.pentaho.di.trans.Trans;
        import org.pentaho.di.trans.TransMeta;
        import org.pentaho.di.trans.step.BaseStep;
        import org.pentaho.di.trans.step.StepDataInterface;
        import org.pentaho.di.trans.step.StepInterface;
        import org.pentaho.di.trans.step.StepMeta;
        import org.pentaho.di.trans.step.StepMetaInterface;

        /**
         * This class is part of the demo step plug-in implementation.
         * It demonstrates the basics of developing a plug-in step for PDI.
         *
         * The demo step adds a new string field to the row stream and sets its
         * value to "Hello World!". The user may select the name of the new field.
         *
         * This class is the implementation of StepInterface.
         * Classes implementing this interface need to:
         * - initialize the step
         * - execute the row processing logic
         * - dispose of the step
         *
         * Please do not create any local fields in a StepInterface class. Store any
         * information related to the processing logic in the supplied step data interface
         * instead.
         */
        public class DemoStep extends BaseStep implements StepInterface {

            /**
             * The constructor should simply pass on its arguments to the parent class.
             *
             * @param s                 step description
             * @param stepDataInterface step data class
             * @param c                 step copy
             * @param t                 transformation description
             * @param dis               transformation executing
             */
            public DemoStep(StepMeta s, StepDataInterface stepDataInterface, int c, TransMeta t, Trans dis) {
                super(s, stepDataInterface, c, t, dis);
            }

            /**
             * This method is called by PDI during transformation startup.
             * It should initialize whatever is required for step execution.
             *
             * The meta and data implementations passed in can safely be cast
             * to the step's respective implementations.
             *
             * It is mandatory that super.init() is called to ensure correct behavior.
             *
             * Typical tasks executed here are establishing the connection to a database,
             * as well as obtaining resources, like file handles.
             *
             * @param smi step meta interface implementation, containing the step settings
             * @param sdi step data interface implementation, used to store runtime information
             * @return true if initialization completed successfully, false if there was an error preventing the step from working.
             */
            public boolean init(StepMetaInterface smi, StepDataInterface sdi) {
                // Casting to step-specific implementation classes is safe
                DemoStepMeta meta = (DemoStepMeta) smi;
                DemoStepData data = (DemoStepData) sdi;
                return super.init(meta, data);
            }

            /**
             * Once the transformation starts executing, the processRow() method is called repeatedly
             * by PDI for as long as it returns true. To indicate that a step has finished processing rows
             * this method must call setOutputDone() and return false;
             *
             * Steps which process incoming rows typically call getRow() to read a single row from the
             * input stream, change or add row content, call putRow() to pass the changed row on
             * and return true. If getRow() returns null, no more rows are expected to come in,
             * and the processRow() implementation calls setOutputDone() and returns false to
             * indicate that it is done too.
             *
             * Steps which generate rows typically construct a new row Object[] using a call to
             * RowDataUtil.allocateRowData(numberOfFields), add row content, and call putRow() to
             * pass the new row on. Above process may happen in a loop to generate multiple rows,
             * at the end of which processRow() would call setOutputDone() and return false;
             *
             * @param smi the step meta interface containing the step settings
             * @param sdi the step data interface that should be used to store
             * @return true to indicate that the function should be called again, false if the step is done
             */
            public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
                // safely cast the step settings (meta) and runtime info (data) to specific implementations
                DemoStepMeta meta = (DemoStepMeta) smi;
                DemoStepData data = (DemoStepData) sdi;

                // get incoming row, getRow() potentially blocks waiting for more rows, returns null if no more rows expected
                Object[] r = getRow();

                // if no more rows are expected, indicate step is finished and processRow() should not be called again
                if (r == null) {
                    setOutputDone();
                    return false;
                }

                // the "first" flag is inherited from the base step implementation
                // it is used to guard some processing tasks, like figuring out field indexes
                // in the row structure that only need to be done once
                if (first) {
                    first = false;
                    // clone the input row structure and place it in our data object
                    data.outputRowMeta = (RowMetaInterface) getInputRowMeta().clone();
                    // use meta.getFields() to change it, so it reflects the output row structure
                    meta.getFields(data.outputRowMeta, getStepname(), null, null, this, null, null);
                }

                // safely add the string "Hello World!" at the end of the output row
                // the row array will be resized if necessary
                Object[] outputRow = RowDataUtil.addValueData(r, data.outputRowMeta.size() - 1, "Hello World!");

                // put the row to the output row stream
                putRow(data.outputRowMeta, outputRow);

                // log progress if it is time to do so
                if (checkFeedback(getLinesRead())) {
                    logBasic("Linenr " + getLinesRead()); // Some basic logging
                }

                // indicate that processRow() should be called again
                return true;
            }

            /**
             * This method is called by PDI once the step is done processing.
             *
             * The dispose() method is the counterpart to init() and should release any resources
             * acquired for step execution like file handles or database connections.
             *
             * The meta and data implementations passed in can safely be cast
             * to the step's respective implementations.
             *
             * It is mandatory that super.dispose() is called to ensure correct behavior.
             *
             * @param smi step meta interface implementation, containing the step settings
             * @param sdi step data interface implementation, used to store runtime information
             */
            public void dispose(StepMetaInterface smi, StepDataInterface sdi) {
                // Casting to step-specific implementation classes is safe
                DemoStepMeta meta = (DemoStepMeta) smi;
                DemoStepData data = (DemoStepData) sdi;
                super.dispose(meta, data);
            }
        }