Productivity: Horizontal Partitioning

Working across the stack

 

 

 

A New Way of Working

 

The standard approach to delivering BI over several decades has failed miserably: BI is very expensive, and application delivery is so protracted that many organisations are forced to adopt alternative tactical reporting solutions to satisfy urgent business requirements.

 

But a new initiative is now beginning to emerge across the industry: a more agile approach to BI development. In part, this approach depends on the willingness of IT managers to change the way they manage projects and deploy developers, and in part it relies on integrated BI stack tools that can support the new development paradigm.

 

 

BI Stack Component Ghettos

 

The reason why BI delivery is so inefficient is that in most organisations developers work exclusively in BI stack component ghettos; for example, in the case of OBIEE, some developers may work exclusively on reporting, some on the RPD, some on the datamart, and some on ETL.

 

The key issue is that no one developer understands the BI stack end-to-end: how a subset of source system business items is mapped across the stack and ends up contributing to the column value in a particular report. This lack of understanding leads to unnecessary communication overheads, and it is responsible for most BI stack bugs, particularly those difficult-to-detect bugs that are not directly caused by coding errors.

 

 

Horizontal Partitioning

 

There was a good practical reason why component ghettos developed: in the early days of BI, implementing each BI stack component required substantial design skills and a great deal of complex procedural coding.

 

But, for well over a decade now, BI stack construction has become progressively deskilled: all the major BI stack component development tools are now “drag-n-drop” GUI tools (be it Answers & Dashboards, the RPD Administrator, SQL Developer, or Informatica / ODI). And, equally importantly, most of the “design smarts” that once had to be coded explicitly in a procedural language are now hidden behind the scenes. Today, the BI stack developer just has to drag objects onto a canvas and set their properties.

 

It’s now perfectly reasonable to expect a single developer to work across the entire stack, starting with a set of tables in a transactional source and ending up with a set of reports. Where a project is substantial, it makes sense to partition it horizontally by business area, and to have each developer design and implement all the functionality relevant to a particular business area.

 

 

Pentaho Support for Horizontal Partitioning

 

Most current BI stack component tools were not designed with horizontal partitioning in mind: each design tool just delivers a product that is consumed by the next design tool further along the stack.

 

The underlying problem is that BI stack vendors typically create their stacks by acquiring companies that have produced individual BI stack components, and then licensing these disparate products under their own brand names. Given the costs of acquisition, it is not commercially viable for a vendor to re-architect the BI stack components so that they work well together.

 

And an individual organisation is likely to source its BI stack components from multiple vendors. For example, the typical OBIEE site will be using Answers & Dashboards and the RPD (built by nQuire in the late 1990s, a product to which only modest changes have been made by its subsequent owners, Siebel and Oracle); BI Publisher (built by Oracle to solve unrelated problems with Oracle Applications reporting, back in 2003); an Oracle RDBMS (built by Oracle from 1978 onwards); and Informatica (standalone ETL built by the Informatica Corporation back in 1993). None of these products was designed to work with the others, and some of them are the result of poor design choices.

 

Pentaho stands out from the other BI stack vendors in that it has built its entire BI stack itself. It may have used open-source software in the build, but it is responsible for the overall architecture. And, even relatively recently, it has been prepared to re-architect the stack to produce a much more integrated and more modular product (some might say a rather bold choice that has resulted in grumblings from its userbase and has impacted its rating by Gartner, but one that seems likely to pay dividends, both actual and metaphorical, in the longer term).

 

We believe that the strength of Pentaho’s support for horizontal partitioning is its unique selling point (rather than its admittedly excellent support for big data). Should Pentaho continue in this direction, it will make it very difficult for other vendors, proprietary or open-source, to compete on those two key metrics much beloved by IT managers: fast delivery and low cost. Another open-source start-up could well try to emulate Pentaho, but it would have a lot of catching up to do. An existing BI stack vendor, like Oracle, would have to spend a very substantial sum and a number of years re-architecting its existing product, an undertaking that seems very unlikely.