Everything Oracle | Home | Everything Oracle |
Data | Source | Integration | |||||
Managing Complex Data Source Integration |
Introduction |
One of the main advantages that OBIEE offers over its predecessors is its ability to seamlessly integrate complex data sets. In this article we’ll examine OBIEE’s data integration capabilities. Rather than discussing these in abstract terms, we’ll focus on a number of particular scenarios, so that you can see more easily if there is a fit with a particular requirement that you may have to hand.
But one important caveat: just because it’s possible for OBIEE to integrate together complex data sets does not mean that using OBIEE to do this integration is the best architectural option. If you have large data volumes and tight performance KPIs, then what you need, if at all possible, is a well constructed data warehouse that presents its data to OBIEE in the form of a simple star schema.
However, in the real world, optimizing the data source may not be achievable or worthwhile for a number of reasons:
Cost
Timescales
Temporary/Unstable Reporting Requirements
Small Data Volumes
Real Time BI
Cost
Developing an optimized data warehouse may not be possible within the constraints of your project budget. The reality may well be that you’re stuck with a disparate set of data sources for reporting purposes, and that’s it. Here OBIEE’s data integration capabilities can offer a solution, though report performance may well fall far short of what you’d like to achieve.
Timescales
Developing an optimized data warehouse may be within your budget, but you may also need a reporting solution right now. Perhaps you just can’t wait for a year or more while a data warehouse solution is put in place? OBIEE can offer an interim solution. You can quickly integrate together the data in its current configuration to provide a temporary fix. Then you can change the data source to your data warehouse when it comes on line.
Your question: “Will I end up paying to do the same work twice over?” The good news is that the modular structure of OBIEE separates the construction of reports from the mapping of data to support these reports. When you move to your new data warehouse, your reports won’t have to be reworked as the presentation layer in the OBIEE repository will remain unchanged. All you’ll have to do will be to rework the physical and business mapping layers within the OBIEE repository. And as your data warehouse will now be doing all the work of data consolidation by means of ETL loads, the rework will only involve a simplification of the OBIEE repository, which won’t cost you an “arm and a leg”. So, yes, there will be an additional cost, but if you keep the interim solution small scale – to satisfy only pressing reporting requirements – then using OBIEE’s data integration capabilities may offer an excellent solution in the short term.
Temporary/Unstable Reporting Requirements
Many organizations have temporary reporting requirements – reports based on externally supplied data to evaluate a merger or takeover opportunity, for example.
Many organizations have unstable reporting requirements. End users may not be sure exactly what they want in terms of reporting when a new business area opens up. There’s a danger that you may end up rejigging your data warehouse at considerable cost only to find that a reporting requirement changes significantly or even disappears.
Using OBIEE to integrate the additional data sources in these circumstances may offer a faster and cheaper solution.
Small Data Volumes
If data volumes are small – a few hundred lines in an Excel spreadsheet, for example – then integrating the data into your main database is unlikely to have a significant impact on performance and may well not be worth the effort. Using OBIEE to integrate the data may be your best bet, though the downside will be a more complex OBIEE repository.
Real Time BI
It’s quite possible that you have a near real-time reporting requirement with data drawn from OLTP databases (a data warehouse will its unwieldy ETL processes may not be an option). In these circumstances, OBIEE can connect directly to and integrate together the OLTP data sources, allowing BI end users to see data as soon as it is generated.
Row Partitioning (Vertical Fragmentation) |
Global BI Hub
Let’s suppose that your organization has a number of subsidiaries, each of which is responsible for managing sales within its own region – US, Europe, Middle East and Africa, Far East, and South America, for example. The subsidiaries, brought together as a result of acquisitions, have their own regional data warehouses, and the data warehouses have been sourced from different vendors.
You’re responsible for developing a global BI hub and you want senior executives to be able to generate reports that span different sales regions. In this scenario, you have multiple sales tables containing a common subset of columns with the same semantic content, but each sales table contains a set of rows associated with its own region. OBIEE can generate reports that span regions by sending subqueries to each database whose rowset is involved in a request (the fragmentation content definitions within the OBIEE repository allow the OBIEE BI Server to send out just those subqueries that are needed to satisfy the request).
In these circumstances, where reports are summarizing data that has already been summarized, the numbers of rows returned from the data warehouses are likely to be modest, so that desired performance KPIs can probably be met (though the average number of rows returned per subquery relative to the transmission line bandwidth is something you’d need to investigate as part of a feasibility study).
Current and History Tables
A very common scenario, if you’re sourcing data from an OLTP database, is one in which a large database table – corresponding to an entity at the bottom of the ER diagram – is split into a current table and a historical or archive table – “sales” and “sales_archive”, for example. Doing so increases performance as OLTP transactions can target the more modestly sized current table, while the archived table is available for reporting, and to resurrect transactions in extremis.
You may well have a BI reporting requirement in which end users want to see a single sales table as far as reporting is concerned. Here again OBIEE can direct subqueries to the current or archive table or to both, as appropriate. A complication in this case is that the contents of the two tables are not static but change over time as new rows are added to the current table and old ones are migrated to the archive table. OBIEE is able to integrate tables with dynamically changing content, either by working from a migration schedule or by dynamically inspecting the tables’ contents at intervals to determine where the current timeline partition lies (an initialization block in OBIEE resets a dynamic repository variable on an appropriate schedule and uses its value to set the fragmentation content of the current and archive tables).
Column Partitioning (Horizontal Fragmentation) |
Let’s suppose you have a “suppliers” table that is stored in an Oracle database. One of your supply chain managers has extracted a subset of the “suppliers” table columns, say the set of suppliers identifiers and names, to an Excel spreadsheet, and has then added a dozen columns – characterising the suppliers according to a variety of metrics – to the data in the spreadsheet. For reporting purposes the data in the database is needed as the “suppliers” table joins to others containing, say, order fulfilment details, for example. But the data in the Excel spreadsheet is also needed for report construction.
Here, each row in the logical “suppliers” table has been split into two components by column, one component is stored in the database and the other in the Excel spreadsheet. You could, of course, merge the Excel data into the “suppliers” table in the database, but this might not be worthwhile if the reporting requirement is a temporary one, or it might not be practical if the Excel spreadsheet is updated at intervals from some other software application that, say, gathers statistics from an external source.
Here again OBIEE can integrate the two data sources together. If a query does not involve columns from the Excel spreadsheet then only the database will be queried, and vice versa. If a query contains columns from both data sources then subqueries will be sent to the data sources and the returned rowsets will be stitched together in a manner transparent to the end user or the requesting client application.
Summary Tables and Aggregation |
If end user queries need to process more than a small volume of data to return a result, then you’ll need to make use of summary or aggregate tables in order to achieve an acceptable level of performance.
By far the best approach to implementing summary tables is to do so in a manner that is transparent to OBIEE by using a query rewrite facility that is built into the data source (for example, Oracle has supported query rewrite using materialized views and materialized view logs since version 9i of the RDBMS).
However, some data sources may not support query rewrite, but the data source may still have tables that summarize the raw data to varying degrees. In these circumstances, OBIEE can dynamically determine for each request the most performance efficient query to send to the data source. Within the OBIEE repository the aggregation content of summary tables (fact and dimension) is specified. If the attributes contained in a query lie at or above the aggregation level of a summary table, then the summary table will be used; if not, then the underlying fact table will be used instead. The result is improved performance without sacrificing flexibility (this approach does, however, add to the complexity of the OBIEE repository).
Snowflake Schemas |
Ideally, the tables in your data source would be arranged in a denormalized, star schema (a single fact table with foreign-key links to direct masters). However, in the case of an OLTP data source your table schema is likely to be highly normalized – a snowflake schema – in order to optimize transaction processing. Both fact tables and dimensions may be normalized. For example, a “customers” dimension table may have a “countries” dimension table as its direct master; and a “sales” fact table may have a “costs” fact table, containing additional measures, as its direct master.
OBIEE can map snowflake schemas, at both the fact and dimension levels, so that, in the case of the example, you would have a single logical fact table “sales” with a single logical dimension table “customers” (the “customers” and “countries” attributes are merged into a single “customers” hierarchy). When OBIEE sends SQL to the backend database to satisfy a request it will join together any snowflaked tables that contain the attributes needed to satisfy the query.
24/7/365 Operation |
Let’s suppose you operate a distributed BI hub with queries coming in from end users scattered across the globe. You need to offer a service all day, every day. However, the ETL operations needed to keep your data warehouse up to date take over an hour to complete. And you also need to bring your database servers down on occasion for routine maintenance and for upgrades. To meet the required service level you use the data warehouse to feed two datamarts, each running on a different machine. While reporting is taking place against one datamart, the other is being populated or maintained. At soon as the ETL processes have finished populating one datamart you want to automatically switch all reporting to that datamart in a manner that is transparent to end users.
OBIEE can handle this type of dynamic query redirection – either between different servers or between different tables within a particular server – to meet high level availability requirements such as this (the name of the active data source is held in a small non-cacheable table in each data source and is used to refresh a dynamic repository variable at intervals. The value of the repository variable is then used to set the connection pool or the physical table name that is used by queries). With this approach it’s possible to reduce the latency for viewing refreshed data at a switchover to a few minutes.
Summary |
The scenarios in the above sections give you some idea of the flexibility of OBIEE when it comes to complex data source integration. These different scenarios can be mixed and matched as required: for example, it would be possible, though improbable, to have a set of fact tables and associated aggregate tables, each of which is partitioned both horizontally and vertically into a number of different fragments.
However, to return to the key point we made in the introduction: useful as OBIEE’s data integration facilities can be, they should always be relegated to your “Plan B”. As far as possible, integrate data within the data source, or data sources, used by OBIEE. You’ll get better performance, you’ll reduce the coupling between system components, and you’ll get a simpler and much more easily maintained OBIEE repository.
Everything Oracle | Home | Everything Oracle |
Copyright © 2007-2015 PWG Consulting, All Rights Reserved