Gérard Haudiquert, Fast Track and Big Data Expert Architect
No-one can deny that Big Data has amazing potential. But because it involves a fundamentally statistical approach, it will not replace other forms of Business Intelligence (BI) like reporting and multi-dimensional analysis. More than ever, when it comes to BI it is business objectives that have to guide technological decisions.
The growing use of Hadoop by the Internet giants, the drastic fall in the cost of data storage and the intuitive idea that usage data can reveal valuable insights have all encouraged a great deal of interest in Big Data. But despite its undeniable potential, Big Data and its associated technologies cannot, under any circumstances, take the place of well-established Business Intelligence approaches.
There are three main forms of Business Intelligence: reporting, multi-dimensional analysis and statistics
Big Data indisputably belongs to the third category. Compared with earlier forms of statistics and data mining, what distinguishes Big Data is that its algorithms draw on far more varied types of data, from many different sources, rather than traditional ‘snapshots’. On the other hand, it provides a broad, ephemeral and highly mathematical view, aimed at a rare group of very experienced users, and one which (because it is too imprecise to be useful for operational management) can only hint at major trends. The information that Big Data can provide about customer tastes, opinions or behaviour, or about market trends or competitors’ initiatives, is certainly fundamental, but it is no substitute for management figures. Targeted Web advertising based on this type of approach, for example, is fairly effective and definitely useful, but clearly too imprecise to be the sole basis for a marketing strategy. Reporting and multi-dimensional analysis are still vital to successfully managing your business.
Driven by business objectives
Big Data, reporting, multi-dimensional analysis… As always, when it comes to BI the challenge is above all about the business: what do we want to know, why, and how are we going to use that information? Only when you have answered these questions can you focus on which form of Business Intelligence will deliver the best response. If this is Big Data, you then need to determine which algorithms you are going to use to extract and transform the mass of raw data into usable information. If it’s reporting, and above all multi-dimensional analysis, you need to put yourself into a position where you can take informed and relevant decisions. In other words, having your own data at your fingertips, designed especially for that purpose and capable of delivering explicit value to the business. While a transaction processing system, for example, can still function with missing or fictitious data, that’s not true of a BI system. That difference explains the need to model, extract, clean and transform production data to feed BI systems. This preparation precedes and justifies the continuing use of architectures based on a data warehouse to store this formatted data, as well as specific data marts shaped by the queries that will turn this data into useful information. For reporting and multi-dimensional analysis, to be very clear, the use of SQL and a data warehouse is still essential (when we say ‘NoSQL’, we actually mean ‘not only SQL’ and not ‘no more SQL’!).
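The data-mart idea described above can be sketched in a few lines. This is a minimal, purely illustrative example (the table names, columns and figures are all assumptions made for this sketch, not part of any Bull product): a tiny star schema with one fact table and one dimension table, queried in SQL to produce the kind of aggregated management figure that reporting depends on.

```python
import sqlite3

# Hypothetical star-schema data mart: a sales fact table joined to a
# product dimension. All names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, amount REAL);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "books"), (2, "music")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

# A typical reporting query: aggregate the cleaned, formatted facts
# along a dimension to produce a management figure.
cur.execute("""
SELECT p.category, SUM(f.amount)
FROM fact_sales f JOIN dim_product p USING (product_id)
GROUP BY p.category ORDER BY p.category
""")
print(cur.fetchall())  # [('books', 15.0), ('music', 7.5)]
```

The point is structural rather than technological: the fact and dimension tables exist only because production data was first modelled, extracted, cleaned and loaded into that shape, which is exactly the preparation step the paragraph above describes.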
The SMP architecture revolution
Nevertheless, despite this stipulation, these two traditional areas of BI are also undergoing their own revolution. In the past, only MPP architectures were capable of dealing with the sheer size of a data warehouse. Now, with the drastic cuts in the cost of storage, SMP architectures (when CPUs and I/O are carefully balanced) enable many tens of Terabytes of data to be managed, with a very large number of simultaneous users. It’s a radical change, but even here it is business demands that need to prevail. At the point where the architecture is being defined, four criteria need to be examined: how much data is involved, the number of simultaneous users, how sophisticated the queries will be, and the complexity of the data model. If any one of these parameters is very large, then an MPP rather than an SMP architecture could be envisaged. Nevertheless, today it is quite possible to set up SMP architectures as a grid, with an optimizer at the RDBMS level to give you a single view of the whole (for example, using Microsoft Parallel Data Warehouse). That way, you can manage up to 800 Terabytes of data using an SMP set-up with a much lower TCO than MPP, which often involves tricky and expensive upgrades.
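The four sizing criteria above can be folded into a rough rule of thumb. The sketch below is purely illustrative: the function name, the 1–5 complexity scale and every threshold are assumptions made for this example, not vendor sizing guidance.

```python
# Illustrative decision sketch for the four criteria named above.
# All thresholds are assumptions, not vendor recommendations.

def recommend_architecture(data_tb, concurrent_users,
                           query_complexity, model_complexity):
    """Return 'MPP' if any single dimension is extreme, else 'SMP'.

    query_complexity and model_complexity are rated on a 1-5 scale
    (a convention assumed for this sketch).
    """
    extreme = (
        data_tb > 800              # beyond the SMP grid range cited above
        or concurrent_users > 5000
        or query_complexity >= 5
        or model_complexity >= 5
    )
    return "MPP" if extreme else "SMP"

print(recommend_architecture(80, 200, 3, 2))    # SMP
print(recommend_architecture(1500, 200, 3, 2))  # MPP
```

In practice the choice is rarely this mechanical, of course: the point of the sketch is simply that one extreme parameter is enough to tip the balance, while a broadly moderate profile keeps the lower-TCO SMP option on the table.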
As the architect of secure, complex decision-support solutions, Bull offers the full range of architectures and can therefore take a totally impartial approach to solution design. Our only concern is to address the business need initially identified, whether that involves Big Data, reporting or multi-dimensional analysis. The Bull FastTrack appliance enables bespoke configuration of SMP architectures of between 12 and 80 cores – with disk bays in a local SAN, in the same rack, so as not to overload the network – and the option of embedding existing tools chosen by the customer (BO, SAS, Informatica…). As a turnkey solution, FastTrack can manage up to 80 Terabytes of data (before any subsequent extension using Parallel Data Warehouse). Finally, Bull is committed to performance, to ensure the service levels delivered by the appliance are maintained.
More information >>> http://www.bull.com/bi/