by Xavier Vigouroux and Gunter Roeth, Bull
Drastic falls in the cost of DNA sequencing have opened up fascinating new prospects for genomics and biochemistry, but they have also exposed the need for much greater processing power and storage. To implement these leading-edge systems, researchers need to deal with people who understand not only the technological but also the scientific aspects of their work; people like those in Bull’s specialist Applications and Performance team.
When the sequence of the human genome was first unraveled in the early years of this century, the health and life sciences sector experienced a big upsurge in popularity. Both in higher education and in industrial labs, research increased dramatically. Many significant discoveries were made (such as microRNAs) and new methods were developed (so-called ‘next-generation sequencing’). One of the most spectacular results has been the drastic reduction in the cost of sequencing an individual’s genome. Something that was a unique and major advance just a decade ago can now be carried out on demand, for under a thousand dollars. But far from being an end in itself, this democratization of sequencing increasingly seems to be just a starting point. Indeed, research is advancing apace and we can increasingly measure the extraordinary complexity of the mechanisms of life. Nowadays we can access the genome profiles of huge populations for a relatively modest cost, enabling the kinds of large-scale analyses that are the only way for us to really understand the role and behaviour of genes. Provided, however, that you have access to adequate computing resources…
Leading-edge, but varied requirements
Handling massive amounts of highly specific data and algorithms, as well as often highly sensitive personal information, genomics and biochemistry require high-performance supercomputers that are precisely matched to the tasks they have to perform. In genomics, for example, many research programs are interested in comparing two populations – one of them with a particular illness or disability and the other one healthy – to compare their genomes and attempt to isolate the differentiating factors, all other things being equal. This means that initially, using the latest ultra high-speed methods, the genetic ‘puzzle’ for a large number of individuals has to be reconstituted (which massively increases the amount of information to be processed). Following this, these huge datasets have to be processed using specialist analytical algorithms. Only the very latest generation of supercomputers, combined with huge memory capacity, can perform these calculations in acceptable timescales.
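The core of such a case-control comparison can be illustrated with a toy sketch. The snippet below is not Bull’s software or any production pipeline: it assumes hypothetical allele counts for two invented variants (`rsA`, `rsB`) and applies a standard 2×2 chi-squared test to flag variants whose frequency differs between the affected and healthy groups. Real analyses apply this kind of test to millions of variants across thousands of genomes, which is where the supercomputing requirement comes from.

```python
# Minimal sketch of a case-control association test on hypothetical data.
# For each variant we compare alternate/reference allele counts between
# an affected group ("cases") and a healthy group ("controls").

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# Hypothetical counts: (cases_alt, cases_ref, controls_alt, controls_ref)
variants = {
    "rsA": (60, 40, 30, 70),  # alt allele enriched in cases
    "rsB": (52, 48, 50, 50),  # frequencies roughly equal
}

for name, (ca, cr, ka, kr) in variants.items():
    stat = chi_squared_2x2(ca, cr, ka, kr)
    flagged = stat > 3.84  # ~p < 0.05 at 1 degree of freedom
    print(name, round(stat, 2), "candidate" if flagged else "no signal")
```

In this made-up example, `rsA` would be flagged as a candidate differentiating factor while `rsB` would not; the point is only to show the shape of the computation that must be repeated at genome scale.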
This need for advanced computer simulation applies equally to biochemistry, where the behaviour of proteins is being studied. It has been shown that the chemical environment, the conformation of molecules and the countless interactions that arise from this directly influence the expression of particular genes. Modeling these molecules, discovering the rules that determine their composition and evaluating the interplay of the web of factors influencing them… these are the key aims of this highly complex area of science, whose progress depends directly on the available computing capabilities.
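To give a flavour of why molecular modeling is so compute-hungry, here is a toy sketch (again, not Bull’s software and with purely illustrative parameters) of the Lennard-Jones potential, a standard pairwise interaction term in molecular dynamics force fields. Even this simplest of models requires summing over every pair of atoms, so the cost grows rapidly with system size.

```python
# Toy molecular-interaction model: the Lennard-Jones pair potential,
# V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6), in reduced units.
# Parameters are illustrative, not fitted to any real molecule.

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Interaction energy of one atom pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def total_energy(positions):
    """Energy of a small cluster: sum over all unique atom pairs."""
    e = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            dz = positions[i][2] - positions[j][2]
            r = (dx * dx + dy * dy + dz * dz) ** 0.5
            e += lennard_jones(r)
    return e

# The potential reaches its minimum of -epsilon at r = 2**(1/6)*sigma.
print(round(lennard_jones(2 ** (1 / 6)), 6))  # prints -1.0
```

Production force fields add bonded terms, electrostatics and solvent effects on top of this, and must evaluate them for millions of timesteps; that is the gap a supercomputer closes.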
In industry, the challenge is somewhat different. A great deal of academic research often leads off into the unknown, so it has to cover the entire spectrum of possibilities, while the work of many industrial laboratories starts with a clear idea of their target. This means they have lower requirements for computing capacity, so they can use so-called ‘DNA chips’ which only examine limited and clearly defined areas of the genome.
On the other hand, the requirement to carefully preserve the history of their research (protocols, systems, results…) – demanded in particular by the US FDA – calls for highly advanced probative-value archiving solutions. And security (especially user identification, access control and authentication) is a major, omnipresent issue.
Researchers talking to researchers
As we have seen, life sciences research is inextricably linked to the computer systems it uses. The quality of the results obtained is directly dependent on the capacity, performance, robustness and security of these systems. But it is precisely because these links are so tight that researchers must have access to people who are not only technical experts but also really understand their scientific area. Only someone who is up to speed with their work and the software they are using, their vocabulary and how they normally present their results, will be capable of perfectly translating their needs into technical requirements. This is why Bull has set up a specialist center as part of its Extreme Computing business, known as Bull Applications and Performance (A&P), consisting of High-Performance Computing (HPC) experts who also all have PhDs in specific scientific areas. As a result, researchers are talking directly to researchers. This dual culture means they can perfectly understand the requirements expressed by users and can size, adjust and optimize the entire technical environment of bullx™ supercomputers to get the very best out of scientific applications. This detailed optimization work can make a huge difference; indeed, in such a tough, competitive field, it can make all the difference!
What our customers say…
“There is no doubt that the heart of this partnership with Bull lies in the added value continually demonstrated by their team in the area of application development and performance tuning. They have far exceeded expectations throughout – from the start of service with their initial benchmarking of key Cardiff codes and the associated “fast-start” programme, to trouble-shooting problem applications during the lifetime of the service. Bull experts have provided a level of pro-active support not matched in our experience by any of their competitors, support that has included the secondment of key staff to the Cardiff site. This has been critical in terms of our being able to support the diverse community of users of the ARCCA services, from experienced practitioners running applications which scale over 100’s cores to those just starting to consider the impact of computing on their research aspirations. The impact of this support on the ARCCA service was demonstrated in the competitive procurement for the new Cardiff supercomputer in 2012. Bull proved yet again to be our supplier of choice with the replacement bullx B500 Sandy Bridge based blade system scheduled to commence service in November 2012.“
Martyn Guest, Director of ARCCA, Cardiff University (Advanced Research Computing @ Cardiff)
“Our collaboration with Bull, set up as part of a research project that began in February 2012, was immediately a very active one. As soon as the CURIE supercomputer went into service, the dual computing and scientific skills of Bull’s experts enabled us to work together very naturally, around a shared culture, and to get up and running straight away. These experts have in-depth knowledge of our codes, and they are used to our working methods and the specific constraints of our computations, which means they can target any problems directly and rapidly find solutions that really meet our needs. The responsiveness of Bull’s experts and the relevance of their work were key determining factors in the advance of our research programs and we are looking forwards to a very productive long-term collaboration.”
Sébastien Masson, researcher at the LOCEAN-IPSL lab, specializing in climate modeling and especially the interactions between the atmosphere and the ocean.
An expert for every key HPC domain
Bull has true specialists for each of the areas where HPC is used, who are capable of understanding every aspect of the relevant field.
Xavier Vigouroux is Business Development Manager for Bull’s Extreme Computing business, for the Education and Research sector. He joined Bull in 2006 to lead the benchmarking team, now an integral part of the Applications and Performance HPC center. He previously worked for HP, Sun Microsystems and Sun Labs. Xavier is a graduate of ENS Lyon, where he was awarded a PhD in parallel computing in 1996.
Gunter Roeth joined Bull’s Applications and Performance HPC Center in 2010, having previously worked at Cray, HP, and most recently Sun Microsystems. He has a Master’s degree in geophysics from the Institut de Physique du Globe (IPG) in Paris and has completed a seismology thesis on the use of neural networks (artificial intelligence) for interpreting geophysical data. In 2000, he was one of the founding members of a group of life-sciences experts.