Bull has implemented yet another Petascale supercomputer, Helios, this time in Japan. Over and above the technological achievement, this has been a huge challenge for the entire team, as Bull’s Project Director, Philippe Lachamp, explains.
What is the story behind the Helios supercomputer and what will it be used for?
Helios will be contributing to a number of major international research programs aimed at understanding and controlling nuclear fusion as a potential new source of energy. In France, the ITER experimental reactor – currently being built at Cadarache – is already well known. But there are other, related projects, including those launched as part of the ‘Broader Approach’ agreement between Europe and Japan, which are being implemented in Japan. A specialist international research center, IFERC, has been set up at Rokkasho in Japan, and it is hosting a data center dedicated to nuclear fusion. The Helios supercomputer is at the heart of that center: it will be used by the various research teams to explore the fundamental questions posed by fusion, such as the stability of fusion plasmas and the design of materials capable of absorbing the neutrons emitted by the fusion reaction under extreme conditions of temperature and pressure.
So how was Bull chosen?
The ‘Broader Approach’ is an international program that brings together Japan and Europe, represented by F4E (Fusion for Energy). F4E turned to the CEA (the French Alternative Energies and Atomic Energy Commission), a world-renowned player in this field, to run an international tendering process and choose the best possible supercomputer, as well as the associated computing environment and services (maintenance and operation for a five-year period). Given what is at stake, the process was extremely rigorous. Bull won the contract largely because of the guarantees we could give that we would be ready for the fixed deadline of January 2012. The contract was signed in March 2011, and only Bull had the necessary expertise and technical resources to deliver a fully operational Petascale machine in just nine months.
What was the scope of your involvement?
It’s very simple: we were responsible, from A to Z, for implementing a 1.5 Petaflops supercomputer at a site that, in the beginning, consisted only of four walls and an electricity and water supply! So we had to start by guiding a local subcontractor to install the technical infrastructure (electrical network, air-conditioning, UPSs…). Then we had to implement the whole system: in other words, not only the supercomputer itself, with its 2,205 bullx B510 blades equipped with Intel® Sandy Bridge processors, but also all the hardware and software infrastructure needed to use it: file management, scheduling and execution of processing tasks using bullx supercomputer suite, administration, storage, archiving, networks, security, the user portal, visualization tools… Finally, because Bull is also responsible for operation, maintenance and support for the installation over a five-year period, we needed to put in place the essential processes and resources to meet the demands users can legitimately place on a system like this. Overall, you need to think of it as an end-to-end infrastructure project, where every component is right at the cutting edge in terms of its technical capabilities, performance and robustness.
So how did the project go?
The project team consisted of around a dozen people, backed up by Bull’s many expert teams and, in particular, by the manufacturing capabilities at our factory in Angers. Bull doesn’t have an operation in Japan, so some of the team were dispatched over there, while the rest took turns working on the project from their base in France. The time difference actually proved to be quite an asset, as it meant we could work round the clock in the last couple of months of the implementation.
We also had to tackle the issues of governance, organization and mutual understanding that characterize this kind of international, multi-cultural project. And there were some other surprises too: like when we discovered that in November all the freight traffic between France and Japan was booked up to transport Beaujolais Nouveau! Despite all this, the first processing node was powered up on 20 October. Two months later, there were 4,410 up and running. Helios was officially accepted, as planned, in December 2011, having clearly demonstrated that we had met our commitments with a test running three processing codes on 65,536 cores. Helios went into production on 12 January 2012 and will initially be fine-tuned by internationally renowned research teams before going into full production in April 2012.
Now that Helios has been officially inaugurated, what are your strongest impressions as you look back on this adventure?
That it really was just that: an adventure! And that we made a success of it. We’re immensely proud to have built this system – one of the world’s most powerful supercomputers – in such a short time, and under such unusual circumstances. Without doubt, the key to this was that it involved an incredibly solid team, with impeccable motivation: everyone stayed totally focused on the outcome and, throughout the nine months, gave their very best to achieve it. When we were finally able to celebrate our success, one of the members of the team used this quote from Mark Twain to sum up what we’d achieved: “They did not know it was impossible, so they did it!”