"This will probably be the largest AI-dedicated supercomputer in France"


The development of AI and progress in hardware components have shifted supercomputers towards graphics processing units (GPUs). A new investment has been made to adapt the imposing Jean Zay supercomputer, run by GENCI and operated by the CNRS, to the future requirements of research and innovation.

Since its inauguration in 2019, the Jean Zay supercomputer has been a pioneer in the field of research supercomputing in many ways. The machine, weighing forty tonnes or so, is housed and operated by the IDRIS1 and was briefly in 'dry dock' for maintenance on 5 February before returning to simulation work. This maintenance is linked to a project to increase its computing power. Jean Zay has already undergone several modifications. The last took place in June 2022 and upgraded it to a capacity of 36.85 petaflops (Pflop/s), i.e. 36.85 million billion floating-point operations per second. This capacity means it can respond more effectively to the computing requirements of a great number of research projects.

"The State has contributed €40 million of funding for a new extension to Jean Zay through the General Secretariat for Investment (SGPI) and in the framework of the France 2030 programme," explains Adeline Nazarenko, director of CNRS Informatics. "This effectively doubles the total investment already made in the supercomputer. It is a priority for the French government to increase its computing power and this benefits the whole research ecosystem. The CNRS is working hard to achieve this feat of technical prowess in response to the demand for computing power within the very tight schedule the French government has set".

A question of processors

There are two main types of processor in both desktop computers and supercomputers: central processing units (CPUs), capable of performing a wide range of tasks, and graphics processing units (GPUs). Despite what the name may suggest, GPUs are no longer limited to visual tasks and are instead increasingly used to accelerate calculations on large datasets.
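To give a concrete, if simplified, picture of the difference, the sketch below is purely illustrative (it uses Python and the NumPy library, neither of which is specific to Jean Zay): the same arithmetic applied to millions of values is exactly the kind of workload a GPU can spread across thousands of cores at once.

    # Illustrative sketch only (Python + NumPy, our choice of tools, not the article's).
    import numpy as np

    data = np.random.rand(10_000_000)     # a large dataset of 10 million values

    # CPU-style view: one general-purpose core walks through values one by one.
    total_sequential = 0.0
    for x in data[:1000]:                 # only a slice, to keep the loop short
        total_sequential += x * x

    # Data-parallel view: a single array expression; GPU-oriented libraries
    # such as CuPy or JAX can dispatch it to a GPU's many cores at once.
    total_parallel_friendly = float(np.sum(data * data))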

© Rafael MEDEIROS / IDRIS / CNRS Images

Jean Zay was previously configured in four parts. The first was made up of all its 60,000 CPU cores, while the three others corresponded to 3,000 GPUs of different models. The shutdown lasted less than 24 hours and enabled 1,600 CPUs and 1,000 of the oldest GPUs to be removed to prepare for higher-performance GPUs to be added in the summer.

"Our thermal and energy envelope does not enable us to conserve everything," explains Denis Veynante, the deputy director of the CNRS Open Research Data Department (DDOR) and CNRS research professor at the EM2C2  laboratory. "This operation only reduced the computing capacity of GENCI's CPUs by 5%. The other national supercomputers like the one at the National Computing Centre for Higher Education (CINES) in Montpellier or the CEA Very Large Computing Centre (TGCC) in Essonne have sufficient computing resources available to make up for this shortfall".

CPUs have been the benchmark in High-Performance Computing (HPC) for decades, particularly for simulations in materials science, climatology, chemistry, biomedicine and so forth. Moore's Law predicts a doubling of processor performance roughly every two years, but heat dissipation issues mean this has hit a ceiling for CPUs: their most powerful components get so hot that it becomes difficult to keep them at an acceptable operating temperature. The energy constraints our societies are facing could even slow investment in the development of new CPUs in the long term.

"GPUs are subject to the very same energy constraints but, as they are based on different architecture choices, their power is still growing, particularly for AI-related processing," explains Michaël Krajecki, deputy scientific director of CNRS Informatics, professor at the University of Reims Champagne-Ardenne and a member of the LICIIS3  laboratory. He goes on to point out that ''GPUs are now capable of rapidly processing complex operations like convolution algorithms applied to matrices. This is an essential operation for many AIs".

Enhanced integration of AI possibilities

The extremely rapid development of AI (including generative AI) has also impacted the research world. These revolutionary systems consume immense amounts of GPU resources and require computing power previously often reserved for the GAFAMs4. Being able to train models of this kind on a French state-run supercomputer is therefore a major issue for national and European sovereignty.

The installation of new GPU architectures on Jean Zay is planned for the summer. As the call for tenders has only just closed, the machine's final power remains unknown for the time being. However, Michaël Krajecki stresses that it "will probably be the largest AI-dedicated supercomputer in France".

© Cyril FRESILLON / IDRIS / CNRS Images

"The need for additional resources on Jean Zay is growing exponentially," he goes on. "During our last allocation campaign for computing slots we only had one hour available for four requested. Also the State wants to open up strategic access to Jean Zay for the innovation sphere. The aim here is notably to build up large data corpora to train generative AI".

The growing role played by innovation

"The national community is making significant efforts for this extension," insists Adeline Nazarenko. "It will be possible for half the computing time to be reserved for innovation with the other half allocated to the academic sphere. Also, it's important to remember that researchers and academics also actively contribute to a number of innovation projects."

Partners from industry are already using Jean Zay but they need to adopt an open science approach to be eligible. For example, they have to commit to publishing their results or making their algorithms or data corpora available to the research community. Opening up to innovation broadens the spectrum of candidates for slots. Many start-ups deriving from CNRS laboratories are among the applicants.

An inevitable transition

The increase in the proportion of GPUs in Jean Zay's architecture also implies an essential change to the codes themselves, as programs written to run on CPUs need to be adapted for GPUs. Certain models have been used for decades but now need to be reworked. The issue here is that the researchers who use these codes generally do not have the high level of programming skills required to do this themselves. Large-scale computing centres like the IDRIS have engineers who provide a high-quality technical environment and can thus offer support to their users, but additional support will certainly be required.
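What 'adapting a code' means in practice can be sketched, in deliberately simplified form, with the hypothetical Python example below (not taken from any of Jean Zay's actual applications): a legacy loop written with CPUs in mind is rewritten as a single array expression that GPU-oriented libraries with a NumPy-like interface, such as CuPy, can offload to the accelerator.

    # Hypothetical before/after sketch of porting a small piece of code;
    # illustrative only.
    import numpy as np

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # 1) Legacy CPU-style formulation: an explicit loop over every element.
    result = np.empty_like(a)
    for i in range(a.size):
        result[i] = 3.0 * a[i] + b[i]

    # 2) GPU-ready formulation: one array expression, no explicit loop.
    #    A library with a NumPy-like interface (e.g. CuPy) can run this same
    #    line on a GPU once the arrays are allocated in GPU memory.
    result = 3.0 * a + b

Real scientific codes, such as the one mentioned below with its hundreds of thousands of lines, obviously require far more extensive restructuring than this single line.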

"For example, Jean Zay is used by theoretical chemistry researchers who work on subjects linked to the structure of molecules," explains Jacques Maddaluno, director of CNRS Chemistry. "They often use the professional software Gaussian for this and its first version dates right back to 1970. It includes hundreds of thousands of lines of code that have been optimised for CPUs. The shift towards GPUs is a historical turning point in history and I can't see how we can escape this move."

Other studies involve calculating reaction mechanisms, the aim being to understand the role of the intermediate molecules that form during a chemical reaction but cannot be observed directly because they are too fleeting in nature. AI is also increasingly used in materials chemistry, where it can propose candidate molecules, drawn from pre-identified ones, whose properties are as close as possible to those required. Overall then, demand for GPUs is growing.

Exascale on the horizon

"If laboratories couldn't access to supercomputers like Jean Zay they would have to purchase their own machines or work with computing centres in other countries where data security may not be guaranteed," warns Jacques Maddaluno. "However, changes to favour GPUs meet a natural form of resistance because researchers do not necessarily want to have to devote time and energy to rewriting computer codes. And yet, this is necessary to respond to the demand for computing power."

The research world will also need to adapt to exascale supercomputers, capable of performing over a billion billion (10^18) floating-point operations per second. Exascale computing in Europe will first arrive in Germany in 2025, to be followed by the inauguration of the Jules Verne supercomputer at the TGCC in 2026. The scientific community is preparing for this - the NumPEx PEPR6, co-directed by Michaël Krajecki, aims to design and develop the software building blocks for this technology, which is also based on a large number of GPUs.

The upgrade of Jean Zay and the move to exascale alike aim to give researchers increased computing power that dovetails with current technological developments and research requirements. In both cases, the CNRS has already launched programmes to support researchers and adapt computer codes for this major shift towards GPUs.

  • 1. Institute for Development and Resources in Intensive Scientific Computing (CNRS).
  • 2. Macroscopic Molecular Energy and Combustion Laboratory (CNRS).
  • 3. Laboratoire d'informatique en calcul intensif et image pour la simulation (URCA/CEA).
  • 4. Google (Alphabet), Amazon, Facebook (Meta), Apple, Microsoft.
  • 6. Digital for exascale exploratory PEPR (Priority Research Programme and Equipment).