Presentation summaries

HP01 | Leveraging GPUs for matrix-free optimization with PyLops

The use of Graphics Processing Units (GPUs) for scientific computing has become mainstream in the last decade. Applications ranging from deep learning to seismic modelling have benefitted from the increase in computational efficiency compared to their equivalent CPU-based implementations. Since many inverse problems in geophysics rely on similar core computations – e.g. dense linear algebra operations, convolutions, FFTs – it is reasonable to expect similar performance gains if GPUs are also leveraged in this context. In this paper we discuss how we have been able to take PyLops, a Python library for matrix-free linear algebra and optimization originally developed for single-node CPUs, and create a fully compatible GPU backend with the help of CuPy and cuSignal. A benchmark suite of our core operators shows that an average 65x speed-up can be achieved when running computations on a V100 GPU. Moreover, by careful modification of the inner workings of the library, end users can obtain such a performance gain at virtually no cost: minimal code changes are required when switching between the CPU and GPU backends, mostly consisting of moving the data vector to the GPU device prior to solving an inverse problem with one of PyLops’ solvers.
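To illustrate what "matrix-free" means in this context, the following is a minimal sketch in plain Python. It is not the PyLops API: the class name `FirstDerivative` and the helper `dot_test` are hypothetical names chosen for illustration. The operator is defined only by its forward and adjoint actions, and a dot test checks that the two are consistent.

```python
import random


class FirstDerivative:
    """Forward-difference operator applied without ever storing a matrix."""

    def __init__(self, n):
        self.n = n

    def matvec(self, x):
        # y[i] = x[i+1] - x[i] for i < n-1; the last sample is set to zero.
        y = [x[i + 1] - x[i] for i in range(self.n - 1)]
        y.append(0.0)
        return y

    def rmatvec(self, y):
        # Adjoint (transpose) of the forward difference defined in matvec.
        x = [0.0] * self.n
        for i in range(self.n - 1):
            x[i] -= y[i]
            x[i + 1] += y[i]
        return x


def dot_test(op, seed=0, tol=1e-10):
    """Check that <Op u, v> equals <u, Op^T v> for random vectors u and v."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(op.n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(op.n)]
    lhs = sum(a * b for a, b in zip(op.matvec(u), v))
    rhs = sum(a * b for a, b in zip(u, op.rmatvec(v)))
    return abs(lhs - rhs) <= tol * max(1.0, abs(lhs), abs(rhs))
```

Because an iterative solver only ever calls the forward and adjoint actions, the same operator definition can in principle act on CPU or GPU arrays, which is consistent with the abstract's point that switching backends mostly amounts to moving the data vector to the device.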

Presenter: Matteo Ravasi

Matteo is an Assistant Professor at KAUST in the Physical Science and Engineering division, with a formal education in Telecommunication Engineering from Politecnico di Milano and a PhD in Geophysics from the University of Edinburgh. Previously, Matteo worked as a geophysicist at Equinor in a variety of roles within both research and operations, and he has also led the development of several open-source software products in the geophysical sphere.

He has made several contributions in the areas of seismic processing and imaging by developing novel methods aimed at using the full potential of seismic data to improve the quality and resolution of subsurface imaging products. For his PhD work, Matteo is the recipient of the SEG Karcher Award, RAS Keith Runcorn Prize, and Gustavo Sclocchi Theses Award. He is also the inventor of 2 international patents and author of 15 peer-reviewed papers.

HP02 | Up-to-date assessment of 3D frequency-domain full waveform inversion based on the sparse multifrontal solver MUMPS

Efficient frequency-domain Full Waveform Inversion (FWI) can be applied on long-offset/wide-azimuth stationary-recording seabed acquisitions carried out with ocean-bottom cables (OBC) and ocean-bottom nodes (OBN), since the wide angular illumination provided by these surveys allows for limiting the inversion to a few discrete frequencies. In the frequency domain, the forward problem is a boundary value problem requiring the solution of large and sparse linear systems with multiple right-hand sides. In this study, we revisit the potential of the massively parallel sparse multifrontal solver MUMPS to efficiently perform the multi-source forward problem of 3D visco-acoustic FWI. The execution time and memory consumption of the solver are further improved by exploiting the low-rank properties of the sub-blocks of the dense frontal matrices, the sparsity of the right-hand sides (seismic sources) and the work in progress on the use of mixed-precision arithmetic. We revisit a 3D OBC case study from the North Sea in the 3.5 Hz-13 Hz frequency band using between 10 and 70 nodes of the Jean-Zay supercomputer of IDRIS and show that, even without exploiting low-rank properties, problems involving 50 million unknowns and probably more can be tackled today with this technology.

Presenter: Patrick Amestoy

Patrick R. Amestoy received his Ph.D. in computer science (option scientific computing) from Toulouse INP in 1990. From 1991 to 1992 he was a postdoctoral research fellow at CERFACS (Toulouse, France). Since 1992 he has been working at Toulouse INP, where he has been a full professor since 2004. Between 09/99 and 09/2000 he was a scientific visitor at the Lawrence Berkeley National Laboratory (Berkeley, USA).

In January 2019, he was one of the creators of the spin-off Mumps Technologies, located at Ecole Normale Supérieure of Lyon (ENS-Lyon) and dedicated to research and development, service and training around the MUMPS software. Since January 2019, he has been on sabbatical leave from the university to work full time at Mumps Technologies. His current research interests are high performance computing and sparse linear algebra (co-author of the MUMPS package http://mumps.enseeiht.fr/).

HP03 | Hybridized discretizations for seismic wave simulations

We demonstrate three hybridized discretizations that can be stably applied to the wave propagation problem. By confining the most demanding discretization to small subdomains, these techniques have the potential to significantly reduce the computational resources required to perform the routine tasks in seismic studies.

Presenter: Longfei Gao

Dr Gao is currently a researcher in the Oden Institute at UT Austin.

HP04 | Toward High Performance Asynchronous RTM with Temporal Blocking and Buffered I/O

During the forward and backward modeling in Reverse Time Migration (RTM), stencil computations constitute one of the main computationally intensive components. Their classic implementation based on Spatial Blocking (SB) is subject to performance limitation on modern multicore architectures due to several reasons, including non-uniform memory access, memory bandwidth starvation, load imbalance, and limited data locality. The Multicore Wavefront Diamond-tiling Temporal Blocking technique (MWD-TB) introduced in (Malas, PhD thesis 2015, Malas et al., SIAM SciCo 2015, Malas et al., ACM Trans 2017) aims at reducing the memory bandwidth requirement of stencil computations by increasing cache reuse within successive time steps. The authors in (Akbudak et al. IJHPCA 2020) integrate the MWD-TB technique into the modeling phase and the authors in (Qu et al., KAUST Tech Report 2020) eventually embed it into the full RTM using in-memory I/O operations snapshotting for the imaging condition and illustrate with the Salt3D dataset. In this paper, we further enable Out-Of-Core (OOC) I/O snapshotting operations on the Lustre parallel file system using the buffering strategy from MLBS (Alturkestani et al., EuroPar 2020). We present preliminary results using the Marmoussi 3D dataset.

Presenter: Long Qu

Long Qu received his Ph.D. in Computer Science from the University of Paris-Sud in 2014 and is a computer scientist focusing on HPC. Since November 2020, he has worked as a research scientist at KAUST, designing and implementing high performance optimizations for seismic imaging simulations in order to maximize hardware resource utilization on current and future systems. Previously, he worked as an HPC software development engineer at Total.

HP05 | HPC in The Cloud MVP

As part of Total’s computing strategy, a minimum viable product (MVP) was conducted in 2019 and 2020, with the goal of evaluating the feasibility of deploying High Performance Computing (HPC) workflows in the cloud. While many industries have begun shifting business workloads to the cloud, HPC still remains on-premises for most companies, including many of our peers in the energy industry. We decided to perform full seismic and reservoir studies which are representative of our production workload. This MVP is a continuation of a Request For Information (RFI) in 2017 and a Proof Of Concept (POC) in 2018. The outcome is to provide recommendations on whether we can consider the cloud (fully, partially or not at all) in our HPC procurements. We have evaluated the following claims of the cloud providers:

flexibility and economic elasticity, such as on-demand deployment (pay-as-you-go).
application specific provisioning (right-sizing resources).
life cycle, allowing us quick access to cutting edge technologies.
scalability, for quick expansion of resources.
locality and availability, with global datacenters (e.g. covering disaster recovery).

Presenter: Jean-Remi Pontvianne

Jean-Remi Pontvianne is an HPC Production Engineering Manager at TOTAL SE. He received a master’s degree in Signal and Image Processing in 2007 from École Nationale Supérieure d'Ingénieurs Electriciens de Grenoble (ENSIEG) in Grenoble, France. He has worked for TOTAL SE first as HPC Developer (from 2008 to 2010 in Pau, France), then as HPC Infrastructure Project Manager (from 2010 to 2013 in Pau, France), afterwards as Lead Infrastructure Operational Analyst (from 2013 to 2017 in Aberdeen, United Kingdom) and now as HPC Production Engineering Manager (since 2017 in Paris, France). He is interested in HPC Infrastructure, Cloud Solutions, and Machine Learning.

HP06 | HPC workload management for full resource utilization

In industrial HPC applications, maximizing overall performance is a complex task: besides optimization aspects strictly related to numerical algorithms, many others must be taken into account. In particular, in addition to single-kernel execution, it may be necessary to focus on workflow execution, and it can be important to optimize the execution of a heterogeneous project workload with dynamically changing priorities. Here we will discuss how we faced this challenge in the context of running large seismic imaging projects.

Presenter: Nicola Bienati

Nicola Bienati holds a Ph.D. in Telecommunications Engineering from Politecnico di Milano. He joined Eni in 2002 and since then he has been mainly working on R&D in the field of seismic imaging technologies and on the development of the Eni HPC infrastructure.

HP07 | Leveraging DAOS file system for seismic data storage

The DAOS-SEIS mapping layer is introduced to the seismic community. Building on the evolving DAOS technology, it addresses some of the seismic I/O bottlenecks caused by the SEG-Y data format. It combines graph theory with DAOS object-based storage to design and implement a new seismic data format natively on top of the DAOS storage model, in order to accelerate data access, provide in-storage compute capabilities to process data in place, and remove the constraints of serial SEG-Y files. The DAOS-SEIS API is built on top of the DAOS file system (dfs), and seismic data is accessed and manipulated through this API after accessing the root seismic dfs object. The mapping layer uses graph theory and object storage to separate the acquisition geometry, represented by the trace headers, from the time-series data samples.

Presenter: Merna Moawad

Merna Moawad graduated from the Computer and Communications department at the Faculty of Engineering, Alexandria University, with a bachelor's degree in Computer Engineering. She has worked as a parallel programming software engineer in the HPC department at Brightskies since 2019. During this period, Merna has worked on the implementation and optimization of seismic imaging algorithms. She has also worked on leveraging graph theory and the DAOS file system to introduce a new native graph format for seismic data, in collaboration with Intel.

HP08 | Cloud Elasticity Combined with Innovative Assisted History Match Accelerates Reservoir Risk Assessment

This paper will discuss the added value of a cloud environment in an innovative risk assessment solution involving the batch run of a geomodelling application and a flow simulator. The project was conducted jointly by Emerson and AWS. The first phase of the collaboration focused on assessing cloud parameters on the performance of the software and the cost of completing typical activities. It was then applied to a case study using the Volve oil field on the Norwegian continental shelf* to draw comparisons with current on-premises installations and evaluate the solution from a reservoir management perspective. The study first showed that while the optimal cloud configuration for history matching is dependent on global parameters, it must be fine-tuned for each reservoir model. A method to reduce simulation time and cloud costs was established. The study then showed that use of the cloud has a positive impact on operations. The results are available within hours instead of days; same-day evaluation leads to faster decisions and improved operational efficiency. Moreover, enhanced stochastic analyses are now possible for those without high-performance on-premises clusters.

Presenter: Camille Cosson

Camille Cosson works within the Emerson E&P Reservoir Modeling and Engineering team. She is an expert in geological modeling and has worked on various field studies in Europe, Africa, South America and Asia. She has knowledge in Petroleum Geoscience, Seismic Interpretation and Reservoir Modeling, and provides her expertise in integrated reservoir studies. She holds a MSc degree in Petroleum Geoscience from the National School of Geology in Nancy, France.

HP09 | GEOSX: a multiphysics, multiscale, reservoir simulator for HPC

GEOSX is an open-source, exascale-ready, multiphysics simulator for geological formations. This simulator is currently developed by Lawrence Livermore National Laboratory, Stanford University, and Total. GEOSX is designed to address several types of complex simulation use cases, including geological storage of carbon dioxide (CCUS). Numerical simulations of such operations require coupling between mass/energy transfers and rock geomechanics over large formations and for long simulation periods. GEOSX provides such simulation solutions for mixed-architecture systems. The code is not limited to a specific, unique, target architecture by the use of two libraries called RAJA and CHAI. Here, we present the essential functionalities and underlying building blocks of GEOSX. Examples of use cases are shown. We discuss challenges encountered along the way and especially those related to multiphysics modeling. Last, we provide all references required to access, download the sources, and build GEOSX (under LGPL 2.1 license).

Presenter: Herve Gross

Dr. Herve Gross is a senior R&D project manager at Total. He is a reservoir engineer by training and has worked as a developer and deployer of several libraries for geostatistics, uncertainty quantification, and field development optimization.

HP10 | GPU accelerated FWI using the Open Concurrent Computing Abstraction (OCCA)

Adapting high-performance software to various architectures while ensuring performance is a challenging endeavor, more so in our industry, where HPC drives exploration and production activities. Seismic exploration is impossible without seismic imaging and velocity model building, both of which rely entirely on supercomputers. In this work, we describe how we ported our existing proprietary seismic libraries to GPUs and how this single effort will carry over to many other architectures.

Presenter: Amik St-Cyr

Amik St-Cyr is a mathematical physicist who likes to simulate physics on computers. 2011-Present: Shell. 2003-2011: National Center for Atmospheric Research. 2002-2003: Postdoc, McGill University (HPC for CFD). 2002: PhD in Applied Mathematics, Universite de Montreal (shock-capturing methods). BSc in Mathematical Physics, Universite de Montreal.

HP11 | Opensource RTM using DPC++ programming model

In this work we present oneAPI, an open-source specification, and the DPC++ programming language. We explain why DPC++ is an efficient language for programming different devices from different vendors, briefly introduce Reverse Time Migration (RTM) as a use case, and demonstrate how its finite-difference kernel is implemented in DPC++.

Presenter: Ehab Nasr

Ehab M. Nasr is an HPC Software Engineer at Brightskies Inc. Through his 3+ years of experience, he has been working with teams on developing and optimizing various applications in Seismic Imaging and Seismic Processing. He has worked with various customers and SMEs from Geoscience, Geophysics and HPC to benchmark, optimize and develop Seismic Software. Ehab has also contributed to various publications at conferences in the HPC and Oil and Gas domains.

HP12 | GPUFORT: A source-to-source translator for Fortran accelerator dialects

Presenter: Dominic Etienne Charrier

Dominic joined AMD a little less than two years ago to work as HPC Software Development Engineer in AMD's Client Solution Group (CSG). Together with his CSG colleagues, he enables and optimizes client-specific HPC codes for AMD datacenter devices. He has BSc & MSc degrees in electrical and computational engineering from TU Darmstadt and a PhD in computer science from Durham University (UK). During his PhD studies, he developed communication-avoiding, task-parallel implementations of robust explicit high-order FEM methods on dynamically adaptive grids for applications in seismology and astrophysics.

HP13 | Application of the vectorization library NSIMD to the EFISPEC3D kernel

We show in this work that using the NSIMD vectorization library allowed us to obtain better performance on the EFISPEC3D kernel, a spectral-finite-element method for solving the forward seismic wave propagation problem. Moreover, the same code can be compiled without modification to target different SIMD extensions with different vector sizes without degrading performance.

Presenter: Guillaume Quintin

Guillaume Quintin got a PhD in computer science after his studies in pure mathematics. He is an expert in software development. He is responsible for the open source project NSIMD (https://github.com/agenium-scale/nsimd) specialized in High Performance Computing (HPC). He participates in its development and in particular in GPU abstraction including NVIDIA and AMD GPUs using CUDA and ROCm respectively. He also manages other software development projects and supervises Agenium Scale's technical teams by supporting them with his technical expertise.

HP14 | Performance Characterization of a Vector Architecture for Seismic Applications

Explicit time-domain finite-difference (TD-FD) methods are largely used in seismic exploration. They are at the heart of wave-equation based geophysical algorithms such as Reverse Time Migration and Full Waveform Inversion. Due to the ever-increasing amount of acquired seismic data and the need for higher resolution to optimize oil production, it is crucial to deploy TD-FD on High Performance Computing (HPC) platforms. In this work, we explore the performance reachable on vector architectures. The study is done on a traditional scalar CPU to get a performance baseline, and on a vector solution which was heavily used in the past by the O&G industry.

Presenter: Vincent Etienne

Vincent Etienne is a research geophysicist at Saudi Aramco EXPEC Advanced Research Center, Dhahran, Saudi Arabia. His areas of expertise are numerical modelling and high performance computing. He is particularly interested in the design of algorithms tailored to geophysical applications like seismic modelling or reverse time migration. He holds a PhD in geophysics from the University of Nice Sophia Antipolis (France) and has published numerous papers in geophysical journals.

LT01 | Performance Evaluation of Stencil Calculation in RTM Code

The performance of the stencil operation, which is widely used in the geoscience space, is discussed. The stencil operation is characterized by a high bytes-per-flop (B/F) demand, so it can be accelerated by providing higher memory bandwidth per core/processor and by reducing the number/frequency of memory accesses. NEC SCA can reduce memory accesses, and the NEC VE20 processor of SX-Aurora TSUBASA provides much higher memory bandwidth than other processors. Because of these advantages, the VE20 processor with SCA provides up to nine times higher performance than modern x86 processors.
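As a rough illustration of why stencils are bandwidth-hungry, the sketch below estimates the B/F ratio of a 3D star stencil under two idealized assumptions: no data reuse (every neighbour is loaded from memory) and perfect reuse (each grid value is loaded once). The function name, the flop-counting convention and the single-precision default are assumptions made for illustration, not figures from the presentation.

```python
def stencil_bytes_per_flop(radius, word_bytes=4, perfect_reuse=False):
    """Estimate bytes moved per flop for one update of a 3D star stencil."""
    # Centre point plus `radius` neighbours along each of the 6 arms.
    points = 6 * radius + 1
    # One multiply per point plus (points - 1) additions to sum the products.
    flops = 2 * points - 1
    # Without reuse every point is loaded; with perfect reuse only one value is.
    loads = 1 if perfect_reuse else points
    stores = 1
    return (loads + stores) * word_bytes / flops
```

Under these assumptions, reducing the number of memory accesses lowers the B/F demand of the kernel directly, which is the lever the abstract attributes to both higher bandwidth and fewer accesses.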

Presenter: Shintaro Momose

Shintaro MOMOSE is a principal engineer at NEC Deutschland, and a visiting associate professor at the Cyber Science Center at Tohoku University, Japan. His responsibilities at NEC are marketing, architecture development, and promotion of the SX vector supercomputer. He joined NEC Deutschland in 2018. Previously, he was an engineer at Nissan Motor, and he started his work at NEC Corporation in 2005. He developed the SX supercomputer series (SX-9, SX-ACE, and SX-Aurora TSUBASA), and he is also developing the successor of the SX supercomputer. He received the B.E. degree in Mechanical Engineering, and the M.S. and Ph.D. degrees in Information Sciences from Tohoku University in 1999, 2003, and 2005 respectively.

LT02 | Nonlinear Preconditioning for Two-phase Flows

Using a classical Newton-Krylov method to solve the resulting nonlinear system of two-phase flows in porous media often suffers from slow convergence or failure in line search. We propose two nonlinear elimination preconditioning strategies to handle this issue by performing subspace correction to remove the local strong nonlinearities. Numerical experiments show that the proposed methods are more robust and faster than the existing method with respect to some physical and numerical parameters, and scalable to thousands of processes.

Presenter: Li Luo

Dr. Li Luo is a Postdoc Fellow at the Extreme Computing Research Center, King Abdullah University of Science and Technology. His research interests are parallel algorithms and high-performance software for linear and nonlinear partial differential equations, domain decomposition methods, and multiphase flows.

LT03 | Improving GPU throughput of reservoir simulations using NVIDIA MPS and MIG

In this paper we demonstrated that the overall simulation throughput of full-GPU reservoir simulators can be further improved significantly without any modifications to the software, using NVIDIA’s Multi-Processing-Service and Multi-Instance-GPU infrastructure. For models with just a few thousand cells, a throughput increase of 7x is achieved while for problems with a million cells a 60% improvement is achieved using MPS. Furthermore, when using either MPS or MIG, the smaller models can achieve 80% of the peak achievable performance of larger models. In the context of uncertainty quantification workflows, these performance improvements are significant.

Presenter: Rajesh Gandham

Rajesh Gandham is a Software Architect at Stone Ridge Technology. He is experienced in developing high-performance simulation software for multi-physics simulations. He holds Bachelors & Masters degrees in Aerospace Engineering from the Indian Institute of Technology Madras and a Ph.D. in Computational And Applied Mathematics from Rice University.

LT04 | Toward an application of quantum computing in geophysics

Quantum computing offers a theoretical speed-up (in terms of computational complexity) when performing certain computational tasks. Successful inclusion of quantum computing units in existing HPC solutions is contingent on identifying appropriate and realistic use-cases, of which, to date, there are very few candidates. This is particularly true for upstream business, because there are few quantum algorithms, each coming with a number of caveats which ought to be carefully studied. We suggest a potential simple use-case in geophysics and 1D inverse scattering theory and provide a detailed analysis of the requirements that need to be met in order to achieve the promised speed-up.

Presenter: Marcin Dukalski

Marcin Dukalski is a research geophysicist at the Aramco Global Research Center in Delft. He holds a PhD from the Kavli Institute of Nanoscience (Delft Technical University) and has 7 years of industry experience specializing in topics of inverse scattering theory, seismic multiples and applications of quantum computing in geosciences. Marcin is also a member of the industrial sounding committee at the Dutch National Agenda for Quantum Technologies and works together with representatives of academia, Dutch National Lab and quantum start-ups in the Netherlands on finding applications of quantum computing in upstream business.

LT05 | MPI + DPCPP for scalable and portable RTM

We present our latest implementation based on MPI for shot distribution on top of our DPC++ kernel implementation.

Presenter: Ahmed Ayyad

Ahmed is a senior HPC software engineer and a member of Brightskies' parallel programming team. He works on software benchmarking and code optimization/modernization of the company's state-of-the-art computational science and numerical analysis products. Ahmed’s work has involved developing software optimized for different hardware architectures. Ahmed is part of the Brightskies team that is developing one of the earliest substantial products adopting the oneAPI and DPC++ technologies. Previously, Ahmed worked at Valeo, developing automotive software solutions for OEMs and Tier 1 suppliers. Ahmed holds a B.Sc. in electrical engineering from Alexandria University. He has R&D experience in software optimization, computer vision, machine learning/deep learning and software architecture.

LT06 | Optimizing HPC Parameters for Reverse Time Migration

Reverse Time Migration (RTM) is a key application used in seismic imaging and accounts for a significant portion of HPC resource utilization in the Oil & Gas exploration industry. The number of nodes used per shot and the corresponding domain decomposition can have a significant impact on RTM performance and project cycle time. In this work we will describe a method to automatically select the optimal number of nodes and the best domain decomposition for each shot in an RTM project. In addition to the computational savings and reduction in cycle time, the method presented here also improves the user experience for seismic processors, as they only need to focus on tuning the parameters that affect the results and do not have to worry about the HPC parameters. This becomes even more important as compute environments become more heterogeneous and projects try to use all available compute resources in order to reduce cycle time. Also, in a pay-per-use model, it is helpful to be able to predict the compute costs for a project.
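The selection problem described above can be sketched as a small search. The example below is not the authors' method: it enumerates candidate node counts and 3D decompositions and picks the one that minimizes a toy cost model, in which compute time scales with subdomain volume and halo exchange scales with the subdomain surface along split axes. All function names and cost coefficients are hypothetical.

```python
from itertools import product


def decompositions(nodes):
    """Yield every (px, py, pz) with px * py * pz == nodes."""
    for px, py in product(range(1, nodes + 1), repeat=2):
        if nodes % (px * py) == 0:
            yield px, py, nodes // (px * py)


def toy_cost(grid, decomp, compute_coeff=1.0, halo_coeff=5.0):
    """Hypothetical per-time-step cost of one subdomain: volume work plus halo exchange."""
    sx, sy, sz = (g / p for g, p in zip(grid, decomp))
    volume = sx * sy * sz
    # Two faces are exchanged along each axis that is actually split.
    surface = sum(
        2 * area
        for area, p in zip((sy * sz, sx * sz, sx * sy), decomp)
        if p > 1
    )
    return compute_coeff * volume + halo_coeff * surface


def best_configuration(grid, candidate_nodes):
    """Return (cost, nodes, decomp) minimizing the toy cost over all candidates."""
    return min(
        (toy_cost(grid, d), n, d)
        for n in candidate_nodes
        for d in decompositions(n)
    )
```

A realistic selector would also have to weigh node-hours, memory limits per node and measured rather than modelled timings; the sketch only shows the shape of the search over node counts and decompositions.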

Presenter: Rahul Sampath

Rahul Sampath is a UIT-HPC Advisor at ExxonMobil Technical Computing Company. Prior to joining ExxonMobil, he was a researcher at Oak Ridge National Laboratory. He has over 15 years of experience in research and software development in the field of scientific computing. He has worked on several multidisciplinary projects with applications in nuclear energy, medical imaging and seismic imaging. Rahul has authored several scientific publications and software packages. He has served as a technical reviewer for scientific journals and conferences and as a member of technical program committees for conferences. He was awarded the ACM/IEEE Gordon Bell Prize in 2010. He holds a Ph.D. in Computational Science and Engineering from Georgia Institute of Technology. He also received Bachelor's and Master's degrees in Mechanical Engineering from Birla Institute of Technology & Science, Pilani, India and University of Pennsylvania, respectively.