Use Cases
A Seismic Shift in Earth Data's Value
SGRI and Huawei's seismic data resource pool is revolutionizing oil and gas exploration.
By Wang Yongbo, senior high-performance computing expert, Sinopec Geophysical Research Institute
With global oil consumption increasing each year, the oil and gas industry has found itself at the forefront of high-performance computing (HPC) application and innovation, identifying ways HPC can be used to find more oil-bearing layers and locate wells more accurately.
The oil and gas exploration workflow
There are three steps in oil and gas exploration: collecting seismic data in the field, processing that data, and interpreting it.
Field data collection
At this stage, electromagnetic and gravitational technologies are key. Usually, artificial earthquakes – created through controlled blasting or seismic vibrators – are used to generate seismic waves. The waves reflected back from underground structures are then recorded and converted into field data.
Seismic data processing
This process uses HPC for seismic signal processing and generates seismic datasets that reflect underground geological features.
Seismic data interpretation
Technicians first analyze and examine the processed data to identify possible oil and gas reservoirs and suggest well locations. Once the seismic data has been interpreted, geologists can determine the location and size of the reservoirs, whether they are worth extracting, and where to drill wells based on the resulting geological map.
Bottlenecks in oil and gas exploration
Oil and gas exploration is like performing a CT scan of the Earth. Collecting more data and processing it using more refined techniques provides greater accuracy of the geological structures deep below the Earth's surface. This increases the chance of finding oil.
Oil and gas reservoir exploration data has the following characteristics:
First, the volume of seismic data is huge. A raw seismic dataset is usually dozens of terabytes in size and can even reach the petabyte scale. Because data processing involves many intermediate steps that generate huge numbers of temporary files and large volumes of intermediate data, a single processing task takes up 10 times more storage space than the raw data alone.
Second, seismic data processing involves multiple processes and frequent I/O operations. Processing seismic data just once involves dozens of steps – ignoring possible iterations – and up to 400 software modules. During operation, these modules must frequently exchange data, which puts great pressure on both local disks and external storage to perform read and write tasks (see the rough model sketched after this list).
Third, processing seismic data involves immense computation and lengthy processes. In addition to sequential processing, parallel processing also takes place in many compute-intensive scenarios. A single computing task usually involves weeks of uninterrupted computing, requiring extremely high reliability.
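To make the first two characteristics concrete, the rough model below is a hypothetical sketch, not SGRI's actual workflow or figures: it treats each processing module as one full read and one full write of the dataset, with a fraction of intermediate results retained on disk until the task finishes. The module count, retention fraction, and 50 TB raw size are illustrative assumptions only.

```python
def pipeline_footprint(raw_tb: float, modules: int = 40, kept_fraction: float = 0.25):
    """Rough model of a seismic processing chain: every module reads the full
    dataset and writes an intermediate of comparable size; a fraction of the
    intermediates stays on disk until the whole task completes."""
    io_volume_tb = modules * 2 * raw_tb                        # one read + one write per module
    peak_storage_tb = raw_tb * (1 + modules * kept_fraction)   # raw data plus retained intermediates
    return peak_storage_tb, io_volume_tb

if __name__ == "__main__":
    peak, io = pipeline_footprint(raw_tb=50.0)
    # Prints roughly: peak storage 550 TB (about 11x the raw data), total I/O 4,000 TB
    print(f"Raw data: 50 TB, peak storage: {peak:.0f} TB, total I/O traffic: {io:.0f} TB")
```

Even under these modest assumptions, the intermediate data on disk is around ten times the raw dataset and the total read/write traffic is dozens of times larger still, which is why I/O rather than raw capacity is the first constraint to bite.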
As data acquisition has become more precise in recent years, the problems with traditional data processing systems have become increasingly obvious. The first is that separate systems create multiple data silos: because data must be repeatedly copied between different computing clusters, overall processing efficiency suffers. Second, resource isolation leads to insufficient data sharing and low resource utilization. In addition, I/O processing capability is fast becoming a bottleneck, with CPU waiting times growing and computing power left underutilized.
Data processing based on distributed architecture
To address the above challenges, SGRI teamed up with Huawei to develop a converged and shared seismic data resource pool based on Huawei's OceanStor Mass Storage solution, resulting in a more efficient and cost-effective data analysis and processing platform.
The storage layer is critical to seismic data processing because computing speeds far exceed the read and write speeds of disks. In SGRI's case, data migration and conversion account for more than 35 percent of the time needed to process seismic data. Storage solutions that offer high bandwidth and low latency are therefore key to improving exploration efficiency.
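A back-of-the-envelope calculation shows why. Treating the 35 percent of time spent on data migration and conversion as the I/O-bound fraction, Amdahl's law bounds the overall gain from faster storage. The acceleration factors below are hypothetical, included only to show the shape of the calculation, not measured OceanStor figures.

```python
def overall_speedup(io_fraction: float, io_acceleration: float) -> float:
    """Amdahl's law: overall speedup when only the I/O-bound fraction is accelerated."""
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_acceleration)

if __name__ == "__main__":
    for accel in (1.5, 2.0, 3.0):
        s = overall_speedup(io_fraction=0.35, io_acceleration=accel)
        # Prints roughly 13%, 21%, and 30% respectively
        print(f"Storage {accel:.1f}x faster -> whole job about {(s - 1) * 100:.0f}% faster")
```

Even moderate storage acceleration therefore translates into a double-digit reduction in end-to-end processing time.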
Huawei's OceanStor Mass Storage does just that. With high bandwidth and low latency, OceanStor Mass Storage greatly reduces I/O waiting time and improves the parallel processing efficiency of computing cluster CPUs. With CPU usage holding stable at over 60 percent, data analysis is more than 16 percent faster.
Powered by a distributed architecture, OceanStor Mass Storage integrates and shares storage resources, reducing the number of cabinets by 40 percent and TCO by 30 percent, and allowing the system to meet data expansion requirements for the next 5 to 10 years.
OceanStor Mass Storage adopts the N + M elastic erasure coding (EC) data protection mode, where N indicates the number of data fragments and M the number of parity fragments. This protects data against the simultaneous failure of up to M storage nodes and supports automatic shrinking of the EC stripe when nodes fail, boosting data reliability and providing sustained assurance for lengthy seismic data processing jobs.
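As a simple illustration of the trade-off (the fragment counts below are hypothetical examples, not SGRI's configuration): each stripe is split into N data fragments and M parity fragments spread across N + M nodes, and any M fragments can be lost and rebuilt from the rest.

```python
def ec_profile(n: int, m: int) -> dict:
    """Usable capacity and fault tolerance of an N + M erasure-coded layout."""
    return {
        "fragments_per_stripe": n + m,
        "usable_capacity": n / (n + m),   # share of raw capacity holding real data
        "tolerated_node_failures": m,     # any M fragments can be lost and rebuilt
    }

if __name__ == "__main__":
    for n, m in ((4, 2), (8, 2), (16, 4)):
        p = ec_profile(n, m)
        print(f"{n}+{m}: {p['usable_capacity']:.0%} usable capacity, "
              f"tolerates {p['tolerated_node_failures']} simultaneous node failures")
```

Compared with three-way replication, which also tolerates two failures but leaves only a third of raw capacity usable, an 8 + 2 layout offers the same tolerance at 80 percent usable capacity.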
Oil and gas exploration is now a testing ground for emerging technologies, and seismic exploration data has obvious big data characteristics. Because the work involves both stream and batch processing, as well as structured, semi-structured, and unstructured data, the oil and gas industry is an ideal field for trialing new technologies.
Moving forward, big data and AI will become more widely used in the energy exploration sector to optimize exploration efficiency. Continuous data refinement and algorithm training can improve forecasts relating to oil layers, well positions, and reserves, catapulting oil and gas exploration into a new era.