The “Wellbore Acoustic Image Database” (WAID) project is part of PETROBRAS' efforts to promote innovation worldwide in the Oil and Gas industry.
The WAID project belongs to the PETROBRAS program called Conexões para Inovação - Módulo Open Lab and is available on the PETROBRAS Reservoir GitLab repository.
The “Wellbore Acoustic Image Database” (WAID) aims to promote the development of applications based on Machine Learning, particularly Deep Learning, for automating tasks related to interpreting acoustic image logs representing the wellbore surface. Such solutions involve the segmentation of structures, filling of voids in the image, event detection and generation of new synthetic data, among others.
The WAID repository contains a dataset composed of image data with associated conventional open-hole log data and a set of Jupyter notebooks for basic handling and early data exploration.
The “Wellbore Acoustic Image Database” project belongs to the PETROBRAS program called Conexões para Inovação - Módulo Open Lab. This is an open project composed of the following parts:
- A dataset composed of acoustic image data from 7 wells with associated conventional open-hole logs;
- A set of Python scripts in Jupyter Notebook format for basic handling and visualization of this data.
Our strategy is to make these resources available to the global community and develop the WAID project collaboratively.
Acoustic image logs are a class of logging acquisition that allows the construction of wellbore images, bringing rich and intuitive geological features for human experts to analyze. However, they are composed of very high-resolution measurements and carry a considerably larger information content than conventional open-hole logs. For this reason, extracting information from them demands many time-consuming routines from petrophysicists.
This high information density makes acoustic images particularly well suited to benefit from artificial intelligence-based techniques. Incorporating such techniques will allow petrophysical interpreters to speed up routine procedures, discover new applications, and extract more knowledge from this data source.
With this project, PETROBRAS intends to foster the incorporation of ML/AI techniques to speed up routine procedures and, especially, to promote:
- the development of new methods to improve classical applications, such as identifying and discriminating geological structures (like fractures or vugs) from artifacts (like breakouts), and estimating petrophysical parameters;
- the development of new applications based on image logs;
- the discovery of new knowledge extracted from image logs.
We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.
Before you can contribute to this project, you need to read and agree to the following documents:
It is also very important to know about, participate in, and follow the discussions. See the discussions section.
All the code of this project is licensed under the Apache 2.0 License and all dataset data files (CSV files in the subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.
In its first release, WAID contains data from five wells in a Brazilian carbonate pre-salt reservoir. Their relative locations are shown below[^1]:
The names of the wells and their depths have been masked to protect the confidentiality of the information. Instead of the original name and number, each well was named after a different endangered species, an action to raise awareness of our role in preserving other earthlings. The table below shows the well identifiers and names:
Well ID | Well name |
---|---|
A | ANTILOPE-37 |
B | TATU-22 |
C | BOTOROSA-47 |
D | COALA-88 |
E | ANTILOPE-25 |
All datasets consist of '.csv' files whose values are expressed in Brazilian numeric format (i.e. the decimal symbol is a comma ',' and the column separator is a semicolon ';').
To load acoustic amplitude image data from a given well (for example, TATU-22), one can use the following Pandas command:
import pandas as pd

img_data = pd.read_csv('tatu22_IMG.csv',
                       sep=';',
                       decimal=',')  # ... plus any other optional arguments
Due to size restrictions on the image data files uploaded to GitHub, the original CSV files had to be split into several subfiles. We provide a Python function to concatenate them back into a single object:
import pandas as pd
import numpy as np
import os

def concat_IMG_data(well_id, data_path):
    # Due to file size limitations, the original AMP '.csv' file
    # has been split into several sub-files.
    # The concat_IMG_data() function concatenates them back
    # into a single data object.
    #
    # concat_IMG_data() returns 'image_df', a Pandas dataframe
    # indexed by DEPTH and whose columns are the azimuthal
    # coordinates of the AMP image log.

    # Name of the initial '00' file
    initial_file = well_id + "_AMP00.csv"

    # Read the initial file to capture header information
    initial_file_path = os.path.join(data_path, initial_file)
    image_df = pd.read_csv(initial_file_path, sep=';',
                           index_col=0,
                           na_values=-9999, na_filter=True,
                           decimal=',',
                           skip_blank_lines=True).dropna()

    # Read and add data from the remaining files sequentially
    # (sorted so the sub-files are appended in depth order)
    for file in sorted(os.listdir(data_path)):
        if file.startswith(well_id) and file != initial_file:
            file_path = os.path.join(data_path, file)
            df_temp = pd.read_csv(file_path, sep=';',
                                  header=None, index_col=0,
                                  na_values=-9999, na_filter=True,
                                  decimal=',', skip_blank_lines=True,
                                  dtype=np.float32).dropna()
            # Adjust the temporary df's header to match the image header
            df_temp.columns = image_df.columns
            # Concatenate the dataframes
            image_df = pd.concat([image_df, df_temp])

    return image_df
After defining img_data_path and well_identifier, the above function returns the well image log data in a single Pandas dataframe, for example:
# Whole image data
img_data = concat_IMG_data(well_identifier,img_data_path)
To load basic log data from a given well (for example, TATU-22), one can use the following Pandas command:
import pandas as pd

bsc_data = pd.read_csv('tatu22_BSC.csv',
                       sep=';',
                       decimal=',')  # ... plus any other optional arguments
The chosen nomenclature is as follows:
- for image data files: <well_name>_AMP.csv (AMP comes from the amplitude of the acoustic signal captured by the imaging tool. The values in the file express acoustic attenuation measures in dB.)
- for basic log data files: <well_name>_BSC.csv (BSC comes from the word basic). The curves present in the basic dataset are listed below (a quick inspection sketch follows the list):
- Caliper (CAL), unit: in.
- Gamma Ray (GR), unit: GAPI.
- Bulk Density (DEN), unit: g/cc.
- Neutron Porosity (NEU), unit: p.u.
- Sonic Compressional Slowness (DTC), unit: µs/ft.
- Sonic Shear Slowness (DTS), unit: µs/ft.
- Photoelectric Factor (PE), unit: barns/cc.
- NMR Total Porosity (nmrPhiT), unit: p.u.
- NMR Effective Porosity (nmrPhie), unit: p.u.
- NMR Permeability (nmrPerm), unit: mD.
- NMR Free Fluid (nmrFF), unit: p.u.
- Shallow Formation Resistivity (RES10), unit: Ω·m (ohm·m).
- Deep Formation Resistivity (RES90), unit: Ω·m (ohm·m).
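As a quick sanity check on these mnemonics, the sketch below loads a BSC file with explicit parsing options and summarizes a few curves. The DEPTH-as-index and -9999 null flag mirror the image-loading function above but are assumptions for the BSC files, as are the exact column labels.

import pandas as pd

# Load the basic curves of TATU-22; index_col and na_values follow the
# image-loading function above and are assumptions for the BSC files.
bsc_data = pd.read_csv('tatu22_BSC.csv',
                       sep=';', decimal=',',
                       index_col=0, na_values=-9999)

# Which curves are present in this well?
print(list(bsc_data.columns))

# Quick statistics for a few of the mnemonics listed above
print(bsc_data[['CAL', 'GR', 'DEN', 'NEU']].describe())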
It is important to highlight that the caliper log is often used as a data quality indicator.
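As a minimal illustration of this use (reusing bsc_data from the sketch above), one can flag depths where the hole is enlarged relative to a nominal bit size. Both the 8.5 in nominal diameter and the 10% tolerance below are arbitrary assumptions, not values taken from the dataset.

# Flag samples where the measured hole diameter exceeds an assumed
# nominal bit size by more than an assumed tolerance (possible washouts).
NOMINAL_DIAMETER_IN = 8.5   # assumption, not dataset metadata
TOLERANCE = 0.10            # assumption: 10% enlargement threshold

enlarged = bsc_data['CAL'] > NOMINAL_DIAMETER_IN * (1 + TOLERANCE)
print(f"{enlarged.mean():.1%} of samples flagged as enlarged hole")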
Some isolated curve values, or even an entire curve (the DTS curve in COALA-88), are missing in some of the wells. We encourage users to try statistical and machine learning imputation techniques to impute missing values and missing curves.
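As a very simple baseline for isolated gaps, one could try scikit-learn's KNNImputer on the basic curves. This is only a sketch: it assumes bsc_data was loaded as above and that no curve is entirely missing in the well being processed (an entirely missing curve, such as DTS in COALA-88, would require a model trained on other wells).

from sklearn.impute import KNNImputer
import pandas as pd

# Baseline sketch: fill isolated gaps in the basic curves using the
# k nearest samples in the space of the remaining curves.
imputer = KNNImputer(n_neighbors=5)
bsc_imputed = pd.DataFrame(imputer.fit_transform(bsc_data),
                           index=bsc_data.index,
                           columns=bsc_data.columns)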
Two Jupyter notebooks, Plot_composite_logs.ipynb and Plot_segment_acoustic_image.ipynb, are provided to illustrate the potential of the dataset:
- Plot_composite_logs.ipynb shows how to load the data and plot basic and image logs in a composite display at user-defined depth intervals.
- Plot_segment_acoustic_image.ipynb shows the basic handling of the image log data and an application for image segmentation based on amplitude value thresholds (a minimal thresholding sketch is given below).
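A minimal version of that thresholding idea, assuming img_data is the dataframe returned by concat_IMG_data() above and using an arbitrary statistics-based cutoff (not the value chosen in the notebook), could look like this:

import numpy as np

# Segment low-amplitude features by simple thresholding.
# The cutoff below is an illustrative choice, not the notebook's value.
amplitudes = img_data.values
cutoff = amplitudes.mean() - amplitudes.std()

mask = amplitudes < cutoff                         # True where amplitude is low
segmented = np.where(mask, 1, 0).astype(np.uint8)  # binary mask of low-amplitude features
print(f"Low-amplitude pixels: {mask.mean():.1%}")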
In this section we aim to keep an updated list of published papers (from journals or conferences) and other academic/technical works that have used data from this database:
- Rewbenio A. Frota, Marley M. B. R. Vellasco, Guilherme A. Barreto and Candida M. de Jesus, "Heteroassociative Mapping with Self-Organizing Maps for Probabilistic Multi-output Prediction", 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-6, DOI: 10.1109/IJCNN60899.2024.10650225.
- Rewbenio A. Frota, Guilherme A. Barreto, Marley M. B. R. Vellasco and Candida Menezes de Jesus, "New Cloth Unto an Old Garment: SOM for Regeneration Learning", in: Villmann, T., Kaden, M., Geweniger, T., Schleif, F.-M. (eds.), Advances in Self-Organizing Maps, Learning Vector Quantization, Interpretable Machine Learning, and Beyond (WSOM+ 2024), Lecture Notes in Networks and Systems, vol. 1087, Springer, Cham, DOI: 10.1007/978-3-031-67159-3_1.
- Rewbenio A. Frota, Marley M. B. R. Vellasco, Guilherme A. Barreto and Candida M. de Jesus, "Rede SOM para Aprendizado de Representações Multimodais com Aplicação em Petrofísica" (in Portuguese), XXV Congresso Brasileiro de Automática (CBA 2024), Rio de Janeiro, Brazil, 2024.
[^1]: Adapted from Frota et al., 2024, with permission of the authors.