The “Wellbore Acoustic Image Database” (WAID) project is part of PETROBRAS' efforts to promote innovation worldwide in the Oil and Gas industry.
The WAID project belongs to the PETROBRAS program called Conexões para Inovação - Módulo Open Lab and is available on the PETROBRAS Reservoir GitLab repository.
The “Wellbore Acoustic Image Database” (WAID) aims to promote the development of applications based on Machine Learning, particularly Deep Learning, for automating tasks related to interpreting acoustic image logs representing the wellbore surface. Such solutions involve the segmentation of structures, filling of voids in the image, event detection and generation of new synthetic data, among others.
The WAID repository contains a dataset composed of image data with associated conventional open-hole log data and a set of Jupyter notebooks for basic handling and early data exploration.
The “Wellbore Acoustic Image Database” project belongs to the PETROBRAS program called Conexões para Inovação - Módulo Open Lab. This is an open project composed of the following parts:
- A dataset composed of acoustic image data from 7 wells with associated conventional open-hole logs;
- A set of Python scripts in Jupyter Notebook format for basic handling and visualization of this data.
Our strategy is to make these resources available to the global community and develop the WAID project collaboratively.
Acoustic image logs are a class of logging acquisition that allows the construction of wellbore images, bringing rich and intuitive geological features for human experts to analyze. However, they are composed of very high-resolution measurements and carry a considerably larger information content than conventional open-hole logs. For this reason, extracting information from them demands many time-consuming routines from petrophysicists.
This high information density makes acoustic images particularly well suited to benefit from artificial intelligence-based techniques. Incorporating such techniques will allow petrophysical interpreters to speed up routine procedures, discover new applications, and extract more knowledge from this data source.
With this project, PETROBRAS intends to foster the incorporation of ML/AI techniques to speed up routine procedures and, especially, to promote:
- the development of new methods to improve classical applications, such as identifying and discriminating geological structures (like fractures or vugs) from artifacts (like breakouts), and estimating petrophysical parameters;
- the development of new applications based on image logs;
- the discovery of new knowledge extracted from image logs.
We expect to receive various types of contributions from individuals, research institutions, startups, companies and partner oil operators.
Before you can contribute to this project, you need to read and agree to the following documents:
It is also very important to know about, participate in, and follow the discussions. See the discussions section.
All the code of this project is licensed under the Apache 2.0 License and all dataset data files (CSV files in the subdirectories of the dataset directory) are licensed under the Creative Commons Attribution 4.0 International License.
In its first release, WAID contains data from five wells in a Brazilian carbonate pre-salt reservoir. Their relative locations are shown below[^1]:
The names of the wells and their depths have been masked to protect the confidentiality of the information. Instead of the original name and number, each well was named after a different endangered species, an action to raise awareness of our role in preserving other earthlings. The table below shows the well identifiers and names:
Well ID | Well name |
---|---|
A | ANTILOPE-37 |
B | TATU-22 |
C | BOTOROSA-47 |
D | COALA-88 |
E | ANTILOPE-25 |
All datasets consist of '.csv' files whose values are expressed in Brazilian numeric format (i.e. the decimal symbol is a comma ',' and the column separator is a semicolon ';').
To load acoustic amplitude image data from a given well (for example, TATU-22), one can use the following Pandas command:
import pandas as pd

img_data = pd.read_csv('tatu22_IMG.csv',
                       sep=';',
                       decimal=',')  # ... plus any other optional arguments
Due to size restrictions on the image data files uploaded to GitHub, the original CSV files had to be split into several subfiles. We provide a Python function to concatenate them back into a single object:
import pandas as pd
import numpy as np
import os

def concat_IMG_data(well_id, data_path):
    # Due to file size limitations, the original AMP '.csv' file
    # has been split into several sub-files.
    # The concat_IMG_data() function concatenates them back
    # into a single data object.
    #
    # concat_IMG_data() returns 'image_df', a Pandas dataframe
    # indexed by DEPTH and whose columns are the azimuthal
    # coordinates of the AMP image log.

    # Name of the initial '00' file
    initial_file = well_id + "_AMP00.csv"

    # Read the initial file to capture header information
    initial_file_path = os.path.join(data_path, initial_file)
    image_df = pd.read_csv(initial_file_path, sep=';',
                           index_col=0,
                           na_values=-9999, na_filter=True,
                           decimal=',',
                           skip_blank_lines=True).dropna()

    # Read and add data from the remaining files sequentially
    # (sorted so the sub-files are appended in depth order)
    for file in sorted(os.listdir(data_path)):
        if file.startswith(well_id) and file != initial_file:
            file_path = os.path.join(data_path, file)
            df_temp = pd.read_csv(file_path, sep=';',
                                  header=None, index_col=0,
                                  na_values=-9999, na_filter=True,
                                  decimal=',', skip_blank_lines=True,
                                  dtype=np.float32).dropna()
            # Adjust the temporary df's header to match the image header
            df_temp.columns = image_df.columns
            # Concatenate the dataframes
            image_df = pd.concat([image_df, df_temp])

    return image_df
After defining img_data_path and well_identifier, the above function returns the well image log data in a single Pandas dataframe, for example:
# Whole image data
img_data = concat_IMG_data(well_identifier,img_data_path)
To load basic log data from a given well (for example, TATU-22), one can use the following Pandas command:
import pandas as pd

bsc_data = pd.read_csv('tatu22_BSC.csv',
                       sep=';',
                       decimal=',')  # ... plus any other optional arguments
The chosen nomenclature is as follows:
- for image data files: <well_name>_AMP.csv (AMP comes from the amplitude of the acoustic signal captured by the imaging tool. The values in the file express acoustic attenuation measures in dB.)
- for basic log data files: <well_name>_BSC.csv (BSC comes from the word basic). The curves present in the basic dataset are listed below (a quick inspection sketch follows the list):
- Caliper (CAL), unit: in.
- Gamma Ray (GR), unit: GAPI.
- Bulk Density (DEN), unit: g/cc.
- Neutron Porosity (NEU), unit: p.u.
- Sonic Compressional Slowness (DTC), unit: µs/ft.
- Sonic Shear Slowness (DTS), unit: µs/ft.
- Photoelectric Factor (PE), unit: barns/cc.
- NMR Total Porosity (nmrPhiT), unit: p.u.
- NMR Effective Porosity (nmrPhie), unit: p.u.
- NMR Permeability (nmrPerm), unit: mD.
- NMR Free Fluid (nmrFF), unit: p.u.
- Shallow Formation Resistivity (RES10), unit: Ω·m (ohm·m).
- Deep Formation Resistivity (RES90), unit: Ω·m (ohm·m).
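As a quick sanity check on these mnemonics, the sketch below loads a BSC file with explicit parsing options and summarizes a few curves. The DEPTH-as-index and -9999 null flag mirror the image-loading function above but are assumptions for the BSC files, as are the exact column labels.

import pandas as pd

# Load the basic curves of TATU-22; index_col and na_values follow the
# image-loading function above and are assumptions for the BSC files.
bsc_data = pd.read_csv('tatu22_BSC.csv',
                       sep=';', decimal=',',
                       index_col=0, na_values=-9999)

# Which curves are present in this well?
print(list(bsc_data.columns))

# Quick statistics for a few of the mnemonics listed above
print(bsc_data[['CAL', 'GR', 'DEN', 'NEU']].describe())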
It is important to highlight that the caliper log is often used as a data quality indicator.
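As a minimal illustration of this use (reusing bsc_data from the sketch above), one can flag depths where the hole is enlarged relative to a nominal bit size. Both the 8.5 in nominal diameter and the 10% tolerance below are arbitrary assumptions, not values taken from the dataset.

# Flag samples where the measured hole diameter exceeds an assumed
# nominal bit size by more than an assumed tolerance (possible washouts).
NOMINAL_DIAMETER_IN = 8.5   # assumption, not dataset metadata
TOLERANCE = 0.10            # assumption: 10% enlargement threshold

enlarged = bsc_data['CAL'] > NOMINAL_DIAMETER_IN * (1 + TOLERANCE)
print(f"{enlarged.mean():.1%} of samples flagged as enlarged hole")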
Some isolated curve values, or even an entire curve (the DTS curve in COALA-88), are missing in some of the wells. We encourage users to try statistical and machine learning imputation techniques to impute missing values and missing curves.
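As a very simple baseline for isolated gaps, one could try scikit-learn's KNNImputer on the basic curves. This is only a sketch: it assumes bsc_data was loaded as above and that no curve is entirely missing in the well being processed (an entirely missing curve, such as DTS in COALA-88, would require a model trained on other wells).

from sklearn.impute import KNNImputer
import pandas as pd

# Baseline sketch: fill isolated gaps in the basic curves using the
# k nearest samples in the space of the remaining curves.
imputer = KNNImputer(n_neighbors=5)
bsc_imputed = pd.DataFrame(imputer.fit_transform(bsc_data),
                           index=bsc_data.index,
                           columns=bsc_data.columns)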
Two Jupyter notebooks, Plot_composite_logs.ipynb and Plot_segment_acoustic_image.ipynb, are provided to illustrate the potential of the dataset:
- Plot_composite_logs.ipynb shows how to load the data and plot basic and image logs in a composite display at user-defined depth intervals.
- Plot_segment_acoustic_image.ipynb shows the basic handling of the image log data and an application for image segmentation based on amplitude value thresholds (a minimal thresholding sketch is given below).
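A minimal version of that thresholding idea, assuming img_data is the dataframe returned by concat_IMG_data() above and using an arbitrary statistics-based cutoff (not the value chosen in the notebook), could look like this:

import numpy as np

# Segment low-amplitude features by simple thresholding.
# The cutoff below is an illustrative choice, not the notebook's value.
amplitudes = img_data.values
cutoff = amplitudes.mean() - amplitudes.std()

mask = amplitudes < cutoff                         # True where amplitude is low
segmented = np.where(mask, 1, 0).astype(np.uint8)  # binary mask of low-amplitude features
print(f"Low-amplitude pixels: {mask.mean():.1%}")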
In this section we aim to keep an updated list of published papers (from journals or conferences) and other academic/technical works that have used data from this database:
- Rewbenio A. Frota, Marley M. B. R. Vellasco, Guilherme A. Barreto and Candida M. de Jesus, "Heteroassociative Mapping with Self-Organizing Maps for Probabilistic Multi-output Prediction", 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-6, DOI: 10.1109/IJCNN60899.2024.10650225.
- Rewbenio A. Frota, Guilherme A. Barreto, Marley M. B. R. Vellasco and Candida Menezes de Jesus, "New Cloth Unto an Old Garment: SOM for Regeneration Learning", in: Villmann, T., Kaden, M., Geweniger, T., Schleif, F.-M. (eds.), Advances in Self-Organizing Maps, Learning Vector Quantization, Interpretable Machine Learning, and Beyond (WSOM+ 2024), Lecture Notes in Networks and Systems, vol. 1087, Springer, Cham, DOI: 10.1007/978-3-031-67159-3_1.
- Rewbenio A. Frota, Marley M. B. R. Vellasco, Guilherme A. Barreto and Candida M. de Jesus, "Rede SOM para Aprendizado de Representações Multimodais com Aplicação em Petrofísica" (in Portuguese), XXV Congresso Brasileiro de Automática (CBA 2024), Rio de Janeiro, Brazil, 2024.
[^1]: Adapted from Frota et al., 2024, with permission of the authors.