Example 3 of D4.3 - Pre-processing #12

Open
cozzolinoac11 opened this issue May 8, 2023 · 2 comments
Labels
a/p metadata, documentation, good first issue


@cozzolinoac11
Member

cozzolinoac11 commented May 8, 2023

Use case

common

Name of resource

JPEG images to numpy array transformation

ID

JPEG_to_numpy_transformation

Description

Building a dataset as NumPy arrays. In Python machine-learning workflows, image data is usually represented as a NumPy array with shape [Height, Width, Channel], so the images must be converted to this format. Here the source images are JPEG files, and the conversion is performed with Pillow, NumPy and OpenCV functions.

OpenCV's (cv2) imread() function loads an image file directly into a NumPy array; note that OpenCV returns the color channels in BGR rather than RGB order. Because all images in the dataset (i.e., the NumPy arrays) must have the same size to be stacked and used together, and to reduce memory use and computation time, cv2's resize() is used to downscale the images from 350x350 pixels to 100x100 pixels (this size can easily be changed). The channel dimension is three because the images are color (RGB) images. The method then returns a dataset containing the images as NumPy arrays together with their respective class labels.

Main category

Pre-processing

Other category

No response

Publication date

2023-08-05

Objective

data-transformation

Platform

Google Colab

Framework

OpenCV

Architecture

None

Approach

None

Algorithm

custom-method

Processor

cpu

OS

linux

Keyword

numpy array, data transformation, jpeg

Reference link

No response

Example

https://github.com/cozzolinoac11/wildfire_prediction/blob/main/img_to_NPY_transformation.ipynb

Input data used

  1. https://open.canada.ca/data/en/dataset/9d8f219c-4df0-4481-926f-8a2a532ca003

Characteristics of input data

  1. Refer to Canada's website for the original wildfire data. The dataset is composed of satellite images of 350x350 pixels.

Biases and ethical aspects

No response

Output data obtained

  1. https://public.epsilon-italia.it/FAIRiCUBE/wildfire-classification/data_numpy.zip

Characteristics of output data

  1. Dataset in NumPy array format. The images are resized to 100x100 pixels.
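The layout of the files inside `data_numpy.zip` is not described in this issue. As a hypothetical example only, image and label arrays like those described above could be stored in NumPy's compressed `.npz` format and checked against the expected 100x100 RGB shape when reloaded. The function names and the `images`/`labels` keys are assumptions.

```python
import numpy as np


def save_dataset(path, images, labels):
    """Store images and labels together in one compressed .npz archive (hypothetical format)."""
    np.savez_compressed(path, images=images, labels=labels)


def load_dataset_npz(path, height=100, width=100):
    """Reload an archive written by save_dataset and sanity-check its shapes."""
    with np.load(path) as data:
        images, labels = data["images"], data["labels"]
    # expect (N, height, width, 3): one RGB image per row
    if images.ndim != 4 or images.shape[1:] != (height, width, 3):
        raise ValueError(f"unexpected image array shape {images.shape}")
    if len(images) != len(labels):
        raise ValueError("number of images and labels differs")
    return images, labels
```

Keeping images and labels in one archive avoids the two drifting out of sync; the shape check catches arrays produced with a different resize setting.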

Performance

No response

Conditions for access and use

cc-by-4.0

Constraints

No response

cozzolinoac11 added the documentation and good first issue labels on May 8, 2023
cozzolinoac11 changed the title from "Example 1 of D4.3 - Pre-processing" to "Example 3 of D4.3 - Pre-processing" on May 8, 2023
@KathiSchleidt
Member

Similar to the comment on #11, I think a bit more detail may be useful for non-expert users.

On the description, could you provide a bit more detail on how the transformation is performed and what is stored in the NumPy array (how do you split the JPEG RGB channels into the array)?

On "Input data used", the page you link to provides diverse datasets, and it's unclear which ones are being used. In "Characteristics of input data", there's no link; the only way of finding the information is via the input data link.

On sizes, you don't provide a UoM. I'm assuming meters, but it would be nice to add one.

@cozzolinoac11
Member Author

See my reply to the same comment in issue #11.
