CH16, Sentiment Analysis Numpy version issue/Memory leak? #7

Barleysack · 2021-04-22T10:47:45Z

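This concerns the sentiment analysis section of Chapter 16, Natural Language Processing with RNNs and Attention.
The code below preprocesses the dataset loaded from tfds and then trains a model on it.
I am working on a laptop with a GTX 1660 Ti.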

import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow import keras
import tensorboard
import os
import numpy as np


(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data()

X_train[0][:10]


datasets, info = tfds.load("imdb_reviews", as_supervised=True, with_info=True)
train_size = info.splits["train"].num_examples
test_size = info.splits["test"].num_examples
print(train_size)



# Map word IDs back to words; IDs 0-2 are reserved for <pad>, <sos> and <unk>.
word_index = keras.datasets.imdb.get_word_index()
id_to_word = {id_ + 3: word for word, id_ in word_index.items()}
for id_, token in enumerate(("<pad>", "<sos>", "<unk>")):
    id_to_word[id_] = token
" ".join([id_to_word[id_] for id_ in X_train[0][:10]])


# Inspect the splits returned by the tfds.load call above.
datasets.keys()

for X_batch, y_batch in datasets["train"].batch(2).take(1):
    for review, label in zip(X_batch.numpy(), y_batch.numpy()):
        print("Review:", review.decode("utf-8")[:200], "...")
        print("Label:", label, "= Positive" if label else "= Negative")
        print()

def preprocess(X_batch, y_batch):
    X_batch = tf.strings.substr(X_batch, 0, 300)
    X_batch = tf.strings.regex_replace(X_batch, rb"<br\s*/?>", b" ")
    X_batch = tf.strings.regex_replace(X_batch, b"[^a-zA-Z']", b" ")
    X_batch = tf.strings.split(X_batch)
    return X_batch.to_tensor(default_value=b"<pad>"), y_batch


preprocess(X_batch, y_batch)


from collections import Counter

vocabulary = Counter()
for X_batch, y_batch in datasets["train"].batch(32).map(preprocess):
    for review in X_batch:
        vocabulary.update(list(review.numpy()))

vocab_size = 10000
truncated_vocabulary = [
    word for word, count in vocabulary.most_common()[:vocab_size]]

words = tf.constant(truncated_vocabulary)
word_ids = tf.range(len(truncated_vocabulary), dtype=tf.int64)
vocab_init = tf.lookup.KeyValueTensorInitializer(words, word_ids)
num_oov_buckets = 1000
table = tf.lookup.StaticVocabularyTable(vocab_init, num_oov_buckets)



def encode_words(X_batch, y_batch):
    return table.lookup(X_batch), y_batch

train_set = datasets["train"].repeat().batch(32).map(preprocess)
train_set = train_set.map(encode_words).prefetch(1)




embed_size = 128
model = keras.models.Sequential([
    keras.layers.Embedding(vocab_size + num_oov_buckets, embed_size,
                           mask_zero=True, # not shown in the book
                           input_shape=[None]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.GRU(128),
    keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(train_set, steps_per_epoch=train_size // 32, epochs=5)

Running this code raises two issues.
First, in a local environment (not Colab),
with TensorFlow 2.4.1 and NumPy 1.20.1,
the code fails with the message
Cannot convert a symbolic Tensor (gru/strided_slice:0) to a numpy array.
Downgrading NumPy to 1.19.2 resolves this:
pip install numpy==1.19.2
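The version pairing described above can be checked before training starts. The following is a minimal sketch, not an official compatibility rule: the helper names `parse_version` and `is_reported_bad_combo` are hypothetical, and the cutoff at NumPy 1.20 is an assumption inferred only from the two versions mentioned in this report (1.20.1 fails, 1.19.2 works).

```python
def parse_version(version):
    """Turn a string such as '1.20.1' into (1, 20, 1), ignoring suffixes like 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def is_reported_bad_combo(tf_version, np_version):
    """Return True for TensorFlow 2.4.x paired with NumPy 1.20 or later.

    Assumption: the NumPy 1.20 cutoff is inferred from this report only.
    """
    tf_is_2_4 = parse_version(tf_version)[:2] == (2, 4)
    np_is_new = parse_version(np_version)[:2] >= (1, 20)
    return tf_is_2_4 and np_is_new


# The two combinations mentioned in this report.
print(is_reported_bad_combo("2.4.1", "1.20.1"))
print(is_reported_bad_combo("2.4.1", "1.19.2"))
```

In a real environment the arguments would be `tf.__version__` and `np.__version__`, and a True result would suggest pinning NumPy as shown above.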

Second, there appears to be a memory-related issue. Training fails with the following message:

[_Derived_]RecvAsync is cancelled.
	 [[{{node Adam/Adam/update/AssignSubVariableOp/_41}}]]
	 [[gradient_tape/sequential/embedding/embedding_lookup/Reshape/_38]] [Op:__inference_train_function_115627]

Adding the following code, which enables GPU memory growth, allows the code to run:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Memory growth must be set before the GPU is initialized.
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        print(e)

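I am opening this issue because I have not seen this behavior in any other Handsonml2 code.
Larger models ran without enabling memory growth,
and even with memory growth enabled, GPU memory usage for this code is very low,
which is also puzzling.
I would like to find a way to run this without errors and without such extra steps.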
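One possible alternative to the in-code workaround is the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which asks TensorFlow to enable memory growth without any `tf.config` calls. The sketch below only sets the variable from Python; it has not been verified against the setup in this report, and it does not explain why the error occurs in the first place.

```python
import os

# TensorFlow reads this variable when it initializes the GPU,
# so it has to be set before the first `import tensorflow`.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after this point
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

The same variable can be exported in the shell before starting Python or Jupyter, which would leave the notebook code unchanged.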


rickiepark · 2021-04-22T12:15:18Z

Thank you for sharing this useful information. I will try to reproduce the problem when I have time. :)

Barleysack · 2021-04-22T12:18:52Z

Thank you.
I have also posted the same issue on Ageron's original repository.
