[TensorFlow] LSTM을 활용한 긍부정 판별기

Notice

Recent Posts

Recent Comments

Link

« 2025/01 »
일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

pbj0812의 코딩 일기

[TensorFlow] LSTM을 활용한 긍부정 판별기 본문

인공지능 & 머신러닝/TensorFlow

[TensorFlow] LSTM을 활용한 긍부정 판별기

pbj0812 2020. 8. 11. 01:38

0. 목표

- 텐서플로를 활용한 긍부정 판별기 제작

1. 실습

1) library 호출

import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import LSTM
import tensorflow as tf
import matplotlib.pyplot as plt

2) 데이터 셋 다운로드

- imdb dataset : 영화 리뷰 및 긍 부정 결과 포함

- num_words : 가장 빈번한 단어

(x_train_all, y_train_all), (x_test, y_test) = imdb.load_data(num_words=100)

3) 데이터 확인

print(x_train_all)
print(y_train_all)
print(x_test)
print(y_test)

- 결과(x에는 리스트 형태의 숫자가 나열되어 있고, y에는 0, 1의 긍부정 결과가 존재)

4) 데이터 확인2

- 결과 : 25000 4세트

print(len(x_train_all))
print(len(y_train_all))
print(len(x_test))
print(len(y_test))

5) 필요 없는 데이터 제거

- 2 이하의 값들은 필요 없는 값을 나타냄.(제외 하지 않아도 학습 결과에는 크게 영향을 미치지 않으나 사전을 통해 변환할 때 문제가 발생)

for i in range(len(x_train_all)):
  x_train_all[i] = [w for w in x_train_all[i] if w > 2]

6) 사전 다운로드

word2index = imdb.get_word_index()

7) 단어 매핑

- 5)에서 확인을 하였듯이 실제로 있는 값들은 3부터이므로 매핑을 위해 3씩 차감한 이후 매핑

index2word = {word2index[w] : w for w in word2index}
for w in x_train_all[0]:
  print(index2word[w - 3], end = ' ')

- 결과

this film was just story really the they and you just there is an and the from the as so i the there was a with this film the the film were great it was just so much that i the film as as it was for and would it to to and the was really at the it was so and you what they if you at a film it have been good and this was also to the that the of and they were just are out of the i because the that them all up are a for the film but are and be for what they have don't you the story was so because it was and was all that was with all

8) 학습 / 검증 데이터 샘플링

- np.random.seed를 통해 난수를 생성

- permutation 함수를 이용하여 인덱스를 섞음

- 2만개까지 학습용, 이후 5천개는 검증용

np.random.seed(50)

random_index = np.random.permutation(25000)

x_train = x_train_all[random_index[:20000]]
y_train = y_train_all[random_index[:20000]]
x_val = x_train_all[random_index[20000:]]
y_val = y_train_all[random_index[20000:]]

9) 패딩을 통한 리스트 길이 균일화

maxlen = 100

x_train_seq = sequence.pad_sequences(x_train, maxlen = maxlen)
x_val_seq = sequence.pad_sequences(x_val, maxlen = maxlen)

10) 데이터 확인

print(x_train_seq[0])

- 결과(데이터가 짧으면 앞에 0으로 채움)

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8 83  6 22  8  5 23
  4 17 36  4  7  4  4  7  6 11 42 14  9 35  4 22 46  8 30 44  4  7  5 82
  4  7  4 17  6  4 40  6 10 10  9  4 18 14 22 12 26  6  7 73 15 12 14 22
 18 72 21 11  6 75 26  8 55 52 48 24 39  6 10 10  6 22 31  8 48 25 92  4
 53 42  4 18]

11) 모델 생성

model_lstm = tf.keras.Sequential( )

model_lstm.add(tf.keras.layers.Embedding(1000, 32))
model_lstm.add(LSTM(8))
model_lstm.add(tf.keras.layers.Dense(1, activation = 'sigmoid'))

12) 생성 모델 확인

model_lstm.summary()

- 결과

13) 모델 컴파일

model_lstm.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

14) 모델 학습

- 정확도 : 74.58 %

history = model_lstm.fit(x_train_seq, y_train, epochs = 10, batch_size = 32, validation_data = (x_val_seq, y_val))

15) 결과 그리기

plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()

- 결과

2. 참고

- 딥러닝 입문(박해선, 이지스 퍼블리싱)

- 난수 발생과 카운팅

- IMDB dataset(kaggle)

- 케라스 정리

저작자표시 비영리 동일조건

'인공지능 & 머신러닝 > TensorFlow' 카테고리의 다른 글

[Tensorflow] tensorflow.js 로 모델 학습하기(1차 방정식) (0)	2021.01.06
[Tensorflow] node.js & tensorflow.js 를 통한 이미지 판별기 튜토리얼 (0)	2021.01.04
[TensorFlow] sin 그래프 예측하기 (0)	2020.07.25
[TensorFlow] MNIST 예제를 활용한 내가 쓴 숫자 맞추기 (0)	2020.07.13
[TensorFlow] Colab 세팅하기 (0)	2020.07.04