*Sequence Data:
Data in which the order of the elements carries meaning.
Example) Stock data: when you have the data for day 6, the previous day's data and the day before that influence it (each step captures the relationship to the earlier elements of the sequence).
=> Recurrent data (earlier data influences the current data); a minimal recurrence sketch is shown below.
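To make this concrete, here is a minimal sketch (not from the original post) of the recurrence an RNN applies to such data: the hidden state at step t is computed from the current input and the previous hidden state, so every step depends on everything that came before. The weight names (W_x, W_h, b) and the toy "price" sequence are illustrative assumptions.

import numpy as np

np.random.seed(0)
input_dim, hidden_dim = 1, 4
W_x = np.random.randn(input_dim, hidden_dim) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # one recurrence step: the new state mixes the current input with the previous state
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

prices = [[100.0], [101.5], [99.8], [102.3], [103.0], [104.1]]  # toy 6-day sequence
h = np.zeros(hidden_dim)
for t, x_t in enumerate(prices):
    h = rnn_step(np.array(x_t), h)  # day t's state depends on all earlier days through h
    print("day", t + 1, "hidden state:", np.round(h, 3))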
*RNN application areas:
- Language Modeling
- Speech Recognition
- Machine Translation
- Conversation Modeling / Question Answering
- Image / Video Captioning
- Image / Music / Dance Generation
* LSTM (Long Short-Term Memory units):
A variant of RNNs that first appeared in the mid-1990s.
-> During backpropagation the error signal is preserved much better, so gradients can flow back through more than 1,000 time steps.
*LSTM learning steps (a numpy sketch of one cell step follows this list):
1) The forget gate layer decides what to forget (what to discard completely) from the cell state.
2) The input gate layer, a sigmoid layer, decides which values to update.
3) The previous cell state Ct-1 is updated to the new cell state Ct.
4) The output gate decides what to output.
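Below is a minimal numpy sketch (an illustrative assumption, not code from the post) of the four steps above for a single LSTM cell update; the weight and bias names (W, b) are hypothetical.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W: dict of weight matrices, b: dict of bias vectors (illustrative names)
    f = sigmoid(x_t @ W['f'] + h_prev @ W['hf'] + b['f'])        # 1) forget gate: what to erase from c_prev
    i = sigmoid(x_t @ W['i'] + h_prev @ W['hi'] + b['i'])        # 2) input gate: which values to update
    c_tilde = np.tanh(x_t @ W['c'] + h_prev @ W['hc'] + b['c'])  #    candidate values for the cell state
    c_t = f * c_prev + i * c_tilde                               # 3) update Ct-1 -> Ct (this additive path is what preserves the error signal)
    o = sigmoid(x_t @ W['o'] + h_prev @ W['ho'] + b['o'])        # 4) output gate: what to expose
    h_t = o * np.tanh(c_t)                                       #    new hidden state
    return h_t, c_t

# toy shapes: input_dim = 3, hidden_dim = 2
rng = np.random.RandomState(0)
W = {k: rng.randn(3, 2) * 0.1 for k in ('f', 'i', 'c', 'o')}
W.update({k: rng.randn(2, 2) * 0.1 for k in ('hf', 'hi', 'hc', 'ho')})
b = {k: np.zeros(2) for k in ('f', 'i', 'c', 'o')}
h, c = np.zeros(2), np.zeros(2)
h, c = lstm_step(np.array([1.0, 0.0, 0.0]), h, c, W, b)
print("h:", h, "c:", c)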
*RNN implementation source (teach hello):
import tensorflow as tf
import numpy as np

tf.set_random_seed(777)  # reproducibility

idx2char = ['h', 'i', 'e', 'l', 'o']
# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]]   # hihell
x_one_hot = [[[1, 0, 0, 0, 0],  # h 0
              [0, 1, 0, 0, 0],  # i 1
              [1, 0, 0, 0, 0],  # h 0
              [0, 0, 1, 0, 0],  # e 2
              [0, 0, 0, 1, 0],  # l 3
              [0, 0, 0, 1, 0]]] # l 3

y_data = [[1, 0, 2, 3, 3, 4]]   # ihello

input_dim = 5        # one-hot size
hidden_size = 5      # output from the LSTM. 5 to directly predict one-hot
batch_size = 1       # one sentence
sequence_length = 6  # |ihello| == 6

X = tf.placeholder(tf.float32, [None, sequence_length, hidden_size])  # X one-hot
Y = tf.placeholder(tf.int32, [None, sequence_length])                 # Y label

# cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_size)
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, _states = tf.nn.dynamic_rnn(
    cell, X, initial_state=initial_state, dtype=tf.float32)

weights = tf.ones([batch_size, sequence_length])
sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
loss = tf.reduce_mean(sequence_loss)
train = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)

prediction = tf.argmax(outputs, axis=2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2000):
        l, _ = sess.run([loss, train], feed_dict={X: x_one_hot, Y: y_data})
        result = sess.run(prediction, feed_dict={X: x_one_hot})
        print(i, "loss:", l, "prediction: ", result, "true Y: ", y_data)

        # print char using dic
        result_str = [idx2char[c] for c in np.squeeze(result)]
        print("\tPrediction str: ", ''.join(result_str))

'''
0 loss: 1.55474 prediction: [[3 3 3 3 4 4]] true Y: [[1, 0, 2, 3, 3, 4]]
	Prediction str:  lllloo
1 loss: 1.55081 prediction: [[3 3 3 3 4 4]] true Y: [[1, 0, 2, 3, 3, 4]]
	Prediction str:  lllloo
2 loss: 1.54704 prediction: [[3 3 3 3 4 4]] true Y: [[1, 0, 2, 3, 3, 4]]
	Prediction str:  lllloo
3 loss: 1.54342 prediction: [[3 3 3 3 4 4]] true Y: [[1, 0, 2, 3, 3, 4]]
	Prediction str:  lllloo
...
1998 loss: 0.75305 prediction: [[1 0 2 3 3 4]] true Y: [[1, 0, 2, 3, 3, 4]]
	Prediction str:  ihello
1999 loss: 0.752973 prediction: [[1 0 2 3 3 4]] true Y: [[1, 0, 2, 3, 3, 4]]
	Prediction str:  ihello
'''
*RNN - LSTM implementation source (아리아리아리랑):
#-*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np

tf.set_random_seed(777)  # reproducibility

idx2char = ['아', '리', '랑']
# 아리아리아리 -> 리아리아리랑
x_data = [[0, 1, 0, 1, 0, 1]]  # 아리아리아리
x_one_hot = [[[1, 0, 0],   # 아 0
              [0, 1, 0],   # 리 1
              [1, 0, 0],   # 아 0
              [0, 1, 0],   # 리 1
              [1, 0, 0],   # 아 0
              [0, 1, 0]]]  # 리 1

y_data = [[1, 0, 1, 0, 1, 2]]  # 리아리아리랑

input_dim = 3        # one-hot size
hidden_size = 3      # output from the LSTM.
batch_size = 1       # one sentence
sequence_length = 6  # |리아리아리랑| == 6

X = tf.placeholder(tf.float32, [None, sequence_length, hidden_size])  # X one-hot
Y = tf.placeholder(tf.int32, [None, sequence_length])                 # Y label

cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
# cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_size)
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, _states = tf.nn.dynamic_rnn(
    cell, X, initial_state=initial_state, dtype=tf.float32)

weights = tf.ones([batch_size, sequence_length])
sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
loss = tf.reduce_mean(sequence_loss)
train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

prediction = tf.argmax(outputs, axis=2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2000):
        l, _ = sess.run([loss, train], feed_dict={X: x_one_hot, Y: y_data})
        result = sess.run(prediction, feed_dict={X: x_one_hot})
        print(i, "loss:", l, "prediction: ", result, "true Y: ", y_data)

        # print char using dic
        result_str = [idx2char[c] for c in np.squeeze(result)]
        print("\tPrediction str: ", ''.join(result_str))
*Splitting a sentence into characters and assigning each character an index:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

tf.set_random_seed(777)  # reproducibility

sentence = ("A recurrent neural network is a class of artificial neural network "
            "where connections between units form a directed cycle. "
            "This allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks,"
            "RNNs can use their internal memory to process arbitrary sequences of inputs.")
print(sentence)

char_set = list(set(sentence))  # put the characters into a set to remove duplicates, then back into a list
print(char_set)
char_dic = {w: i for i, w in enumerate(char_set)}  # map each character to an index
print(char_dic)
*Analyzing the full sentence:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

tf.set_random_seed(777)  # reproducibility

sentence = ("A recurrent neural network is a class of artificial neural network "
            "where connections between units form a directed cycle. "
            "This allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks,"
            "RNNs can use their internal memory to process arbitrary sequences of inputs.")
print(sentence)

char_set = list(set(sentence))  # remove duplicates with a set, then convert back to a list
char_dic = {w: i for i, w in enumerate(char_set)}
print(char_dic)

data_dim = len(char_set)
hidden_size = len(char_set)
num_classes = len(char_set)
seq_length = 10  # any arbitrary number

dataX = []
dataY = []

print(len(sentence))
for i in range(0, len(sentence) - seq_length):
    x_str = sentence[i:i + seq_length]
    y_str = sentence[i + 1: i + seq_length + 1]
    print(i, x_str, '->', y_str)

    x = [char_dic[c] for c in x_str]  # x str to index
    y = [char_dic[c] for c in y_str]  # y str to index

    dataX.append(x)
    dataY.append(y)

batch_size = len(dataX)
print('batch_size:', batch_size)

X = tf.placeholder(tf.int32, [None, seq_length])
Y = tf.placeholder(tf.int32, [None, seq_length])

# One-hot encoding
X_one_hot = tf.one_hot(X, num_classes)  # tf.one_hot handles the one-hot conversion
print(X_one_hot)  # check out the shape

# Make an LSTM cell with hidden_size (each unit output vector size)
cell = rnn.BasicLSTMCell(hidden_size, state_is_tuple=True)  # LSTM cell
cell = rnn.MultiRNNCell([cell] * 2, state_is_tuple=True)    # stack two layers with MultiRNNCell

# outputs: unfolding size x hidden size, state = hidden size
outputs, _states = tf.nn.dynamic_rnn(cell, X_one_hot, dtype=tf.float32)  # run dynamic_rnn to get the outputs and states
print('output', outputs)

# (optional) softmax layer
X_for_softmax = tf.reshape(outputs, [-1, hidden_size])
softmax_w = tf.get_variable("softmax_w", [hidden_size, num_classes])
softmax_b = tf.get_variable("softmax_b", [num_classes])
outputs = tf.matmul(X_for_softmax, softmax_w) + softmax_b

# reshape out for sequence_loss
outputs = tf.reshape(outputs, [batch_size, seq_length, num_classes])

# All weights are 1 (equal weights)
weights = tf.ones([batch_size, seq_length])

sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
mean_loss = tf.reduce_mean(sequence_loss)
train_op = tf.train.AdamOptimizer(learning_rate=0.1).minimize(mean_loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(500):
    _, l, results = sess.run(
        [train_op, mean_loss, outputs], feed_dict={X: dataX, Y: dataY})
    for j, result in enumerate(results):
        index = np.argmax(result, axis=1)
        print(i, j, ''.join([char_set[t] for t in index]), l)

# Let's print the last char of each result to check it works
results = sess.run(outputs, feed_dict={X: dataX})  # result=170

# Print the full sentence
for j, result in enumerate(results):
    index = np.argmax(result, axis=1)
    if j == 0:  # print all for the first result to make a sentence
        print(''.join([char_set[t] for t in index]), end='')
    else:
        print(char_set[index[-1]], end='')