来源:https://hyunhp.tistory.com/448
1. Intuition for the RNN cell and the RNN
RNN ----> Recurrent Neural Network
You can think of the recurrent neural network as the repeated use of a single cell, where each cell performs the computations for a single time step.
2. Dimensions of the input x
2.1 Input with n_x number of units
➢ For a single time step of a single input example, x^(i)<t> is a one-dimensional input vector
➢ Using language as an example, a language with a 5000-word vocabulary could be one-hot encoded into a vector that has 5000 units, so x^(i)<t> would have the shape (5000,) (see the sketch below)
➢ The notation n_x is used here to denote the number of units in a single time step of a single training example
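For example, one time step of one training example could be one-hot encoded like this (a minimal sketch; the vocabulary size matches the example above, and the word index is made up for illustration):
import numpy as np

vocab_size = 5000
word_index = 1234                 # hypothetical position of a word in the vocabulary
xt_single = np.zeros(vocab_size)  # one time step of a single example, shape (5000,)
xt_single[word_index] = 1.0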
2.2 Time steps of size T_x
➢ A recurrent neural network has multiple time steps, which you'll index with t.
➢ In the lessons, you saw a single training example x^(i) consisting of multiple time steps T_x. In this notebook, T_x will denote the number of time steps in the longest sequence.
2.3 Batches of size m
➢ Let's say we have mini-batches, each with 20 training examples
➢ To benefit from vectorization, you'll stack 20 columns of x^(i) examples
➢ For example, this tensor has the shape (5000, 20, 10)
➢ You'll use m to denote the number of training examples
➢ So, the shape of a mini-batch is (n_x, m, T_x)
2.4 3D tensor of shape (n_x, m, T_x)
➢ The 3-dimensional tensor x of shape (n_x, m, T_x) represents the input x that is fed into the RNN
2.5 Take a 2D slice for each time step: x^<t>
➢ At each time step, you'll use a mini-batch of training examples (not just a single example)
➢ So, for each time step t, you'll use a 2D slice of shape (n_x, m)
➢ This 2D slice is referred to as x^<t>. The variable name in the code is xt. (See the sketch below.)
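Putting this section together, a minimal shape sketch using the example numbers 5000, 20 and 10 from above:
import numpy as np

n_x, m, T_x = 5000, 20, 10       # units per time step, mini-batch size, number of time steps
x = np.zeros((n_x, m, T_x))      # 3D input tensor fed into the RNN
xt = x[:, :, 3]                  # 2D slice x^<t> for time step t = 3
print(xt.shape)                  # (5000, 20)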
3. Dimensions of the hidden state a
The activation a^<t> that is passed to the RNN from one time step to another is called a "hidden state".
3.1 Dimensions of hidden state a
➢ Similar to the input tensor x, the hidden state for a single training example is a vector of length n_a
➢ If you include a mini-batch of m training examples, the shape of a mini-batch is (n_a, m)
➢ When you include the time step dimension, the shape of the hidden state is (n_a, m, T_x)
➢ You'll loop through the time steps with index t, and work with a 2D slice of the 3D tensor
➢ This 2D slice is referred to as a^<t>
➢ In the code, the variable names used are either a_prev or a_next, depending on the function being implemented
➢ The shape of this 2D slice is (n_a, m), as sketched below
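A corresponding sketch for the hidden state (n_a = 5 is an arbitrary illustrative choice):
import numpy as np

n_a, m, T_x = 5, 20, 10
a = np.zeros((n_a, m, T_x))        # hidden states for every time step
a_prev = a[:, :, 2]                # 2D slice used as a^<t-1> at step t = 3
a_next = a[:, :, 3]                # 2D slice a^<t> produced at step t = 3
print(a_prev.shape, a_next.shape)  # (5, 20) (5, 20)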
4. Dimensions of the prediction ŷ
➢ Similar to the inputs and hidden states, ŷ is a 3D tensor of shape (n_y, m, T_y)
■ n_y: number of units in the vector representing the prediction
■ m: number of examples in a mini-batch
■ T_y: number of time steps in the prediction
➢ For a single time step t, a 2D slice ŷ^<t> has shape (n_y, m)
➢ In the code, the variable names are:
● y_pred: ŷ
● yt_pred: ŷ^<t> (see the sketch below)
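And for the prediction tensor (n_y = 2 is again an arbitrary illustrative choice):
import numpy as np

n_y, m, T_y = 2, 20, 10
y_pred = np.zeros((n_y, m, T_y))  # predictions for every time step
yt_pred = y_pred[:, :, 3]         # 2D slice ŷ^<t> at time step t = 3
print(yt_pred.shape)              # (2, 20)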
5. Building the RNN
➢ Here is how you can implement an RNN:
Steps:
● Implement the calculations needed for one time step of the RNN.
● Implement a loop over time steps in order to process all the inputs, one at a time.
➢ About the RNN cell
You can think of the recurrent neural network as the repeated use of a single cell. First, you'll implement the computations for a single time step.
➢ RNN cell versus rnn_cell_forward:
● Note that an RNN cell outputs the hidden state a^<t>
■ The RNN cell is shown in the figure as the inner box with solid lines
● The function that you'll implement, rnn_cell_forward, also calculates the prediction ŷ^<t>
■ rnn_cell_forward is shown in the figure as the outer box with dashed lines
➢ The following figure describes the operations for a single time step of an RNN cell:
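(The figure is not reproduced here; the two operations it depicts, and which the code below implements, are:)
a^<t> = tanh(Waa · a^<t-1> + Wax · x^<t> + ba)
ŷ^<t> = softmax(Wya · a^<t> + by)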
The code is as follows:
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: rnn_cell_forward
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        ba -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###

    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache
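Note that rnn_cell_forward relies on numpy and a softmax helper that are not defined in this excerpt (the original notebook imports them separately). A minimal sketch, assuming a column-wise softmax:
import numpy as np

def softmax(x):
    # subtract the per-column max for numerical stability, then normalize each column to sum to 1
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / np.sum(e_x, axis=0, keepdims=True)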
Run the code above:
def rnn_cell_forward_tests(rnn_cell_forward):
    np.random.seed(1)
    xt_tmp = np.random.randn(3, 10)
    a_prev_tmp = np.random.randn(5, 10)
    parameters_tmp = {}
    parameters_tmp['Waa'] = np.random.randn(5, 5)
    parameters_tmp['Wax'] = np.random.randn(5, 3)
    parameters_tmp['Wya'] = np.random.randn(2, 5)
    parameters_tmp['ba'] = np.random.randn(5, 1)
    parameters_tmp['by'] = np.random.randn(2, 1)
    a_next_tmp, yt_pred_tmp, cache_tmp = rnn_cell_forward(xt_tmp, a_prev_tmp, parameters_tmp)
    print("a_next[4] = \n", a_next_tmp[4])
    print("a_next.shape = \n", a_next_tmp.shape)
    print("yt_pred[1] =\n", yt_pred_tmp[1])
    print("yt_pred.shape = \n", yt_pred_tmp.shape)

# UNIT TESTS
rnn_cell_forward_tests(rnn_cell_forward)
6. RNN forward pass
➢ A recurrent neural network (RNN) is a repetition of the RNN cell that you've just built.
● If your input sequence of data is 10 time steps long, then you will re-use the RNN cell 10 times
➢ Each cell takes two inputs at each time step:
● a^<t-1>: the hidden state from the previous cell
● x^<t>: the current time step's input data
➢ It has two outputs at each time step:
● A hidden state a^<t>
● A prediction ŷ^<t>
➢ The weights and biases are reused at each time step
● They are maintained between calls to rnn_cell_forward in the 'parameters' dictionary
(Note: this reuse is not shown explicitly in the code above; it happens because the same parameters dictionary is passed into rnn_cell_forward at every time step.)
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: rnn_forward
def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        ba -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###
    # initialize "a" and "y_pred" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    # Initialize a_next (≈1 line)
    a_next = a0
    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)
    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches
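Based on how cache and caches are built above, the tuple returned by rnn_forward can be unpacked as follows (an illustrative sketch, assuming caches is the value returned by a call to rnn_forward):
caches_list, x_input = caches                        # caches = (list of per-time-step caches, x)
a_next_0, a_prev_0, xt_0, params = caches_list[0]    # tuple stored by rnn_cell_forward at t = 0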
Run the code above:
def rnn_forward_test(rnn_forward):
    np.random.seed(1)
    x_tmp = np.random.randn(3, 10, 4)
    a0_tmp = np.random.randn(5, 10)
    parameters_tmp = {}
    parameters_tmp['Waa'] = np.random.randn(5, 5)
    parameters_tmp['Wax'] = np.random.randn(5, 3)
    parameters_tmp['Wya'] = np.random.randn(2, 5)
    parameters_tmp['ba'] = np.random.randn(5, 1)
    parameters_tmp['by'] = np.random.randn(2, 1)
    a_tmp, y_pred_tmp, caches_tmp = rnn_forward(x_tmp, a0_tmp, parameters_tmp)
    print("a[4][1] = \n", a_tmp[4][1])
    print("a.shape = \n", a_tmp.shape)
    print("y_pred[1][3] =\n", y_pred_tmp[1][3])
    print("y_pred.shape = \n", y_pred_tmp.shape)
    print("caches[1][1][3] =\n", caches_tmp[1][1][3])
    print("len(caches) = \n", len(caches_tmp))

# UNIT TEST
rnn_forward_test(rnn_forward)
7. Summary
You've successfully built the forward propagation of a recurrent network from scratch.
➢ Situations when this RNN will perform better:
● This will work well enough for some applications, but it suffers from vanishing gradients.
● The RNN works best when each output ŷ^<t> can be estimated using "local" context.
● "Local" context refers to information that is close to the prediction's time step t.
● More formally, local context refers to inputs x^<t'> and predictions ŷ^<t> where t' is close to t.
➢ What you should remember:
● The recurrent neural network, or RNN, is essentially the repeated use of a single cell.
● A basic RNN reads inputs one at a time, and remembers information through the hidden layer activations (hidden states) that are passed from one time step to the next.
■ The timestep dimension determines how many times to re-use the RNN cell
● Each cell takes two inputs at each time step:
■ The hidden state from the previous cell
■ The current time step's input data
● Each cell has two outputs at each time step:
■ A hidden state
■ A prediction