GRU Recurrent Neural Networks - A Smart Way to Predict Sequences in Python
5 Jan 2025 | 6 min read

Cho et al. (2014) presented the Gated Recurrent Unit (GRU), a kind of recurrent neural network (RNN), as a less complex alternative to Long Short-Term Memory (LSTM) networks. Like LSTM, GRU can process sequential data such as audio, text, and time-series data. GRU's fundamental principle is to update the network's hidden state selectively at each time step by means of gating mechanisms, which manage the information entering and leaving the network. The GRU has two gating mechanisms: the reset gate and the update gate. The reset gate indicates how much of the prior hidden state should be ignored, while the update gate determines how much of the prior hidden state should be carried forward versus replaced by the new candidate state. The GRU's output is computed from the updated hidden state. The reset gate, update gate, candidate state, and hidden state of a GRU are determined by the following formulas:

r_{t} = \sigma(W_{r} \cdot [h_{t-1}, x_{t}])
z_{t} = \sigma(W_{z} \cdot [h_{t-1}, x_{t}])
\overline{h}_{t} = \tanh(W \cdot [r_{t} \odot h_{t-1}, x_{t}])
h_{t} = z_{t} \odot h_{t-1} + (1 - z_{t}) \odot \overline{h}_{t}

Here \sigma is the sigmoid function and \odot denotes element-wise multiplication.

To summarise, GRU networks are a kind of RNN that can efficiently model sequential data because they employ gating mechanisms to selectively update the hidden state at each time step. They have demonstrated their effectiveness in many natural language processing tasks, including speech recognition, machine translation, and language modelling.

Many modifications were created to address the vanishing and exploding gradients problem that is frequently encountered during the operation of a basic Recurrent Neural Network. The Long Short-Term Memory network (LSTM) is one of the most well-known variants. The Gated Recurrent Unit network (GRU) is a less well-known but no less effective variant. It has just two gates and does not keep an internal cell state as LSTM does. The information that an LSTM recurrent unit keeps in its internal cell state is instead incorporated into the hidden state of the Gated Recurrent Unit, and this aggregate information is passed on to the next Gated Recurrent Unit.
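A single GRU forward step can be sketched in NumPy as follows. This is a minimal illustration, not a library implementation; the weight names, shapes, and separate bias vectors are assumptions made for clarity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step. Weight/bias names (W_*, U_*, b_*) are illustrative:
    W_* act on the input x_t, U_* act on the previous hidden state h_prev."""
    z = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])  # update gate
    r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    # Candidate state built from the reset-gated previous state
    h_bar = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r * h_prev) + params["b_h"])
    # Interpolate between the old state and the candidate state
    return z * h_prev + (1.0 - z) * h_bar
```

Because the new state is a convex combination of the previous state and a tanh output, every component of h_{t} stays in the interval (-1, 1).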
The gates of a GRU can be described as follows:

- Update gate (z_{t}): controls how much of the previous hidden state is carried forward to the current time step, and how much is replaced by the candidate state.
- Reset gate (r_{t}): controls how much of the previous hidden state is forgotten when computing the candidate state \overline{h}_{t}.
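The hidden state that these gates produce is carried forward from one recurrent unit to the next, which can be sketched by unrolling the recurrence over a whole sequence. This is a minimal NumPy illustration; the weight names are assumptions and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward(xs, Wz, Uz, Wr, Ur, Wh, Uh):
    """Unroll a GRU over a sequence xs of shape (T, d); the hidden state h
    carries the aggregate information from one time step to the next."""
    h = np.zeros(Uz.shape[0])
    states = []
    for x in xs:                        # one GRU step per sequence element
        z = sigmoid(Wz @ x + Uz @ h)    # update gate
        r = sigmoid(Wr @ x + Ur @ h)    # reset gate
        h_bar = np.tanh(Wh @ x + Uh @ (r * h))
        h = z * h + (1.0 - z) * h_bar   # pass aggregate state onward
        states.append(h)
    return np.stack(states)             # all hidden states, shape (T, n)
```

In practice a trained output layer would be applied to each hidden state (or to the last one) to produce predictions.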
When depicted, the basic workflow of a Gated Recurrent Unit network is similar to that of a basic Recurrent Neural Network. The primary distinction between the two is how each recurrent unit functions internally, as Gated Recurrent Unit networks are made up of gates that modulate both the current input and the previous hidden state.

Working of a Gated Recurrent Unit:
At each time step, the unit first computes the reset and update gates from the current input and the previous hidden state, then forms the candidate state \overline{h}_{t} from the reset-gated previous state, and finally interpolates between the previous hidden state and the candidate state using the update gate. Element-wise multiplication and vector addition are used throughout, and for each gate the weight matrix W assigns distinct weights to the prior hidden state and the current input vector. A GRU network produces an output at each time step, just like a recurrent neural network, and the network is trained using gradient descent.

It should be noted that, similar to the workflow, the GRU network's training procedure is diagrammatically comparable to that of a simple recurrent neural network, with the exception of how each recurrent unit functions internally. The main distinction between the Back-Propagation Through Time algorithms of a Gated Recurrent Unit network and a Long Short-Term Memory network lies in the construction of the differential chain.

Let the actual output at each time step be denoted by y_{t} and the predicted output by \overline{y}_{t}. Then the error at each time step (using, for example, the cross-entropy loss) is given by:

E_{t} = -y_{t} \log(\overline{y}_{t})

Thus, the overall error is the sum of the errors at all time steps:

E = \sum_{t} E_{t}

Similarly, the value \frac{\partial E}{\partial W} can be calculated as the summation of the gradients at each time step:

\frac{\partial E}{\partial W} = \sum_{t} \frac{\partial E_{t}}{\partial W}

Using the chain rule, and using the fact that \overline{y}_{t} is a function of h_{t}, which in turn is a function of \overline{h}_{t}, the following expression arises:

\frac{\partial E_{t}}{\partial W} = \frac{\partial E_{t}}{\partial \overline{y}_{t}} \frac{\partial \overline{y}_{t}}{\partial h_{t}} \frac{\partial h_{t}}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial h_{t-2}} \cdots \frac{\partial h_{0}}{\partial W}

Consequently, the total error gradient is given by:

\frac{\partial E}{\partial W} = \sum_{t} \frac{\partial E_{t}}{\partial \overline{y}_{t}} \frac{\partial \overline{y}_{t}}{\partial h_{t}} \frac{\partial h_{t}}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial h_{t-2}} \cdots \frac{\partial h_{0}}{\partial W}

It should be noted that although the chain of \partial h_{t} in the gradient equation resembles that of a simple recurrent neural network, it functions differently because of the way the derivatives of h_{t} are internally structured.

How are vanishing gradients resolved by Gated Recurrent Units?

The value of the gradients is controlled by the chain of derivatives beginning at \frac{\partial h_{t}}{\partial h_{t-1}}.
Recall the formula for h_{t}:

h_{t} = z_{t} \odot h_{t-1} + (1 - z_{t}) \odot \overline{h}_{t}

Using the above expression, and treating the gates as constants, the value of \frac{\partial h_{t}}{\partial h_{t-1}} is:

\frac{\partial h_{t}}{\partial h_{t-1}} = z + (1 - z) \frac{\partial \overline{h}_{t}}{\partial h_{t-1}}

Recall the formula for \overline{h}_{t}:

\overline{h}_{t} = \tanh(W \cdot [r_{t} \odot h_{t-1}, x_{t}])

Using the above expression to calculate the value of \frac{\partial \overline{h}_{t}}{\partial h_{t-1}}:

\frac{\partial \overline{h}_{t}}{\partial h_{t-1}} = (1 - \overline{h}_{t}^{2})(W)(r)

Since the sigmoid function serves as the activation function for both the update and reset gates, their values lie strictly between 0 and 1; in the saturated extremes they are effectively 0 or 1, which gives the following cases.

Case 1 (z = 1): In this case, irrespective of the value of r, the term \frac{\partial h_{t}}{\partial h_{t-1}} is equal to z, which in turn is equal to 1.

Case 2A (z = 0 and r = 0): In this case, the term \frac{\partial h_{t}}{\partial h_{t-1}} is equal to 0.

Case 2B (z = 0 and r = 1): In this case, the term \frac{\partial h_{t}}{\partial h_{t-1}} is equal to (1 - \overline{h}_{t}^{2})(W). This value is determined by the trainable weight matrix, and the network learns to modify the weights in order to bring the term \frac{\partial h_{t}}{\partial h_{t-1}} closer to 1. As a result, the Back-Propagation Through Time method adjusts the corresponding weights to keep the value of the chain of derivatives as near to 1 as possible.
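The three cases can be checked numerically. The sketch below builds the Jacobian \frac{\partial h_{t}}{\partial h_{t-1}} with the gates treated as fixed scalars, using separate hypothetical weight matrices Wx and Wh for the input and the prior hidden state:

```python
import numpy as np

def jacobian_h(z, r, h_prev, x, Wx, Wh):
    """Jacobian of h_t w.r.t. h_{t-1} with the scalar gates z, r held constant.

    h_t   = z * h_{t-1} + (1 - z) * h_bar
    h_bar = tanh(Wx @ x + Wh @ (r * h_{t-1}))
    dh_t/dh_{t-1} = z * I + (1 - z) * r * diag(1 - h_bar**2) @ Wh
    """
    h_bar = np.tanh(Wx @ x + Wh @ (r * h_prev))
    n = h_prev.size
    return z * np.eye(n) + (1.0 - z) * r * np.diag(1.0 - h_bar**2) @ Wh
```

With z = 1 the Jacobian is the identity matrix (Case 1); with z = 0 and r = 0 it is the zero matrix (Case 2A); with z = 0 and r = 1 it reduces to diag(1 - \overline{h}_{t}^{2}) @ Wh (Case 2B), the term the network can steer toward 1 by training Wh.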