From Artificial Neural Networks to Deep Neural Networks: The Birth of the AI Generation

An Artificial Neural Network (ANN) is a machine learning model that mimics the biological structure of the human brain. It consists of interconnected neurons that pass signals, represented as real numbers, through connections that act like synapses. Each neuron receives signals from other neurons and, using its weights and a non-linear activation function, processes them into new inputs for subsequent neurons.

Neuron: The Basic Building Block of ANNs

A neuron receives multiple inputs x and produces a value z. These inputs can be either raw data or outputs from previous neurons. The value z is computed as the weighted sum of the inputs, using weights w, plus a bias b. It is then typically passed through an activation function σ to generate the final signal a.

\text{Neuron}(x) = \sigma(z) = \sigma\left( \sum^{n}_{i=1} w_i x_i + b \right)
  • x = [x_1, x_2, \dots, x_n]

  • w = [w_1, w_2, \dots, w_n]

  • z = w \cdot x + b = \sum^{n}_{i=1} w_i x_i + b \quad (\text{the bias } b \text{ is optional})
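
As a concrete example, a single neuron is one dot product followed by an activation. Below is a minimal sketch in Python with NumPy, assuming the sigmoid as the activation function; the input, weight, and bias values are arbitrary illustrations.

```python
import numpy as np

def sigmoid(z):
    # A common non-linear activation: squashes z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # z = w . x + b, then a = sigma(z)
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([0.5, -1.2, 3.0])   # inputs x_1..x_n (arbitrary values)
w = np.array([0.4, 0.1, -0.6])   # weights w_1..w_n (arbitrary values)
b = 0.2                          # bias
print(neuron(x, w, b))           # final signal a
```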

Neural Network Layers: Organizing Neurons for Processing

An ANN can be viewed as a graph with a layered architecture of neurons. This architecture becomes more complex in modern ANNs, such as deep learning models. Neurons in each layer connect to neurons in the next layer, enabling the model to capture complex and abstract features.

  • A layer that receives signals from external sources (input data or training data) is called an Input Layer.

  • A layer that receives signals from the input layer or other hidden layers is called a Hidden Layer.

  • A layer that produces the final signal is called an Output Layer.

The depth of neural networks refers to the number of hidden layers in the model. Research has shown that early layers detect basic features like edges and colors, while deeper layers identify more complex elements like parts and shapes of images. This hierarchical ability to learn different features at various depths is why ANNs perform so well. However, simply increasing depth doesn't always improve performance.
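
To make the layered structure concrete, the following sketch chains an input through one hidden layer and an output layer as matrix multiplications. The layer sizes, random weights, and the ReLU activation are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Activation used in the hidden layer of this sketch.
    return np.maximum(0.0, z)

# Arbitrary sizes: 4 input features, 5 hidden neurons, 2 outputs.
W1 = rng.normal(size=(5, 4))  # input layer -> hidden layer
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5))  # hidden layer -> output layer
b2 = np.zeros(2)

x = rng.normal(size=4)        # one input sample
h = relu(W1 @ x + b1)         # hidden layer activations
y = W2 @ h + b2               # output layer signal
print(y)
```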

Hyperparameters: Training Control Variables

Hyperparameters are values that control the training of neural networks. They define the model architecture and remain constant during the training algorithm, typically being set before training begins. They heavily influence model performance, and one of the main goals in ANN development is finding optimized hyperparameters.

  • Learning rate determines the step size for updating weights. A high learning rate speeds up training but can overshoot minima and settle on a less accurate solution, while a low learning rate trains more slowly but can converge to a more accurate one, as demonstrated in the sketch after this list.

  • Batch Size refers to the number of samples processed in each training iteration. It affects training speed, memory usage, and model regularization. Large batch sizes provide more stable gradient estimates but require more memory, while small batch sizes use less memory but produce noisier gradient estimates.

  • Depth refers to the number of hidden layers in the model. Greater depth allows the model to learn more complex patterns; however, it also increases the possibility of overfitting. The appropriate depth is determined by both the complexity of the problem and the volume of available data.
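
The learning-rate trade-off described above can be seen on a one-dimensional toy problem. The sketch below minimizes a simple quadratic with gradient descent at several assumed learning rates: a very small rate converges slowly, a moderate rate converges quickly, and a too-large rate diverges.

```python
# Minimize f(w) = (w - 3)^2 with gradient descent; f'(w) = 2(w - 3).
def descend(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)   # weight update scaled by the learning rate
    return w

for lr in (0.01, 0.1, 0.9, 1.1):            # illustrative learning rates
    print(f"lr={lr}: w={descend(lr):.4f}")  # optimum is w = 3; lr=1.1 diverges
```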

Key Characteristics of Hyperparameters

These parameters maintain fixed values throughout the learning process and determine how the model learns from data. For example, typical hyperparameters include learning rate, batch size, number of hidden layers and neurons in each layer, and types of activation functions. These influence various aspects such as model complexity, learning speed, and generalization ability, ultimately determining the final model's performance.

  • Hyperparameters are closely interconnected; changing one often shifts the best value of another.

  • Good hyperparameter values are mostly found empirically, by experimenting across training runs, rather than derived analytically.
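
Reflecting these characteristics, one minimal sketch is to collect typical hyperparameters into a single frozen configuration object that is set before training and never changes afterward; the names and values here are hypothetical, not recommendations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: values stay constant during training
class Hyperparameters:
    learning_rate: float = 0.01   # step size for weight updates
    batch_size: int = 32          # samples per training iteration
    depth: int = 3                # number of hidden layers
    hidden_width: int = 64        # neurons per hidden layer

config = Hyperparameters()
print(config)
```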

Backpropagation

Backpropagation is an algorithm that enables efficient gradient computation in deep neural networks. It calculates gradients of the loss function with respect to the weights by working backward through the network. This key technique makes it possible to train large, complex neural architectures.

In modern AI/ML, "forward propagation" refers to the process where input data flows through all layers of the network to produce the output, while "backpropagation" represents a method to simplify gradient calculations using the chain rule:

\text{chain rule: } (c \circ p)'(x_p) = c'(p(x_p)) \cdot p'(x_p)
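
As a quick sanity check of the rule, the sketch below compares the analytic chain-rule derivative of a small composition c(p(x)) against a numerical finite-difference estimate; the two functions are arbitrary examples.

```python
import numpy as np

p = lambda x: x ** 2            # inner function p
c = lambda u: np.sin(u)         # outer function c
dp = lambda x: 2 * x            # p'(x)
dc = lambda u: np.cos(u)        # c'(u)

x = 1.3
analytic = dc(p(x)) * dp(x)     # chain rule: c'(p(x)) * p'(x)

eps = 1e-6
numeric = (c(p(x + eps)) - c(p(x - eps))) / (2 * eps)
print(analytic, numeric)        # the two values should agree closely
```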

Backpropagation is sometimes described as just a systematic way to traverse a computation graph; however, its main benefit comes from its recursive structure, which reuses gradients computed for later layers when computing those for earlier ones, and which applies uniformly across neural network architectures.

g(x) := f^{L}(W^{L} f^{L-1}(W^{L-1} \cdots f^{1}(W^{1}x) \cdots))
  • x is the input and y is the desired output.

  • C is the loss/cost function.

  • L is the number of layers.

  • f^l is the activation function of layer l.

  • a^l_j is the activation of the j-th node in layer l.

  • W^l is the weight matrix between layers l-1 and l; w^l_{jk} is the weight between the k-th node in layer l-1 and the j-th node in layer l.
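
Putting the notation together, the following sketch trains a tiny two-layer network on a toy regression target with backpropagation written out by hand. The data, layer sizes, activation (tanh), loss (mean squared error), and learning rate are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn y = sum of inputs (an arbitrary target for illustration).
X = rng.normal(size=(100, 3))
Y = X.sum(axis=1, keepdims=True)

W1 = rng.normal(size=(3, 8)) * 0.5; b1 = np.zeros(8)   # layer 1 (f^1 = tanh)
W2 = rng.normal(size=(8, 1)) * 0.5; b2 = np.zeros(1)   # layer 2 (f^2 = identity)
lr = 0.05

for step in range(500):
    # Forward propagation: input flows through all layers.
    z1 = X @ W1 + b1
    a1 = np.tanh(z1)
    y_hat = a1 @ W2 + b2

    # Cost C: mean squared error over the whole batch.
    err = y_hat - Y
    cost = np.mean(err ** 2)

    # Backpropagation: apply the chain rule layer by layer, backward.
    dy = 2 * err / len(X)               # dC/dy_hat
    dW2 = a1.T @ dy; db2 = dy.sum(axis=0)
    da1 = dy @ W2.T                     # reuse the upstream gradient (chain rule)
    dz1 = da1 * (1 - a1 ** 2)           # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)

    # Gradient descent step with the learning-rate hyperparameter.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final cost:", cost)
```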

Loss and Cost in Modern ANN

Cost is a statistical measure of a model's prediction accuracy. It evaluates the model's performance by measuring the average distance between the observed targets and the model's outputs.

While loss and cost may appear similar, there are important differences. Cost is a comprehensive metric that represents the average error across the entire training dataset, used to evaluate the overall performance of the model. In contrast, loss measures the immediate error for individual data points or mini-batches. This is directly used to adjust the model's parameters during the learning process.

Understanding the difference between these two concepts is crucial for model training and evaluation:

  • The loss function is used to update model weights in each training iteration and provides immediate feedback.

  • The cost function monitors model performance trends over a longer time frame and helps determine whether overfitting or underfitting is occurring.

  • Effective model development requires a balanced approach that considers both metrics.
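
The distinction can be shown in a few lines: per-sample losses give the immediate feedback used for updates, while the cost averages them over the dataset. The sketch below assumes squared error as the loss.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.3, 3.6])

loss = (y_pred - y_true) ** 2   # per-sample loss: immediate feedback
cost = loss.mean()              # cost: average error over the dataset

print("losses:", loss)          # used to adjust parameters per sample/batch
print("cost:  ", cost)          # used to monitor overall performance
```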

From Simple to Deep: The Evolution of Neural Networks

Deep Neural Network (DNN) refers to an ANN with multiple hidden layers. Although the overall architecture is similar to that of a traditional ANN, stacking many hidden layers creates significant differences in performance.
