Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a neural network architecture that processes image inputs using filters whose weights are learned through optimization. It is considered revolutionary in computer vision and typically outperforms traditional, hand-engineered methods in image and video processing.

Although there were many earlier works on neural networks for image processing, Yann LeCun and his colleagues developed the LeNet family of CNNs between 1989 and 1998. These pioneering models were among the first convolutional networks trained end to end with backpropagation, and they were applied to handwritten digit recognition.

Architecture

CNNs apply several types of filters to extract features from input data. The way these filters are composed into layers, and the order of those layers, is referred to as the "architecture of the CNN". This composition and ordering have a large impact on the model's performance.

Receptive Field refers to the region of the previous layer (and, ultimately, of the input) that influences a single unit in a layer.
State refers to input data that has passed through layers or filters.
Padding is a method to control dimensionality across layers by adding pixels, usually zeros, around the border of the input.
Striding sets the step size with which the window of a convolutional or pooling layer moves across the input.
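How padding and stride interact with the window size can be made concrete with the usual output-size arithmetic. The sketch below is a minimal illustration, not part of the original text: the function name `conv_output_size` and its parameters are hypothetical, and it assumes the same padding is added on both sides of a dimension.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     padding: int = 0, stride: int = 1) -> int:
    """Length of one output dimension of a convolution or pooling window.

    Hypothetical helper: assumes `padding` pixels are added on each side.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel does not fit inside the padded input")
    # Number of positions the window can take when moving `stride` at a time.
    return (padded - kernel_size) // stride + 1


# A 28-pixel side with a 5-wide kernel, no padding, stride 1.
print(conv_output_size(28, 5))             # 24
# Padding of 2 on each side keeps the size unchanged.
print(conv_output_size(28, 5, padding=2))  # 28
# A 2-wide pooling window with stride 2 halves the size.
print(conv_output_size(28, 2, stride=2))   # 14
```

Without padding the output shrinks at every layer, which is why padding is commonly chosen so that the output keeps the input's size.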

The architecture consists of many layers and filters, each with its own function that is carefully chosen to complement the overall architecture. The Convolutional Layer is a key layer of CNNs: it takes multi-dimensional input and produces an output called a "feature map" (or "activation map"). For an input I and an M × N kernel K, the output at position (i, j) is:

(I * K)(i, j) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}{I(i+m,j+n)\cdot K(m,n)}
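The formula above can be sketched directly in NumPy. This is a minimal, unoptimized illustration that assumes a single-channel image, no padding, and a stride of 1; the function name `conv2d_valid` is hypothetical.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Evaluate the formula at every position where the kernel fits.

    Hypothetical helper: single channel, no padding, stride 1.
    """
    M, N = kernel.shape
    out_h = image.shape[0] - M + 1
    out_w = image.shape[1] - N + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Sum over m, n of I(i + m, j + n) * K(m, n)
            out[i, j] = np.sum(image[i:i + M, j:j + N] * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every entry equals -5
```

As written, the formula slides the kernel without flipping it, which is strictly a cross-correlation. Deep learning libraries generally use the same convention and still call the operation a convolution.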

CNNs use two closely related mechanisms to extract features from input: kernels and filters. Both can have fixed or learnable parameters, and they play distinct roles in CNNs:

A kernel is a small array of weights that performs the convolution operation: at each position it is multiplied element-wise with the input patch beneath it, and the products are summed.
A filter is a kernel, or a stack of kernels with one per input channel, that responds to a particular pattern and produces one feature map, which helps the model identify that pattern.

Various Filters

Edge detection filter: These filters are used to detect and emphasize boundaries in images. Each filter has its unique characteristics:

Sobel filter: Detects horizontal and vertical edges, and is relatively resistant to noise.
Scharr filter: An improved version of Sobel, enabling more accurate edge detection.
Laplacian filter: Can detect edges in all directions simultaneously but is sensitive to noise.
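To make the edge detection filters concrete, the sketch below applies the standard 3 × 3 Sobel and Laplacian kernels to a tiny image containing one vertical edge. The helper `slide_kernel` and the test image are assumptions made for illustration. The Scharr kernel is omitted for brevity.

```python
import numpy as np

# Standard 3x3 kernels with fixed, not learned, weights.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # responds to vertical edges
SOBEL_Y = SOBEL_X.T                            # responds to horizontal edges
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)  # responds to edges in any direction


def slide_kernel(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Hypothetical helper: slide the kernel with no padding and stride 1."""
    M, N = kernel.shape
    out = np.zeros((image.shape[0] - M + 1, image.shape[1] - N + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + M, j:j + N] * kernel)
    return out


# Dark left half, bright right half: a single vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

gx = slide_kernel(image, SOBEL_X)     # strong response next to the edge
gy = slide_kernel(image, SOBEL_Y)     # zero: there is no horizontal edge
lap = slide_kernel(image, LAPLACIAN)  # sign change across the edge

print(gx[0])   # values 0, 4, 4, 0
print(gy[0])   # all zeros
print(lap[0])  # values 0, 1, -1, 0
```

The Sobel kernels each respond to one direction, while the Laplacian responds to the edge with a single kernel, with its output changing sign where the intensity jumps.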

Sharpening filter: These filters are used to increase image sharpness:

High-pass filter: Emphasizes high-frequency components to sharpen image details.
Unsharp Mask: Increases sharpness by subtracting a blurred version from the original image and adding the resulting detail back to the original.
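The two sharpening ideas are closely related, as the following sketch suggests: the difference between an image and its blurred copy is a high-pass component, and unsharp masking adds that component back. The helper names, the 3 × 3 box blur, and the `amount` parameter are assumptions chosen for illustration.

```python
import numpy as np


def box_blur(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Hypothetical helper: mean over a size x size window, edges replicated."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out


def unsharp_mask(image: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Add the high-frequency detail (image minus blur) back to the image."""
    blurred = box_blur(image)
    detail = image - blurred  # high-pass component
    return image + amount * detail


# A soft step from dark (0) to bright (1).
image = np.zeros((3, 6))
image[:, 3:] = 1.0

sharp = unsharp_mask(image)
print(sharp[1])  # values 0, 0, -1/3, 4/3, 1, 1
```

The values just before and after the step overshoot the original range, which is what makes the edge look crisper. Real pipelines usually clip the result back to the valid pixel range.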

Blur filter: These filters are used to soften images or remove noise:

Gaussian filter: Creates natural blur effects using weights based on a normal distribution.
Mean filter: Creates a simple blur effect using the average value of surrounding pixels.
Median filter: Effective for noise removal by using the median value of surrounding pixels.
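The difference between the three blur filters shows up clearly on an image with a single noisy pixel. The sketch below is illustrative only: `window_filter` and `gaussian_kernel` are hypothetical helpers, and the 3 × 3 window and `sigma=1.0` are arbitrary choices.

```python
import numpy as np


def window_filter(image: np.ndarray, reducer, size: int = 3) -> np.ndarray:
    """Hypothetical helper: apply `reducer` to each size x size neighbourhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = reducer(padded[i:i + size, j:j + size])
    return out


def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> np.ndarray:
    """Weights sampled from a 2D normal distribution, normalised to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


# Flat grey image with one bright "salt" pixel in the centre.
image = np.full((5, 5), 10.0)
image[2, 2] = 255.0

kernel = gaussian_kernel()
mean_out = window_filter(image, np.mean)
median_out = window_filter(image, np.median)
gauss_out = window_filter(image, lambda patch: np.sum(patch * kernel))

print(mean_out[2, 2])    # about 37.2: the outlier is spread out, not removed
print(median_out[2, 2])  # 10.0: the outlier is removed entirely
print(gauss_out[2, 2])   # between the original 255 and the background 10
```

Mean and Gaussian filters are weighted averages, so an outlier still leaks into its neighbours. The median ignores it as long as it is a minority inside the window.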

The Pooling Layer is another key layer; it reduces computational cost and memory usage by downsampling its input. It also acts as a noise filter while simplifying the input, and the resulting output is called a "pooled state". It can be calculated by various methods, with "max pooling" and "average pooling" being among the most popular choices. For an M × N window anchored at position (i, j):

\text{MaxPool}(i, j) = \max_{m=0}^{M-1}\max_{n=0}^{N-1}{I(i+m,j+n)}

\text{AvgPool}(i, j) = \frac{1}{M \cdot N}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}{I(i+m,j+n)}
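Both pooling formulas can be sketched with one small function. This is a minimal illustration under stated assumptions: the name `pool2d` is hypothetical, the window is square, and the window moves by `stride` pixels, so output position (i, j) reads the window anchored at (i * stride, j * stride).

```python
import numpy as np


def pool2d(image: np.ndarray, size: int = 2, stride: int = 2,
           mode: str = "max") -> np.ndarray:
    """Hypothetical helper: max or average pooling over size x size windows."""
    reducer = {"max": np.max, "avg": np.mean}[mode]
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the window
            out[i, j] = reducer(image[r:r + size, c:c + size])
    return out


image = np.array([[1, 3, 2, 4],
                  [5, 7, 6, 8],
                  [9, 2, 1, 0],
                  [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(image, mode="max")
avg_pooled = pool2d(image, mode="avg")
print(max_pooled)  # rows: 7, 8 and 9, 6
print(avg_pooled)  # rows: 4, 5 and 4.5, 3
```

A 2 × 2 window with stride 2 turns the 4 × 4 input into a 2 × 2 output, so three quarters of the values are discarded before the next layer.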

It is important to understand the difference between pooling and convolution. A convolutional layer aims to extract or emphasize patterns, textures, and edges in the input, while a pooling layer's main purpose is to reduce the spatial dimensions of its input.

The Fully Connected Layer produces the final output. Its multi-dimensional input is first flattened into a one-dimensional vector, and every element of that vector is connected to every unit in the layer. It combines the patterns extracted by the earlier layers to predict the desired output.
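The flatten-then-connect step can be sketched as a single matrix multiplication. Everything specific here is an assumption for illustration: the 8 × 4 × 4 pooled state, the 10 output classes (as in digit recognition), the random untrained weights, and the softmax used to turn scores into probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled state: 8 feature maps of size 4x4.
pooled = rng.standard_normal((8, 4, 4))

# Flatten the multi-dimensional input into a one-dimensional vector.
x = pooled.reshape(-1)  # 8 * 4 * 4 = 128 values

# Each of the 10 output units has one weight per input value.
W = rng.standard_normal((10, x.size)) * 0.01  # untrained, random weights
b = np.zeros(10)
logits = W @ x + b


def softmax(z: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities (assumed output activation)."""
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


probs = softmax(logits)
print(x.shape, logits.shape)  # (128,) (10,)
print(int(probs.argmax()))    # index of the predicted class
```

In a trained network, `W` and `b` would be learned by backpropagation together with the convolutional filters. Here they are random, so the prediction is meaningless and only the shapes are of interest.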
