Neural Networks

Neural networks are at the core of modern deep learning, designed to recognize complex patterns in data, make predictions, and solve tasks like classification and regression. Inspired by the human brain, neural networks consist of layers of interconnected nodes (neurons) that process data through a series of transformations. Let’s explore the inner mechanics of a neural network, covering forward propagation, loss functions, backpropagation, and training techniques such as regularization.

1. Structure of a Neural Network

A neural network consists of the following primary types of layers:

  • Input Layer: Receives the raw input data.
  • Hidden Layers: One or more intermediate layers that perform increasingly complex transformations.
  • Output Layer: Provides the network’s final prediction or classification.

Each layer contains neurons; in a fully connected (dense) layer, every neuron is connected to every neuron in the next layer. Other architectures, such as convolutional or recurrent layers, handle specific data types more efficiently.

Mathematically, a neural network’s output y can be formulated as:

y = f(\mathbf{X}; \mathbf{W}, \mathbf{b})

where:

  • \mathbf{X} is the input vector.
  • \mathbf{W} is the set of weights across all layers.
  • \mathbf{b} is the set of biases across all layers.
  • f represents the function computed by the network, typically a composition of linear transformations and non-linear activation functions.
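
As a concrete illustration (the shapes and scales below are assumptions for this sketch, not given in the text), the parameter sets \mathbf{W} and \mathbf{b} for a small two-layer network can be held as plain NumPy arrays:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network: 4 input features -> 8 hidden units -> 1 output
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))

# W corresponds to {W1, W2} and b to {b1, b2} in y = f(X; W, b)
layers = [(W1, b1), (W2, b2)]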

2. Forward Propagation

Forward propagation is the process by which input data flows through the network, layer by layer, to generate predictions. Each neuron computes a weighted sum of its inputs, adds a bias term, and applies an activation function to introduce non-linearity.

Linear Transformation

The output z_i^{(l)} of neuron i in layer l is computed as:

z_i^{(l)} = \sum_{j} w_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}

where:

  • w_{ij}^{(l)} is the weight connecting the j-th neuron in layer l-1 to neuron i in layer l,
  • a_j^{(l-1)} is the activation of neuron j in the previous layer,
  • b_i^{(l)} is the bias for neuron i.
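
As a quick numeric check of this formula (the numbers below are made up for illustration), the weighted sums for an entire layer can be computed with a single matrix product:

import numpy as np

a_prev = np.array([[0.5, -1.0, 2.0]])   # activations from layer l-1, shape (1, 3)
W = np.array([[0.2, -0.4],
              [0.1,  0.3],
              [-0.5, 0.7]])             # weights into layer l, shape (3, 2)
b = np.array([[0.05, -0.1]])            # biases b_i^(l), shape (1, 2)

z = a_prev @ W + b                      # column i holds sum_j w_ij * a_j + b_i
# z is approximately [[-0.95, 0.8]]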

Activation Functions

The activation function introduces non-linearity, enabling the network to model complex patterns. Common activation functions include:

  • Sigmoid:

    \sigma(z) = \frac{1}{1 + e^{-z}}

    Often used in binary classification tasks.

  • ReLU (Rectified Linear Unit):

    \text{ReLU}(z) = \max(0, z)

    Efficient and mitigates the vanishing gradient problem.

  • Leaky ReLU:

    \text{Leaky ReLU}(z) = \begin{cases} z, & z > 0 \\ 0.01z, & z \leq 0 \end{cases}

    Addresses the “dying ReLU” problem by allowing a small, non-zero gradient for z \leq 0.

  • Tanh:

    \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}

    Zero-centered, with outputs ranging from -1 to 1, which often suits hidden layers.
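
A minimal NumPy sketch of these activations (vectorized over arrays) might look like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def tanh(z):
    return np.tanh(z)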

Forward Propagation Example

Below is a simple, flexible NumPy implementation of forward propagation; the activations argument lets each layer use its own activation function:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward_propagation(X, layers, activations):
    """Propagate X through a list of (weights, biases) pairs."""
    cache = {}
    a = X
    for i, (weights, biases) in enumerate(layers):
        z = np.dot(a, weights) + biases   # linear transformation
        a = activations[i](z)             # non-linear activation
        cache[f"z{i+1}"] = z              # cached for backpropagation
        cache[f"a{i+1}"] = a
    return a, cache
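
For example, with randomly initialized parameters (the shapes below are illustrative assumptions), a forward pass can be run as:

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                                         # 5 samples, 4 features
layers = [(rng.normal(scale=0.1, size=(4, 8)), np.zeros((1, 8))),   # hidden layer
          (rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1)))]   # output layer

y_hat, cache = forward_propagation(X, layers, activations=[relu, relu])
print(y_hat.shape)     # (5, 1)
print(sorted(cache))   # ['a1', 'a2', 'z1', 'z2']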

3. Loss Functions

The loss function measures the difference between predictions and actual targets. Common loss functions include:

  • Binary Cross-Entropy:

    L = - \frac{1}{N} \sum_{i=1}^N \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right)
  • Categorical Cross-Entropy:

    L = - \frac{1}{N} \sum_{i=1}^N \sum_{c=1}^C y_{ic} \log(\hat{y}_{ic})
  • Mean Squared Error (MSE):

    L = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2
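
These losses are straightforward to write in NumPy; the small eps below is an assumption added to keep the logarithms numerically stable:

import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    # y and y_hat have shape (N, C); y is one-hot encoded
    return -np.mean(np.sum(y * np.log(np.clip(y_hat, eps, 1.0)), axis=1))

def mean_squared_error(y, y_hat):
    return np.mean((y - y_hat) ** 2)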

4. Backward Propagation

Backward propagation computes gradients of the loss function with respect to network weights and biases using the chain rule.

Chain Rule and Gradients

For a weight w:

\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}
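
As a concrete instance of the chain rule, for a sigmoid output unit trained with binary cross-entropy the first two factors simplify, which is why the code below can start directly from a - y:

\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = \frac{a - y}{a(1 - a)} \cdot a(1 - a) = a - y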

Backward Propagation Code

def backward_propagation(X, y, weights, biases, cache, learning_rate=0.01):
    # Gradient step for a single dense layer with a sigmoid output and
    # binary cross-entropy loss, where dL/dz simplifies to (a - y).
    a = cache["a1"]                                       # output activations
    dz = a - y                                            # dL/dz
    dw = np.dot(X.T, dz) / X.shape[0]                     # dL/dW, averaged over the batch
    db = np.sum(dz, axis=0, keepdims=True) / X.shape[0]   # dL/db
    weights -= learning_rate * dw                         # gradient-descent update
    biases -= learning_rate * db
    return weights, biases
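
A minimal sketch of one forward/backward step for that single-layer case (the names and shapes here are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                         # 5 samples, 3 features
y = rng.integers(0, 2, size=(5, 1)).astype(float)   # binary targets
weights = rng.normal(scale=0.1, size=(3, 1))
biases = np.zeros((1, 1))

_, cache = forward_propagation(X, [(weights, biases)], activations=[sigmoid])
weights, biases = backward_propagation(X, y, weights, biases, cache, learning_rate=0.1)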

5. Training the Neural Network

Training involves repeated forward propagation, loss computation, backward propagation, and weight updates.

Regularization Techniques

  • L1 Regularization:

    L_1 = \lambda \sum_{i=1}^{n} |w_i|
  • L2 Regularization (Weight Decay):

    L_2 = \frac{\lambda}{2} \sum_{i=1}^{n} w_i^2
  • Dropout: Randomly deactivates neurons during training.
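
A hedged sketch of how these could be applied in practice (the penalty coefficient and dropout rate below are illustrative choices):

import numpy as np

def l2_penalty(weight_matrices, lam=1e-3):
    # Added to the data loss; its gradient contributes lam * W to each dW
    return 0.5 * lam * sum(np.sum(W ** 2) for W in weight_matrices)

def dropout(a, rate=0.5, rng=None):
    # Inverted dropout (training only): zero activations with probability `rate`
    # and rescale the survivors so expected activations are unchanged.
    if rng is None:
        rng = np.random.default_rng()
    mask = (rng.random(a.shape) >= rate) / (1.0 - rate)
    return a * mask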

Training Loop Example

The loop below trains a single-layer network with a sigmoid output, matching the gradients computed by backward_propagation above:

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, layers, epochs=100, learning_rate=0.001):
    weights, biases = layers[0]   # single dense layer
    for epoch in range(epochs):
        # Forward propagation with a sigmoid output for binary classification
        a, cache = forward_propagation(X, [(weights, biases)], activations=[sigmoid])
        # Binary cross-entropy loss
        loss = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
        # Backward propagation and gradient-descent update
        weights, biases = backward_propagation(X, y, weights, biases, cache, learning_rate)
        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Loss: {loss:.4f}")
    return [(weights, biases)]
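
For example, the pieces above can be wired together on a toy binary-classification problem (the data and shapes here are made up for illustration):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, :1] + X[:, 1:] > 0).astype(float)   # simple linearly separable labels

layers = [(rng.normal(scale=0.1, size=(2, 1)), np.zeros((1, 1)))]
layers = train(X, y, layers, epochs=100, learning_rate=0.1)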

Conclusion

Neural networks are a cornerstone of modern AI. Through forward propagation and backward propagation, neural networks learn to make accurate predictions. Regularization and optimization techniques enhance their performance on complex tasks. As you progress, explore advanced architectures like CNNs and RNNs for specific data types.