Artificial Neural Networks: Working, Perceptrons, Activation Functions, and Multilayer Structures
Introduction
Artificial Neural Networks (ANNs) are computational models inspired by biological neural networks. They form the foundation of deep learning and are used for pattern recognition, classification, and prediction tasks.
This article describes the working of ANNs, with particular focus on perceptrons, activation functions, and multilayer structures.
What is an Artificial Neural Network?
An ANN consists of interconnected nodes called neurons, organized in layers. Each neuron receives inputs, processes them, and produces an output. The network learns by adjusting connection strengths (weights) based on training data.
ANNs mimic the human brain's ability to learn from examples and generalize to new situations.
Basic Components
- Neurons: Processing units that compute weighted sums of inputs
- Weights: Connection strengths between neurons
- Biases: Threshold values for neuron activation
- Layers: Input, hidden, and output layers
- Activation Functions: Non-linear transformations
Perceptrons: The Building Block
A perceptron is the simplest form of a neural network unit, introduced by Frank Rosenblatt in 1957. It represents a single neuron that makes binary decisions.
Structure of a Perceptron
A perceptron has:
- Multiple input connections (x₁, x₂, ..., xₙ)
- Associated weights (w₁, w₂, ..., wₙ)
- A bias term (b)
- An activation function
- A single output (y)
Working Principle
The perceptron computes a weighted sum of inputs:
Weighted Sum (z) = (x₁ × w₁) + (x₂ × w₂) + ... + (xₙ × wₙ) + b
Then applies an activation function to produce the output:
Output (y) = activation_function(z)
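The two formulas above can be sketched in a few lines of Python; the step activation matches the basic perceptron described later, and the weights, bias, and inputs shown are illustrative values, not prescribed ones:

```python
# Perceptron forward pass: weighted sum of inputs, then a step activation.
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    """Compute y = step(x1*w1 + x2*w2 + ... + xn*wn + b)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(z)

# Illustrative values: this weight/bias choice computes logical AND.
print(perceptron_output([1, 1], [0.5, 0.5], -0.7))  # 1
print(perceptron_output([1, 0], [0.5, 0.5], -0.7))  # 0
```

With these particular weights the perceptron fires only when both inputs are 1, i.e. it implements the AND function.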
Learning in Perceptrons
Perceptrons learn using the perceptron learning rule:
- Initialize weights and bias randomly
- For each training example:
  - Compute the output
  - Calculate the error (target − output)
  - Update the weights: wᵢ = wᵢ + η × error × xᵢ
  - Update the bias: b = b + η × error
- Repeat until convergence
Where η is the learning rate.
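The update rule above can be sketched as a small training loop. The AND dataset is an illustrative, linearly separable choice; weights are zero-initialized here (rather than randomly, as in the steps above) so the run is reproducible:

```python
# Perceptron learning rule applied to the AND function
# (an illustrative, linearly separable dataset).
def step(z):
    return 1 if z >= 0 else 0

def train_perceptron(data, lr=0.25, epochs=20):
    """data: list of (inputs, target) pairs. Returns (weights, bias).
    lr = 0.25 is exactly representable in binary floating point,
    which keeps this small example's arithmetic deterministic."""
    n_inputs = len(data[0][0])
    weights = [0.0] * n_inputs  # zero init keeps the run reproducible
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in data:
            z = sum(x * w for x, w in zip(inputs, weights)) + bias
            error = target - step(z)
            # w_i = w_i + eta * error * x_i ;  b = b + eta * error
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_data)
```

After training, the learned weights and bias classify all four AND examples correctly, since AND is linearly separable.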
Limitations of Single Perceptrons
- Can only solve linearly separable problems
- Cannot represent the XOR function, which is not linearly separable
- No hidden layers for complex patterns
Activation Functions
Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. They determine whether a neuron should be activated based on the weighted sum.
Types of Activation Functions
1. Step Function
The simplest activation function:
σ(z) = 1 if z ≥ 0, otherwise 0
Used in basic perceptrons, but not differentiable.
2. Sigmoid Function
Smooth, S-shaped curve:
σ(z) = 1 / (1 + e^(-z))
- Output range: (0, 1)
- Differentiable
- Used for binary classification
3. Hyperbolic Tangent (Tanh)
Similar to sigmoid but centered at zero:
σ(z) = (e^z - e^(-z)) / (e^z + e^(-z))
- Output range: (-1, 1)
- Zero-centered
- Better for hidden layers
4. Rectified Linear Unit (ReLU)
Most popular in modern networks:
σ(z) = max(0, z)
- Output range: [0, ∞)
- Computationally efficient
- Helps mitigate the vanishing gradient problem
5. Softmax Function
Used in output layers for multi-class classification:
σ(z)ᵢ = e^(zᵢ) / Σⱼ e^(zⱼ)
- Converts logits to probabilities
- Sum of outputs equals 1
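The functions above can be written out directly in plain Python (the sample inputs are illustrative; in practice one would use a library such as NumPy for vectorized versions):

```python
import math

def sigmoid(z):
    """Sigmoid: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: maps any real z into (-1, 1), zero-centered."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def softmax(zs):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability; result unchanged
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))           # 0.5
print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(softmax([2.0, 1.0, 0.1]))  # three probabilities summing to ~1.0
```

Note how each function realizes the range stated above: sigmoid stays in (0, 1), tanh in (-1, 1), ReLU in [0, ∞), and softmax outputs a probability distribution.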
Choosing Activation Functions
| Layer Type | Common Functions | Reasons |
|---|---|---|
| Input | None/Identity | Preserve input values |
| Hidden | ReLU, Tanh, Sigmoid | Non-linearity, gradient flow |
| Output | Sigmoid (binary), Softmax (multi-class), Linear (regression) | Appropriate output format |
Multilayer Structures
Multilayer perceptrons (MLPs) overcome single perceptron limitations by adding hidden layers between input and output.
Network Architecture
A typical MLP has:
- Input Layer: Receives raw data
- Hidden Layers: Extract features and patterns
- Output Layer: Produces final predictions
Forward Propagation
Data flows from input to output:
- Input layer receives data
- Each neuron computes weighted sum + bias
- Applies activation function
- Passes output to next layer
- Process repeats through all layers
Backpropagation Algorithm
The key to training MLPs:
- Forward Pass: Compute predictions
- Calculate Loss: Compare predictions with targets
- Backward Pass: Compute gradients using chain rule
- Update Weights: Use gradient descent
Mathematical Foundation
For a neuron in layer l:
Z^(l) = W^(l) × A^(l-1) + b^(l)
A^(l) = σ(Z^(l))
Where:
- Z^(l): Pre-activation values
- A^(l): Post-activation values
- W^(l): Weight matrix
- b^(l): Bias vector
- σ: Activation function
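These layer equations translate directly into code. The sketch below computes Z^(l) = W^(l) × A^(l-1) + b^(l) and A^(l) = σ(Z^(l)) for a single layer with sigmoid activation; the 2-input, 3-neuron layer and its weights are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(A_prev, W, b):
    """One layer of forward propagation.
    W is a list of rows (one row of weights per neuron);
    A_prev and b are flat lists. Returns the activations A."""
    Z = [sum(w * a for w, a in zip(row, A_prev)) + bi
         for row, bi in zip(W, b)]       # Z = W·A_prev + b
    return [sigmoid(z) for z in Z]       # A = sigma(Z)

# Illustrative 2-input, 3-neuron hidden layer.
A0 = [1.0, 0.5]                               # input activations
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]   # one row per neuron
b1 = [0.0, 0.1, -0.2]
A1 = layer_forward(A0, W1, b1)
print(len(A1))  # 3 activations, one per neuron
```

Stacking calls to `layer_forward` (feeding each layer's output into the next) is exactly the forward-propagation process described earlier.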
Training Process
- Initialize weights randomly
- For each epoch:
  - Forward propagation
  - Compute loss
  - Backpropagation to compute gradients
  - Update parameters using an optimizer (e.g., SGD, Adam)
- Repeat until convergence
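The training loop above can be sketched end-to-end as a tiny pure-Python MLP trained on XOR, the very function a single perceptron cannot learn. The layer sizes (2 inputs, 4 hidden sigmoid units, 1 output), learning rate, epoch count, and random seed are all illustrative choices:

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

H = 4      # hidden units
lr = 0.5   # learning rate

# Randomly initialize the parameters (step 1 of the training process).
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def forward(x):
    """Forward propagation through both layers."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

losses = []
for epoch in range(5000):
    total = 0.0
    for x, t in data:
        h, y = forward(x)
        total += (y - t) ** 2          # squared-error loss
        # Backward pass: chain rule through the sigmoid derivatives,
        # then a gradient-descent update of every parameter.
        dy = 2 * (y - t) * y * (1 - y)           # dLoss/dz at the output
        for j in range(H):
            dh = dy * W2[j] * h[j] * (1 - h[j])  # dLoss/dz at hidden j
            W2[j] -= lr * dy * h[j]
            for i in range(2):
                W1[j][i] -= lr * dh * x[i]
            b1[j] -= lr * dh
        b2 -= lr * dy
    losses.append(total)
```

After training, `forward(x)[1]` typically moves toward the XOR targets; what one can state safely is that the per-epoch loss drops below its initial value, demonstrating that the hidden layer lets gradient descent make progress on a problem no single perceptron can solve.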
Advantages of Multilayer Networks
- Can learn complex, non-linear relationships
- Universal approximation capability
- Feature learning and representation
- Hierarchical feature extraction
Challenges
- Vanishing/exploding gradients
- Overfitting
- Computational complexity
- Need for large datasets
Applications and Examples
Binary Classification (Single Output Neuron)
Example: Email spam detection
- Input: Email features (word counts, sender reputation)
- Hidden layers: Learn patterns
- Output: Probability of spam (sigmoid)
Multi-class Classification (Multiple Output Neurons)
Example: Handwritten digit recognition
- Input: 28×28 pixel image (784 features)
- Hidden layers: Extract edges, shapes
- Output: 10 neurons with softmax (digits 0-9)
Regression (Linear Output)
Example: House price prediction
- Input: House features (area, location, rooms)
- Hidden layers: Learn complex relationships
- Output: Predicted price (linear activation)
Key Concepts and Terminology
- Feedforward Networks: Data flows only forward
- Fully Connected Layers: Every neuron connects to all in next layer
- Loss Functions: Measure prediction error (MSE, cross-entropy)
- Optimizers: Update weights (SGD, Adam, RMSprop)
- Epochs: Complete passes through training data
- Batch Size: Number of samples processed together
- Learning Rate: Step size for weight updates
Conclusion
Artificial Neural Networks work by processing information through interconnected neurons. Perceptrons form the basic unit, activation functions provide non-linearity, and multilayer structures enable complex learning.
Understanding these components is crucial for grasping modern deep learning architectures. ANNs have revolutionized fields like computer vision, natural language processing, and autonomous systems.
For more learning resources, visit https://anacgpa.netlify.app/tools
Summary Points
- Perceptrons: Single neurons with weighted inputs and threshold activation
- Activation Functions: Non-linear transformations (ReLU, Sigmoid, Tanh, Softmax)
- Multilayer Networks: Multiple layers for complex pattern learning
- Training: Forward propagation + backpropagation with gradient descent
- Applications: Classification, regression, feature learning
This covers the fundamental working principles of ANNs as required for the examination.