Artificial Neural Networks
[Figure: biological neuron — soma, dendrites, axon and synapses]
Architecture of a typical artificial neural network
[Figure: network of neurons arranged in an input layer, a middle layer and an output layer]
Analogy between biological and artificial neural networks
The neuron as a simple computing element
Diagram of a neuron
[Figure: model of a neuron — input signals x1 … xn arrive through weighted links w1 … wn and produce the output signal Y]
The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. It uses the following transfer or activation function:

X = Σ(i=1..n) xi·wi      Y = +1 if X ≥ θ,  −1 if X < θ

This type of activation function is called a sign function.
Activation functions of a neuron
[Figure: four activation functions plotted as Y against X, each bounded between −1 and +1 — step function, sign function, sigmoid function, linear function]
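The four activation functions above can be sketched directly in Python (a minimal illustration; the function names are mine, not from the slides):

```python
import math

def step(x):
    # Step function: 1 if the input reaches 0, else 0
    return 1 if x >= 0 else 0

def sign(x):
    # Sign function: +1 if the input reaches 0, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):
    # Sigmoid function: smooth squashing into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    # Linear function: the output equals the input
    return x
```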
Single-layer two-input perceptron
[Figure: inputs x1 and x2, weighted by w1 and w2, feed a linear combiner followed by a hard limiter with threshold θ, producing the output Y]
The Perceptron
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

Σ(i=1..n) xi·wi − θ = 0
[Figure: linear separability in the perceptron — the line Σ xi·wi − θ = 0 separates Class A1 from Class A2 in the (x1, x2) plane]
e(p) = Yd(p) − Y(p),  where p = 1, 2, 3, …

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

Y(p) = step[ Σ(i=1..n) xi(p)·wi(p) − θ ]

Step 3: Weight training
Update the weights of the perceptron:

wi(p+1) = wi(p) + Δwi(p),  where Δwi(p) = α·xi(p)·e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
Example of perceptron learning: the logical operation AND
Threshold: θ = 0.2; learning rate: α = 0.1

Epoch  x1 x2  Yd   Initial w1 w2   Y   e   Final w1 w2
  1     0  0   0      0.3  -0.1    0   0     0.3  -0.1
        0  1   0      0.3  -0.1    0   0     0.3  -0.1
        1  0   0      0.3  -0.1    1  -1     0.2  -0.1
        1  1   1      0.2  -0.1    0   1     0.3   0.0
  2     0  0   0      0.3   0.0    0   0     0.3   0.0
        0  1   0      0.3   0.0    0   0     0.3   0.0
        1  0   0      0.3   0.0    1  -1     0.2   0.0
        1  1   1      0.2   0.0    1   0     0.2   0.0
  3     0  0   0      0.2   0.0    0   0     0.2   0.0
        0  1   0      0.2   0.0    0   0     0.2   0.0
        1  0   0      0.2   0.0    1  -1     0.1   0.0
        1  1   1      0.1   0.0    0   1     0.2   0.1
  4     0  0   0      0.2   0.1    0   0     0.2   0.1
        0  1   0      0.2   0.1    0   0     0.2   0.1
        1  0   0      0.2   0.1    1  -1     0.1   0.1
        1  1   1      0.1   0.1    1   0     0.1   0.1
  5     0  0   0      0.1   0.1    0   0     0.1   0.1
        0  1   0      0.1   0.1    0   0     0.1   0.1
        1  0   0      0.1   0.1    0   0     0.1   0.1
        1  1   1      0.1   0.1    1   0     0.1   0.1
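The training procedure and the table above can be reproduced in a few lines of Python. This is a sketch, not code from the slides; exact fractions are used so that ties (a weighted sum landing exactly on the threshold) behave as in the table rather than being disturbed by floating-point rounding:

```python
from fractions import Fraction as F

def step(x):
    # Hard limiter: fires when the weighted sum reaches the threshold
    return 1 if x >= 0 else 0

def train_perceptron(data, w1, w2, theta, alpha, epochs):
    for _ in range(epochs):
        for x1, x2, yd in data:
            y = step(x1 * w1 + x2 * w2 - theta)   # Step 2: activation
            e = yd - y                            # error e(p) = Yd(p) - Y(p)
            w1 += alpha * x1 * e                  # Step 3: delta rule
            w2 += alpha * x2 * e
    return w1, w2

# Logical AND with the table's setup: w1 = 0.3, w2 = -0.1, theta = 0.2, alpha = 0.1
AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
w1, w2 = train_perceptron(AND, F(3, 10), F(-1, 10), F(1, 5), F(1, 10), epochs=5)
print(float(w1), float(w2))   # 0.1 0.1 — the final row of the table
```

After epoch 5 no further errors occur, so the weights have converged.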
Two-dimensional plots of basic logical operations
[Figure: (x1, x2) plots of AND, OR and Exclusive-OR, with the inputs at the corners (0,0), (0,1), (1,0) and (1,1)]
[Figure: multilayer neural network — input layer, first hidden layer, second hidden layer and output layer; input signals flow forward and output signals emerge from the output layer]
What does the middle layer hide?
[Figure: three-layer back-propagation network — inputs x1 … xn feed hidden neurons j through weights wij; hidden neurons feed output neurons k through weights wjk, producing outputs y1 … yl; error signals propagate backwards through the network]
The back-propagation training algorithm
Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

( −2.4 / Fi , +2.4 / Fi )

where Fi is the total number of inputs of neuron i in the network.
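As a sketch (using Python's standard `random` module; the function name is mine, not from the slides), the initialisation range can be implemented as:

```python
import random

def init_weights(fan_in, n_weights, seed=None):
    # Uniform initialisation in (-2.4/Fi, +2.4/Fi), where Fi is the
    # number of inputs (fan-in) of the neuron being initialised
    rng = random.Random(seed)
    r = 2.4 / fan_in
    return [rng.uniform(-r, r) for _ in range(n_weights)]

ws = init_weights(fan_in=3, n_weights=4, seed=42)
# every weight lies inside (-0.8, +0.8)
```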
The weights are then corrected with the delta rule: Δwij(p) = α·xi(p)·δj(p), where δj(p) is the error gradient at neuron j.
[Figure: three-layer back-propagation network for the worked example — inputs x1 and x2 (neurons 1 and 2) feed hidden neurons 3 and 4 through weights w13, w23, w14 and w24; the hidden neurons feed output neuron 5 through w35 and w45, producing y5; each hidden and output neuron also has a threshold input fixed at −1]
The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to −1.
y3 = sigmoid(x1·w13 + x2·w23 − θ3) = 1 / [1 + e^−(1·0.5 + 1·0.4 − 1·0.8)] = 0.5250

θ5 = θ5 + Δθ5 = 0.3 + 0.0127 = 0.3127
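The two results above can be checked numerically. The remaining initial values below (w14 = 0.9, w24 = 1.0, w35 = −1.2, w45 = 1.1, θ4 = −0.1, α = 0.1, desired output 0 for inputs (1, 1)) do not appear in this excerpt and are assumed from the standard version of this worked XOR example:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2, yd = 1, 1, 0                      # training pattern
w13, w23, theta3 = 0.5, 0.4, 0.8          # values from the excerpt
w14, w24, theta4 = 0.9, 1.0, -0.1         # assumed initial values
w35, w45, theta5 = -1.2, 1.1, 0.3         # assumed initial values
alpha = 0.1

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - theta3)    # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - theta4)
y5 = sigmoid(y3 * w35 + y4 * w45 - theta5)

# Error gradient at the output neuron
e = yd - y5
delta5 = y5 * (1.0 - y5) * e                  # -0.1274

# The threshold is a weight on a fixed input of -1
theta5_new = theta5 + alpha * (-1) * delta5   # 0.3127
```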
[Figure: learning curve — sum-squared error (log scale, 10^0 down to 10^−4) against epoch, over roughly 200 epochs]
Final results of three-layer network learning
[Figure: the trained network with its final weights and thresholds (connection weights of +1.0; thresholds +1.5 and +0.5)]
Decision boundaries
[Figure: (a) decision boundary constructed by hidden neuron 3: x1 + x2 − 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 − 0.5 = 0; (c) the combined boundaries of the complete network]
Y_tanh = 2a / (1 + e^−bX) − a
where a and b are constants.
Suitable values for a and b are:
a = 1.716 and b = 0.667
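A quick sketch of this activation (the function name is mine):

```python
import math

def tanh_activation(x, a=1.716, b=0.667):
    # Y = 2a / (1 + e^(-b*X)) - a : zero at X = 0, saturating at +/- a
    return 2.0 * a / (1.0 + math.exp(-b * x)) - a
```

With these constants the output ranges over (−1.716, +1.716) rather than (0, 1), which keeps the slope around the origin steeper than the ordinary sigmoid's.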
We can also accelerate training by including a momentum term in the delta rule:

Δwjk(p) = β·Δwjk(p−1) + α·yj(p)·δk(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant, typically set to 0.95.
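As a one-line sketch (the default β = 0.95 is a typical choice, not stated in this excerpt):

```python
def momentum_update(dw_prev, y_j, delta_k, alpha=0.1, beta=0.95):
    # Generalised delta rule: dw_jk(p) = beta*dw_jk(p-1) + alpha*y_j(p)*delta_k(p)
    return beta * dw_prev + alpha * y_j * delta_k
```

When successive corrections point the same way, the β term lets them accumulate, so the weights move faster along a consistent error gradient.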
[Figure: learning with momentum — plot against epoch (0 to about 140); vertical axis from −1 to +1]
Learning with adaptive learning rate
To accelerate the convergence and yet avoid the
danger of instability, we can apply two heuristics:
Heuristic 1
If the change of the sum of squared errors has the same
algebraic sign for several consecutive epochs, then the
learning rate parameter, α, should be increased.
Heuristic 2
If the algebraic sign of the change of the sum of squared
errors alternates for several consecutive epochs, then the
learning rate parameter, α, should be decreased.
Adapting the learning rate requires some changes in the
back-propagation algorithm.
If the sum of squared errors at the current epoch
exceeds the previous value by more than a predefined
ratio (typically 1.04), the learning rate parameter is
decreased (typically by multiplying by 0.7) and new
weights and thresholds are calculated.
If the error is less than the previous one, the learning
rate is increased (typically by multiplying by 1.05).
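The rule just described can be sketched as a small helper (names and structure are mine):

```python
def adapt_learning_rate(lr, sse, sse_prev, ratio=1.04, down=0.7, up=1.05):
    # Decrease the rate when the sum-squared error grew by more than the
    # predefined ratio; increase it when the error fell; otherwise keep it.
    if sse > ratio * sse_prev:
        return lr * down
    if sse < sse_prev:
        return lr * up
    return lr
```

When the rate is decreased, new weights and thresholds are then calculated with the reduced rate, as described above.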
Learning with adaptive learning rate
[Figure: training for 103 epochs — sum-squared error (log scale, 10^−4 to 10^2) and learning rate, each plotted against epoch]
Learning with momentum and adaptive learning rate
[Figure: training for 85 epochs — sum-squared error (log scale, 10^−4 to 10^2) and learning rate, each plotted against epoch]