Deep Neural Networks
Arles Rodríguez
[email protected]
Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Deep Neural Networks

[Figure: a network with inputs x1, x2, x3, four layers, and output ŷ = a^[L]]

• number of layers: L = 4
• activation in layer l:
  a^[l] = g^[l](z^[l]),  with a^[0] = x and a^[L] = ŷ
• weights in layer l: W^[l] and bias b^[l]
Forward propagation in deep network

For a single sample, layer l computes

  z^[l] = W^[l] a^[l-1] + b^[l]

Vectorizing…
In general, for a given layer l:

  z^[l] = W^[l] a^[l-1] + b^[l]        (one sample)
  Z^[l] = W^[l] A^[l-1] + b^[l]        (m samples at once)

where the samples are stacked as columns:

  Z^[l] = [ z^[l](1)  z^[l](2)  …  z^[l](m) ]
  A^[l] = [ a^[l](1)  a^[l](2)  …  a^[l](m) ]

Notation: a^[l](i) is the output of layer l for sample i. In Z^[l] and A^[l], each row corresponds to a unit in layer l and each column to a sample.
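The vectorized step above can be sketched in NumPy; the function names (`linear_forward`, `sigmoid`) are mine, not from the slides:

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """One vectorized layer step: Z^[l] = W^[l] A^[l-1] + b^[l].

    A_prev : (n_prev, m) activations of the previous layer, one column per sample
    W      : (n_l, n_prev) weight matrix of layer l
    b      : (n_l, 1) bias, broadcast across the m sample columns
    """
    Z = W @ A_prev + b
    return Z

def sigmoid(Z):
    """One possible choice for the activation g^[l]."""
    return 1.0 / (1.0 + np.exp(-Z))
```

Because the samples sit in columns, a single matrix product handles all m samples at once instead of looping over them.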
Forward propagation loop

Apply, for l = 1, …, L:

  Z^[l] = W^[l] A^[l-1] + b^[l]

Example: L = 5 with layer sizes
  n^[0] = 2, n^[1] = 3, n^[2] = 5, n^[3] = 4, n^[4] = 2, n^[5] = 1

What are the sizes of W^[l] and b^[l]?
W^[l] has size (n^[l], n^[l-1]) and b^[l] has size (n^[l], 1).
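A short sketch of parameter initialization that makes those sizes concrete; the helper name `init_parameters` and the small 0.01 scaling are my choices, not from the slides:

```python
import numpy as np

def init_parameters(layer_sizes, seed=0):
    """Create W^[l] of shape (n^[l], n^[l-1]) and b^[l] of shape (n^[l], 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_sizes)):
        # small random weights, zero biases
        params[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return params

# the slide's example: L = 5, layer sizes n^[0]..n^[5]
params = init_parameters([2, 3, 5, 4, 2, 1])
```

For instance, `params["W3"]` has shape (4, 5): layer 3 has n^[3] = 4 units, each connected to the n^[2] = 5 units below it.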
If l = 1:

  Z^[1] = [ z^[1](1)  z^[1](2)  …  z^[1](m) ]

  Z^[1] = W^[1] X + b^[1]

For multiple samples the sizes are:

  (n^[1], m) = (n^[1], n^[0]) · (n^[0], m) + (n^[1], 1)

b^[1], of size (n^[1], 1), is broadcast by element-wise sum across the m columns.
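A tiny NumPy demo of that broadcast (the concrete numbers here are illustrative, not from the slides):

```python
import numpy as np

# Shapes from the slide: (n1, m) = (n1, n0) @ (n0, m) + (n1, 1)
n0, n1, m = 2, 3, 5
W1 = np.ones((n1, n0))                    # (3, 2)
X  = np.ones((n0, m))                     # (2, 5)
b1 = np.array([[10.0], [20.0], [30.0]])   # (3, 1)

# b1 is broadcast (element-wise sum) across the m sample columns
Z1 = W1 @ X + b1
```

Each entry of `W1 @ X` is 2 here, so row i of `Z1` is 2 plus the i-th bias, repeated across all 5 columns.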
• Dimensions change with the layer l:
  z^[l], a^[l] : (n^[l], 1)
  Z^[l], A^[l] : (n^[l], m); for layer 0, A^[0] = X has size (n^[0], m)
  dZ^[l], dA^[l] : (n^[l], m)
Why deep representations?

[Figure: learned features go from simple in the first layers to complex in later layers]

• The first layers detect edges and borders.
• Intermediate layers detect parts of faces.
• Later layers compose parts of faces together into complete faces.
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of
hierarchical representations. Proceedings of the 26th International Conference On Machine Learning, ICML 2009, 609–616.
https://doi.org/10.1145/1553374.1553453
When is deep better than shallow?

• There are functions that a small L-layer deep neural network can compute, but that shallower networks require exponentially more hidden units to compute.

[Figure: computing x1 ⊕ x2 ⊕ … ⊕ x8 with a tree of pairwise gates of depth O(log n), versus a single hidden layer, which needs exponentially many units]
Backward propagation:

For layer l, the backward step takes da^[l] and the cached z^[l], and outputs da^[l-1], dW^[l], and db^[l].
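A minimal sketch of the linear part of that backward step, assuming the standard gradient formulas for Z^[l] = W^[l] A^[l-1] + b^[l]; the function name and argument layout are mine:

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Given dZ^[l], compute dW^[l], db^[l], and dA^[l-1] for an m-sample batch.

    dZ     : (n_l, m)      gradient of the loss w.r.t. Z^[l]
    A_prev : (n_prev, m)   activations cached during the forward pass
    W      : (n_l, n_prev) weights of layer l
    """
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m                    # (n_l, n_prev), averaged over samples
    db = dZ.sum(axis=1, keepdims=True) / m    # (n_l, 1)
    dA_prev = W.T @ dZ                        # (n_prev, m), passed to layer l-1
    return dA_prev, dW, db
```

Note that dW^[l] and db^[l] have exactly the same shapes as W^[l] and b^[l], which is what makes the gradient-descent update `W -= alpha * dW` well defined.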
Forward and backward functions

Forward pass (caching z^[l] at each layer):

  a^[0] = x → [W^[1], b^[1]] → a^[1] → [W^[2], b^[2]] → a^[2] → … → [W^[L], b^[L]] → a^[L] = ŷ → L(ŷ, y)

Backward pass, starting from

  da^[L] = - y/a + (1 - y)/(1 - a)

and using the cached z^[l] to produce dW^[l] and db^[l] at every layer:

  da^[L] → da^[L-1] → … → da^[1]
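An end-to-end sketch of the diagram above, assuming sigmoid activations g^[l] in every layer (the slides leave g^[l] generic) and a binary cross-entropy loss; the function and cache layout are my choices:

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, params, L):
    """Forward pass; caches (A_prev, Z, W) per layer for the backward pass."""
    A, caches = X, []
    for l in range(1, L + 1):
        W, b = params[f"W{l}"], params[f"b{l}"]
        Z = W @ A + b
        caches.append((A, Z, W))
        A = sigmoid(Z)
    return A, caches

def backward(AL, Y, caches):
    """Backward pass starting from dA^[L] = -Y/A^[L] + (1-Y)/(1-A^[L])."""
    m = Y.shape[1]
    grads = {}
    dA = -Y / AL + (1 - Y) / (1 - AL)
    for l in range(len(caches), 0, -1):
        A_prev, Z, W = caches[l - 1]
        A = sigmoid(Z)
        dZ = dA * A * (1 - A)                         # sigmoid'(Z) = a(1 - a)
        grads[f"dW{l}"] = dZ @ A_prev.T / m
        grads[f"db{l}"] = dZ.sum(axis=1, keepdims=True) / m
        dA = W.T @ dZ                                 # becomes dA^[l-1]
    return grads
```

The caches saved on the way forward are exactly what the backward loop consumes, which is why the diagram pairs every forward box with a cache.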
Parameters and hyperparameters

• Parameters are: W^[1], b^[1], W^[2], b^[2], …