Machine learning notes. Mathematical analysis. Gradient descent

Review of mathematical analysis

Continuity of a function and the derivative

Let $E \subseteq \mathbb{R}$, let $a$ be a limit point of the set $E$ (i.e. $a \in E$ and $\forall \varepsilon > 0 \;\; |(a - \varepsilon, a + \varepsilon) \cap E| = \infty$), and let $f \colon E \to \mathbb{R}$.

Definition 1 (Cauchy limit of a function):

A function $f \colon E \to \mathbb{R}$ tends to $A$ as $x$ tends to $a$ if

$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in E \;\; (0 < |x - a| < \delta \Rightarrow |f(x) - A| < \varepsilon).$

Notation:

$\lim\limits_{E \ni x \to a} f(x) = A$.
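As a rough illustration of Definition 1, the sketch below samples points in a punctured $\delta$-neighborhood and checks the inequality $|f(x) - A| < \varepsilon$. The function $f(x) = x^2$, the point $a = 2$ and the helper name `check_epsilon_delta` are assumptions chosen for the example; sampling finitely many points is a sanity check, not a proof.

```python
def check_epsilon_delta(f, a, A, epsilon, delta, samples=1000):
    """Return True if |f(x) - A| < epsilon at every sampled x with 0 < |x - a| < delta."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (a - offset, a + offset):
            if not abs(f(x) - A) < epsilon:
                return False
    return True


def square(x):
    return x * x


# For f(x) = x^2 near a = 2 we have |x^2 - 4| = |x - 2| * |x + 2| < 5 * delta
# whenever delta <= 1, so delta = min(1, epsilon / 5) works for every epsilon > 0.
for epsilon in (1.0, 0.1, 0.001):
    delta = min(1.0, epsilon / 5)
    print(epsilon, delta, check_epsilon_delta(square, 2.0, 4.0, epsilon, delta))
```

Choosing $\delta$ too large for a given $\varepsilon$ (for instance $\delta = 1$ with $\varepsilon = 0.1$) makes the check fail, which matches the order of quantifiers in the definition: $\delta$ is allowed to depend on $\varepsilon$.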

Definition 2:

An interval $]a, b[$ is the set $]a, b[ \; := \{x \in \mathbb{R} \mid a < x < b\}$;
An interval containing a point $x \in \mathbb{R}$ is called a neighborhood of that point.
A punctured neighborhood of a point is a neighborhood of the point from which the point itself is removed.

Notation:

$V(x)$ or $U(x)$ - a neighborhood of the point $x$;
$\overset{\circ}{U}(x)$ - a punctured neighborhood of the point $x$;
$U_E(x) := E \cap U(x), \quad \overset{\circ}{U}_E(x) := E \cap \overset{\circ}{U}(x)$.

Definition 3 (limit of a function in terms of neighborhoods):

$\lim\limits_{E \ni x \to a} f(x) = A := \forall V_R(A) \;\; \exists \overset{\circ}{U}_E(a) \;\; (f(\overset{\circ}{U}_E(a)) \subset V_R(A)).$

Definitions 1 and 3 are equivalent.

Definition 4 (continuity of a function at a point):

$f \colon E \to \mathbb{R}$ is continuous at $a \in E$ $:=$
$\forall V(f(a)) \;\; \exists U_E(a) \;\; (f(U_E(a)) \subset V(f(a)));$
$f \colon E \to \mathbb{R}$ is continuous at $a \in E$ $:=$
$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in E \;\; (|x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon).$

Definitions 3 and 4 show that

($f \colon E \to \mathbb{R}$ is continuous at $a \in E$, where $a$ is a limit point of $E$) $\Leftrightarrow$ $(\lim\limits_{E \ni x \to a} f(x) = f(a)).$

Definition 5:

A function $f \colon E \to \mathbb{R}$ is called continuous on the set $E$ if it is continuous at every point of $E$.

Definition 6:

A function $f \colon E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}$ is called differentiable at a point $a \in E$ that is a limit point of $E$ if there exists a function $A \cdot (x - a)$, linear in the increment $x - a$ of the argument [the differential of $f$ at the point $a$], such that the increment $f(x) - f(a)$ of the function $f$ can be represented as
$f(x) - f(a) = A \cdot (x - a) + o(x - a) \quad \text{as} \;\; x \to a, \;\; x \in E.$
The value
$f'(a) = \lim\limits_{E \ni x \to a} \frac{f(x) - f(a)}{x - a}$

is called the derivative of the function $f$ at the point $a$.

Equivalently,

$f'(x) = \lim_{\substack{h \to 0 \\ x + h, \, x \in E}} \frac{f(x + h) - f(x)}{h}.$
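The limit in Definition 6 can be observed numerically by evaluating the difference quotient for shrinking $h$. This is a minimal sketch; the test function $\sin$, the point $x = 1$ and the helper name `difference_quotient` are assumptions made for illustration.

```python
import math


def difference_quotient(f, x, h):
    """The quotient (f(x + h) - f(x)) / h whose limit as h -> 0 is f'(x)."""
    return (f(x + h) - f(x)) / h


# The derivative of sin at x = 1 is cos(1); the error shrinks roughly like h.
for h in (1e-1, 1e-3, 1e-5):
    approx = difference_quotient(math.sin, 1.0, h)
    print(h, approx, abs(approx - math.cos(1.0)))
```

For very small $h$ floating-point rounding starts to dominate, so in practice $h$ cannot be pushed arbitrarily close to zero.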

Definition 7:

A point $x_0 \in E \subset \mathbb{R}$ is called a point of local maximum (minimum), and the value of the function at that point a local maximum (minimum), of the function $f \colon E \to \mathbb{R}$ if $\exists U_E(x_0)$:
$\forall x \in U_E(x_0) \;\; f(x) \leq f(x_0)$ (respectively, $f(x) \geq f(x_0)$).
Points of local maximum and minimum are called points of local extremum, and the values of the function at them are called local extrema of the function.
An extremum point $x_0 \in E$ of the function $f \colon E \to \mathbb{R}$ is called an interior extremum point if $x_0$ is a limit point both of the set $E_- = \{x \in E \mid x < x_0\}$ and of the set $E_+ = \{x \in E \mid x > x_0\}$.

Lemma 1 (Fermat):

If a function $f \colon E \to \mathbb{R}$ is differentiable at an interior extremum point $x_0 \in E$, then its derivative at that point is zero: $f'(x_0) = 0$.

Proposition 1 (Rolle's theorem):

If a function $f \colon [a, b] \to \mathbb{R}$ is continuous on the closed interval $[a, b]$, differentiable on the open interval $]a, b[$, and $f(a) = f(b)$, then there exists a point $\xi \in \, ]a, b[$ such that $f'(\xi) = 0$.

Theorem 1 (Lagrange's finite-increment theorem):

If a function $f \colon [a, b] \to \mathbb{R}$ is continuous on the closed interval $[a, b]$ and differentiable on the open interval $]a, b[$, then there exists a point $\xi \in \, ]a, b[$ such that

$f(b) - f(a) = f'(\xi)(b - a).$
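Theorem 1 only asserts that such a $\xi$ exists, but for a concrete function it can be located numerically. The sketch below uses bisection on $f'(x) - \frac{f(b) - f(a)}{b - a}$; the example $f(x) = x^3$ on $[0, 2]$ and the helper name `mean_value_point` are assumptions, and the method presumes that this difference changes sign on $[a, b]$.

```python
def mean_value_point(df, f, a, b, tol=1e-12):
    """Find xi in ]a, b[ with df(xi) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes df(x) - slope changes sign between a and b.
    """
    slope = (f(b) - f(a)) / (b - a)

    def g(x):
        return df(x) - slope

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change of f' - slope on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid  # the sign change lies in [lo, mid]
        else:
            lo = mid  # the sign change lies in [mid, hi]
    return (lo + hi) / 2


# f(x) = x^3 on [0, 2]: the slope is 4, so 3 * xi^2 = 4 and xi = sqrt(4 / 3).
xi = mean_value_point(lambda x: 3 * x**2, lambda x: x**3, 0.0, 2.0)
print(xi, (4 / 3) ** 0.5)
```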

Corollary 1 (criterion for monotonicity of a function):
If the derivative of a function is nonnegative (positive) at every point of an interval, then the function is nondecreasing (increasing) on that interval.

Corollary 2 (criterion for a function to be constant):
A function that is continuous on a closed interval $[a, b]$ is constant on it if and only if its derivative is zero at every point of the interval $[a, b]$ (or at least of the open interval $]a, b[$).

Partial derivative of a function of several variables

By $\mathbb{R}^m$ we denote the set:

$\mathbb{R}^m = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_m = \{(\omega_1, \omega_2, \ldots, \omega_m), \;\; \omega_i \in \mathbb{R} \;\; \forall i \in \overline{1, m}\}.$

Definition 8:

A function $f \colon E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}^m$ is called differentiable at a point $x \in E$ that is a limit point of $E$ if

$f(x + h) - f(x) = L(x)h + \alpha(x; h), \qquad (1)$

where $L(x) \colon \mathbb{R}^m \to \mathbb{R}$ is a function linear in $h$ [the differential of $f$ at the point $x$ (notation: $df(x)$ or $f'(x)$)], and $\alpha(x; h) = o(h)$ as $h \to 0$, $x + h \in E$.

Relation (1) can be rewritten as follows:

$f(x + h) - f(x) = f'(x)h + \alpha(x; h)$

or

$\Delta f(x; h) = df(x)h + \alpha(x; h).$

If we pass to coordinate notation for the point $x = (x^1, \ldots, x^m)$, the vector $h = (h^1, \ldots, h^m)$ and the linear function $L(x)h = a_1(x)h^1 + \ldots + a_m(x)h^m$, then equality (1) takes the form

$f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = a_1(x)h^1 + \ldots + a_m(x)h^m + o(h) \quad \text{as} \;\; h \to 0, \qquad (2)$

where $a_1(x), \ldots, a_m(x)$ are real numbers associated with the point $x$. These are the numbers we need to find.

Denote

$h_i = h^i e_i = 0 \cdot e_1 + \ldots + 0 \cdot e_{i-1} + h^i \cdot e_i + 0 \cdot e_{i+1} + \ldots + 0 \cdot e_m,$

where $\{e_1, \ldots, e_m\}$ is a basis of $\mathbb{R}^m$.

Setting $h = h_i$ in (2), we obtain

$f(x^1, \ldots, x^{i-1}, x^i + h^i, x^{i+1}, \ldots, x^m) - f(x^1, \ldots, x^i, \ldots, x^m) = a_i(x)h^i + o(h^i) \quad \text{as} \;\; h^i \to 0. \qquad (3)$

From (3) we obtain

$a_i(x) = \lim_{h^i \to 0} \frac{f(x^1, \ldots, x^{i-1}, x^i + h^i, x^{i+1}, \ldots, x^m) - f(x^1, \ldots, x^i, \ldots, x^m)}{h^i}. \qquad (4)$

Definition 9:
The limit (4) is called the partial derivative of the function $f(x)$ at the point $x = (x^1, \ldots, x^m)$ with respect to the variable $x^i$. It is denoted by:

$\frac{\partial f}{\partial x^i}(x), \quad \partial_i f(x), \quad f'_{x^i}(x).$

Example 1:

$f(u, v) = u^3 + v^2 \sin u, \\ \partial_1 f(u, v) = \frac{\partial f}{\partial u}(u, v) = 3u^2 + v^2 \cos u, \\ \partial_2 f(u, v) = \frac{\partial f}{\partial v}(u, v) = 2v \sin u.$
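The formulas of Example 1 can be cross-checked against a finite-difference estimate of the partial derivatives. The sketch below uses a symmetric (central) difference rather than the one-sided quotient in (4), because it is usually more accurate; the helper name `partial` and the evaluation point $(1.2, -0.7)$ are arbitrary choices for illustration.

```python
import math


def f(u, v):
    return u**3 + v**2 * math.sin(u)


def partial(func, point, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative (0-based index)."""
    forward = list(point)
    backward = list(point)
    forward[i] += h   # step only along coordinate i
    backward[i] -= h
    return (func(*forward) - func(*backward)) / (2 * h)


u, v = 1.2, -0.7
# Each line prints the numerical estimate next to the closed-form value.
print(partial(f, (u, v), 0), 3 * u**2 + v**2 * math.cos(u))
print(partial(f, (u, v), 1), 2 * v * math.sin(u))
```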

Gradient descent

Let $f \colon \mathbb{R}^n \to \mathbb{R}$, where

$\mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_n = \{(\theta_1, \theta_2, \ldots, \theta_n), \;\; \theta_i \in \mathbb{R} \;\; \forall i \in \overline{1, n}\}$.

Definition 10:

The gradient of a function $f \colon \mathbb{R}^n \to \mathbb{R}$ is the vector whose $i$-th component is equal to $\frac{\partial f}{\partial \theta_i}$:

$\nabla_{\theta} f = \left( \begin{array}{c} \frac{\partial f}{\partial \theta_1} \\ \frac{\partial f}{\partial \theta_2} \\ \vdots \\ \frac{\partial f}{\partial \theta_n} \end{array} \right), \quad \theta = (\theta_1, \theta_2, \ldots, \theta_n).$

The gradient points in the direction in which the function increases fastest. Consequently, the direction in which the function decreases fastest is the one opposite to the gradient, i.e. $-\nabla_{\theta} f$.

The goal of the gradient descent method is to find an extremum (minimum) point of a function.
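The statement about the direction of fastest increase can be checked numerically for a simple function. The sketch below scans unit directions in the plane and compares the direction with the largest rate of change to the direction of the gradient. The objective $\theta_1^2 + 3\theta_2^2$, the point $(1, 1)$ and all helper names are assumptions introduced for this example.

```python
import math


def f(theta):
    # Hypothetical test function, not taken from the text.
    return theta[0] ** 2 + 3 * theta[1] ** 2


def grad_f(theta):
    return (2 * theta[0], 6 * theta[1])


def rate_of_change(theta, direction, h=1e-6):
    """One-sided difference quotient of f along a unit direction."""
    moved = (theta[0] + h * direction[0], theta[1] + h * direction[1])
    return (f(moved) - f(theta)) / h


def rate_at_angle(t):
    return rate_of_change(theta, (math.cos(t), math.sin(t)))


theta = (1.0, 1.0)
angles = [2 * math.pi * k / 3600 for k in range(3600)]  # 0.1 degree grid

best_angle = max(angles, key=rate_at_angle)   # fastest increase
worst_angle = min(angles, key=rate_at_angle)  # fastest decrease

g = grad_f(theta)
gradient_angle = math.atan2(g[1], g[0])
print(best_angle, gradient_angle)             # agree up to the grid spacing
print(worst_angle, gradient_angle + math.pi)  # opposite direction
```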

Denote by $\theta^{(t)}$ the vector of parameters of the function at step $t$. The parameter update vector at step $t$ is:

$u^{(t)} = -\eta \nabla_{\theta} f(\theta^{(t-1)}), \quad \theta^{(t)} = \theta^{(t-1)} + u^{(t)}.$
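The update rule translates almost literally into code. The following is a minimal sketch with a fixed number of steps and no stopping criterion; the function names and the example objective $\theta_1^2 + 3\theta_2^2$ are assumptions chosen for illustration, and the gradient is supplied by hand rather than computed automatically.

```python
def gradient_descent(grad, theta0, eta, steps):
    """Apply theta <- theta + u with u = -eta * grad(theta) for a fixed number of steps."""
    theta = list(theta0)
    for _ in range(steps):
        g = grad(theta)
        update = [-eta * g_i for g_i in g]                       # u^(t)
        theta = [t_i + u_i for t_i, u_i in zip(theta, update)]   # theta^(t)
    return theta


def grad_bowl(theta):
    # Gradient of the hypothetical objective theta_1^2 + 3 * theta_2^2.
    return [2 * theta[0], 6 * theta[1]]


result = gradient_descent(grad_bowl, [1.0, 1.0], eta=0.1, steps=100)
print(result)  # both coordinates end up very close to the minimum at (0, 0)
```

Passing the gradient as a function keeps the loop independent of the objective, so the same routine can be reused for any differentiable $f$ whose gradient is available.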

In the formula above, the parameter $\eta$ is the learning rate, which controls the size of the step we take in the direction opposite to the gradient. In particular, two opposite problems can arise:

if the steps are too small, training takes too long, and the chance of getting stuck in a small, poor local minimum along the way increases (the first panel of the figure below);
if the steps are too large, we can keep jumping back and forth over the desired minimum without ever reaching the lowest point (the third panel of the figure below).

Example:
Consider the gradient descent method in the simplest case ($n = 1$), that is, $f \colon \mathbb{R} \to \mathbb{R}$.
Let $f(x) = x^2, \quad \theta^{(0)} = 3, \quad \eta = 1$. Then:

$\frac{\partial f}{\partial x}(x) = 2x \quad \Rightarrow \quad \nabla_{\theta} f(x) = 2x; \\ \theta^{(1)} = \theta^{(0)} - 1 \cdot \nabla_{\theta} f(\theta^{(0)}) = 3 - 6 = -3; \\ \theta^{(2)} = \theta^{(1)} - 1 \cdot \nabla_{\theta} f(\theta^{(1)}) = -3 + 6 = 3 = \theta^{(0)}.$

In the case $\eta = 1$ the situation is like the third panel of the figure above: we keep jumping over the extremum point.
Let $\eta = 0.8$. Then:

$\theta^{(1)} = \theta^{(0)} - 0.8 \times \nabla_{\theta} f(\theta^{(0)}) = 3 - 0.8 \times 6 = 3 - 4.8 = -1.8; \\ \theta^{(2)} = \theta^{(1)} - 0.8 \times \nabla_{\theta} f(\theta^{(1)}) = -1.8 + 0.8 \times 3.6 = -1.8 + 2.88 = 1.08; \\ \theta^{(3)} = \theta^{(2)} - 0.8 \times \nabla_{\theta} f(\theta^{(2)}) = 1.08 - 0.8 \times 2.16 = 1.08 - 1.728 = -0.648; \\ \theta^{(4)} = \theta^{(3)} - 0.8 \times \nabla_{\theta} f(\theta^{(3)}) = -0.648 + 0.8 \times 1.296 = -0.648 + 1.0368 = 0.3888; \\ \theta^{(5)} = \theta^{(4)} - 0.8 \times \nabla_{\theta} f(\theta^{(4)}) = 0.3888 - 0.8 \times 0.7776 = 0.3888 - 0.62208 = -0.23328; \\ \theta^{(6)} = \theta^{(5)} - 0.8 \times \nabla_{\theta} f(\theta^{(5)}) = -0.23328 + 0.8 \times 0.46656 = -0.23328 + 0.373248 = 0.139968.$

We see that the iterates approach the extremum point: each step multiplies $\theta$ by $1 - 2\eta = -0.6$, so the sign alternates while the magnitude shrinks.
Let $\eta = 0.5$. Then:

$\theta^{(1)} = \theta^{(0)} - 0.5 \times \nabla_{\theta} f(\theta^{(0)}) = 3 - 0.5 \times 6 = 3 - 3 = 0; \\ \theta^{(2)} = \theta^{(1)} - 0.5 \times \nabla_{\theta} f(\theta^{(1)}) = 0 - 0.5 \times 0 = 0.$

The extremum point was found in 1 step.
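The three hand computations for $f(x) = x^2$ with $\theta^{(0)} = 3$ can be replayed with a few lines of code. This is a sketch specialised to that one function (its gradient $2\theta$ is hard-coded); the helper name `descend` is an assumption introduced here.

```python
def descend(theta0, eta, steps):
    """Iterate theta <- theta - eta * 2 * theta, the update for f(x) = x^2."""
    trace = [theta0]
    for _ in range(steps):
        theta = trace[-1]
        trace.append(theta - eta * 2 * theta)
    return trace


# eta = 1 oscillates between 3 and -3, eta = 0.8 shrinks towards 0 with
# alternating sign, and eta = 0.5 lands on the minimum after one step.
for eta in (1.0, 0.8, 0.5):
    print(eta, [round(t, 6) for t in descend(3.0, eta, 6)])
```

Since each update multiplies $\theta$ by $1 - 2\eta$, for this particular function the iteration converges exactly when $|1 - 2\eta| < 1$, that is, for $0 < \eta < 1$.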

References:

"Mathematical Analysis. Part 1", V. A. Zorich, Moscow, 1997;
"Deep Learning. Immersion in the World of Neural Networks", S. Nikolenko, A. Kadurin, E. Arkhangelskaya, Piter, 2018.
