Gradient Descent Formula for Logistic Regression

The gradient descent update rule for logistic regression is:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta\left(x^{(i)}\right) - y^{(i)}\right) x_j^{(i)}$$

where:
$$h_\theta\left(x^{(i)}\right) = g\left(\theta^T x^{(i)}\right) = \frac{1}{1 + e^{-\theta^T x^{(i)}}}$$
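A minimal NumPy sketch of the sigmoid and hypothesis functions above (the function names `sigmoid` and `hypothesis` are my own choice for illustration):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for a single example x (with x_0 = 1)."""
    return sigmoid(theta @ x)

# With theta = 0 the hypothesis is exactly 0.5 for any input.
print(hypothesis(np.zeros(3), np.array([1.0, 2.0, 3.0])))  # 0.5
```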

The vectorized form of the update is:

$$\theta := \theta - \frac{\alpha}{m} X^T\left(g(X\theta) - \vec{y}\right)$$

where:

$$\vec{y} = \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{pmatrix} \qquad \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix} \qquad X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}_{m \times (n+1)}$$

$$X\theta = \begin{bmatrix} \theta_0 x_0^{(1)} + \theta_1 x_1^{(1)} + \theta_2 x_2^{(1)} + \cdots + \theta_n x_n^{(1)} \\ \theta_0 x_0^{(2)} + \theta_1 x_1^{(2)} + \theta_2 x_2^{(2)} + \cdots + \theta_n x_n^{(2)} \\ \vdots \\ \theta_0 x_0^{(m)} + \theta_1 x_1^{(m)} + \theta_2 x_2^{(m)} + \cdots + \theta_n x_n^{(m)} \end{bmatrix} \qquad g(X\theta) = \begin{bmatrix} h_\theta\left(x^{(1)}\right) \\ h_\theta\left(x^{(2)}\right) \\ \vdots \\ h_\theta\left(x^{(m)}\right) \end{bmatrix}$$
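To make the shapes concrete, here is a small sketch (the data values are made up for illustration) that builds the $m \times (n+1)$ design matrix with the bias column $x_0 = 1$ and evaluates $g(X\theta)$ for all $m$ examples at once:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: m = 4 examples, n = 2 raw features.
raw = np.array([[2.0, 3.0],
                [1.0, 5.0],
                [4.0, 1.0],
                [0.5, 2.5]])
m, n = raw.shape

# Prepend the bias column x_0 = 1, giving an m x (n+1) matrix.
X = np.hstack([np.ones((m, 1)), raw])

theta = np.array([0.1, -0.2, 0.3])   # (n+1,) parameter vector

z = X @ theta          # X theta: one linear combination per example (row)
h = sigmoid(z)         # g(X theta): the column of h_theta(x^(i)) values
print(X.shape, z.shape, h.shape)     # (4, 3) (4,) (4,)
```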

Detailed Vectorization Derivation

$$\begin{aligned} &\sum_{i=1}^{m}\left(h_\theta\left(x^{(i)}\right) - y^{(i)}\right) x_j^{(i)} \\ =\; & \left[h_\theta\left(x^{(1)}\right) - y^{(1)}\right] x_j^{(1)} + \left[h_\theta\left(x^{(2)}\right) - y^{(2)}\right] x_j^{(2)} + \cdots + \left[h_\theta\left(x^{(m)}\right) - y^{(m)}\right] x_j^{(m)} \\ =\; & \left(x_j^{(1)}, x_j^{(2)}, \cdots, x_j^{(m)}\right) \cdot \begin{pmatrix} h_\theta\left(x^{(1)}\right) - y^{(1)} \\ h_\theta\left(x^{(2)}\right) - y^{(2)} \\ \vdots \\ h_\theta\left(x^{(m)}\right) - y^{(m)} \end{pmatrix} \\ =\; & \left(x_j^{(1)}, x_j^{(2)}, \cdots, x_j^{(m)}\right) \cdot \left[\begin{pmatrix} h_\theta\left(x^{(1)}\right) \\ h_\theta\left(x^{(2)}\right) \\ \vdots \\ h_\theta\left(x^{(m)}\right) \end{pmatrix} - \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{pmatrix}\right] \\ =\; & x_j \cdot \left[g(X\theta) - \vec{y}\right] \end{aligned}$$
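The derivation above can be checked numerically: for each $j$, the explicit sum over the $m$ examples equals the dot product of the $j$-th column of $X$ with $g(X\theta) - \vec{y}$. A small sketch with randomly generated data (all names and values are my own illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n = 5, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # m x (n+1)
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n + 1)

h = sigmoid(X @ theta)  # h_theta(x^(i)) for all i at once

for j in range(n + 1):
    # Left side: explicit sum over the m examples.
    explicit = sum((h[i] - y[i]) * X[i, j] for i in range(m))
    # Right side: x_j . [g(X theta) - y], with x_j the j-th column of X.
    vectorized = X[:, j] @ (h - y)
    assert np.isclose(explicit, vectorized)

# Stacking all j at once gives the X^T (g(X theta) - y) form.
assert np.allclose(X.T @ (h - y),
                   [X[:, j] @ (h - y) for j in range(n + 1)])
```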

Therefore:
$$\theta_j := \theta_j - \frac{\alpha}{m}\, x_j \left[g(X\theta) - \vec{y}\right]$$

where $x_j = \left(x_j^{(1)}, x_j^{(2)}, \cdots, x_j^{(m)}\right)$ denotes the $j$-th column of $X$ written as a row vector.

$$\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} := \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} - \frac{\alpha}{m} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \left[g(X\theta) - \vec{y}\right]$$

which finally gives:

$$\theta := \theta - \frac{\alpha}{m} X^T\left(g(X\theta) - \vec{y}\right)$$
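Putting it all together, here is a minimal batch gradient descent sketch for logistic regression implementing exactly this vectorized update (the learning rate, iteration count, and toy data are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy separable data: label is 1 when the raw feature exceeds 2.
raw = np.array([0.0, 1.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(raw), raw])   # add bias column x_0 = 1
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = gradient_descent(X, y, alpha=0.5, iters=5000)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)   # predicted labels for the four examples
```

On this separable toy set the fitted decision boundary falls between the two classes, so the predictions recover the labels.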