The gradient descent update rule for logistic regression is:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
where:

$$h_\theta(x^{(i)}) = g\left(\theta^T x^{(i)}\right) = \frac{1}{1 + e^{-\theta^T x^{(i)}}}$$
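As a quick illustration, the sigmoid and the hypothesis above can be sketched in NumPy (the function names `sigmoid` and `h` are my own, not from the original post):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); works elementwise on arrays."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x) for a single example x."""
    return sigmoid(theta @ x)
```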
The vectorized form is:

$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - \vec{y} \right)$$
where:

$$\vec{y}=\begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{pmatrix} \qquad \theta=\begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix} \qquad X=\begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}_{m \times (n+1)}$$
$$X\theta=\begin{bmatrix} \theta_0 x_0^{(1)}+\theta_1 x_1^{(1)}+\theta_2 x_2^{(1)}+\cdots+\theta_n x_n^{(1)} \\ \theta_0 x_0^{(2)}+\theta_1 x_1^{(2)}+\theta_2 x_2^{(2)}+\cdots+\theta_n x_n^{(2)} \\ \vdots \\ \theta_0 x_0^{(m)}+\theta_1 x_1^{(m)}+\theta_2 x_2^{(m)}+\cdots+\theta_n x_n^{(m)} \end{bmatrix} \qquad g(X\theta)=\begin{bmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(m)}) \end{bmatrix}$$
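The two matrix expressions above can be computed in one shot: `X @ theta` produces one linear combination per example, and applying the sigmoid elementwise yields all predictions at once. A minimal sketch, assuming an `X` with an `x_0 = 1` bias column and random data for demonstration:

```python
import numpy as np

np.random.seed(0)
m, n = 5, 3
# Design matrix X of shape (m, n+1): a bias column x_0 = 1 plus n random features.
X = np.hstack([np.ones((m, 1)), np.random.randn(m, n)])
theta = np.random.randn(n + 1)

# X theta: the whole column vector of theta^T x^(i) for i = 1..m, shape (m,).
z = X @ theta
# g(X theta): h_theta(x^(i)) for every example, applied elementwise.
preds = 1.0 / (1.0 + np.exp(-z))
```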
Detailed vectorization derivation:

$$\begin{aligned} &\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right) x_j^{(i)} \\ =\;& \left[h_\theta(x^{(1)})-y^{(1)}\right] x_j^{(1)} + \left[h_\theta(x^{(2)})-y^{(2)}\right] x_j^{(2)} + \cdots + \left[h_\theta(x^{(m)})-y^{(m)}\right] x_j^{(m)} \\ =\;& \left(x_j^{(1)}, x_j^{(2)}, \cdots, x_j^{(m)}\right) \cdot \begin{pmatrix} h_\theta(x^{(1)})-y^{(1)} \\ h_\theta(x^{(2)})-y^{(2)} \\ \vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{pmatrix} \\ =\;& \left(x_j^{(1)}, x_j^{(2)}, \cdots, x_j^{(m)}\right) \cdot \left[\begin{pmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(m)}) \end{pmatrix} - \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{pmatrix}\right] \\ =\;& x_j \cdot \left[g(X\theta)-\vec{y}\right] \end{aligned}$$

where $x_j = \left(x_j^{(1)}, x_j^{(2)}, \cdots, x_j^{(m)}\right)$ collects the $j$-th feature across all $m$ examples, i.e. the $j$-th column of $X$ read as a row vector.
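The chain of equalities above is easy to check numerically: the explicit loop over examples and the dot product $x_j \cdot [g(X\theta) - \vec{y}]$ give the same number. A small sketch with synthetic data (all names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
m, n = 6, 2
X = np.hstack([np.ones((m, 1)), np.random.randn(m, n)])  # bias column x_0 = 1
theta = np.random.randn(n + 1)
y = np.random.randint(0, 2, size=m).astype(float)

j = 1
# Loop form: sum over i of (h_theta(x^(i)) - y^(i)) * x_j^(i).
loop_sum = sum((sigmoid(X[i] @ theta) - y[i]) * X[i, j] for i in range(m))
# Vectorized form: x_j . (g(X theta) - y), with x_j = column j of X.
vec_sum = X[:, j] @ (sigmoid(X @ theta) - y)
```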
Thus:

$$\theta_j := \theta_j - \frac{\alpha}{m}\, x_j \left[ g(X\theta) - \vec{y} \right]$$

Stacking this update for all components $j = 0, 1, \ldots, n$:

$$\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} := \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} - \frac{\alpha}{m} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \left[ g(X\theta) - \vec{y} \right]$$
Finally:

$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - \vec{y} \right)$$
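Putting everything together, the final vectorized update maps directly onto one line of NumPy. A minimal sketch of the full training loop, tried on a hypothetical 1-D separable dataset (the function name and data are my own, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent: theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy separable data: bias column plus one feature; label is 1 when the feature > 0.
X = np.hstack([np.ones((6, 1)),
               np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
theta = gradient_descent(X, y, alpha=0.5, iters=2000)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
```

Note that the entire update is a single matrix-vector product per iteration, with no loop over either examples or parameters.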
Original article by Maggie-Hunter. If reposting, please credit the source: https://blog.ytso.com/tech/pnotes/145153.html