<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Newton&#8217;s method &#8211; 编码无悔 /  Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/tag/newtons-method/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>最优化之路</description>
	<lastBuildDate>Mon, 27 Apr 2020 17:30:34 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>[Original] Newton&#039;s Method in Optimization, Revisited</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%86%8d%e8%b0%88-%e7%89%9b%e9%a1%bf%e6%b3%95newtons-method-in-optimization/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%86%8d%e8%b0%88-%e7%89%9b%e9%a1%bf%e6%b3%95newtons-method-in-optimization/#comments</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Sun, 06 Apr 2014 02:58:34 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[Newton's method]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[最优化]]></category>
		<category><![CDATA[牛顿法]]></category>
		<guid isPermaLink="false">http://www.codelast.com/?p=8052</guid>

					<description><![CDATA[<p>
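<a href="http://en.wikipedia.org/wiki/Newton's_method_in_optimization" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">Newton's method</span></a> is a classic algorithm in optimization. It uses second-derivative information about the objective function: at each iterate, the gradient and the second derivative are used to build a quadratic approximation of the objective, the minimizer of that quadratic becomes the next iterate, and the process repeats until the optimum is found.<br />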
<span id="more-8052"></span><br />
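<span style="background-color:#00ff00;">『1』</span>History<br />
So why is Newton's method called Newton's method? The question sounds almost too obvious to ask, but how many of us have actually looked it up?<br />
Wikipedia puts it this way: Newton's method is a method for approximately solving equations; it uses the first few terms of the Taylor series of a function f(x) to find a root of f(x)=0.<br />
It was first described by Isaac Newton in "Method of Fluxions" (completed in 1671 and published in 1736, after Newton's death).<br />
As I understand it, Newton's method originally had nothing to do with optimization (the field probably did not exist as a discipline back then). Once optimization research took off, people applied the idea behind Newton's method to it, and the name stuck.</p>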
<div>
	<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="http://www.codelast.com/" target="_blank" rel="noopener noreferrer"><span style="color: rgb(255, 255, 255);">http://www.codelast.com/</span></a></div>
<p><span style="background-color:#00ff00;">『2』</span>How it works<br />
Let us derive Newton's method.<br />
The first three terms of the Taylor expansion of the objective function <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_50bbd36e1fd2333108437a2ca378be62.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="f(x)" /></span><script type='math/tex'>f(x)</script> at the point <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_550187f469eda08b9e5b55143f19c4ce.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{x_k}" /></span><script type='math/tex'>{x_k}</script> are:<br />
 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_1e6187e61977f3367ae2cfab3166bef4.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{q_k}(x) = {q_k}({x_k} + x - {x_k}) = f({x_k}) + g_k^T(x - {x_k}) + \frac{1}{2}{(x - {x_k})^T}{G_k}(x - {x_k}) + o(x - {x_k})" /></span><script type='math/tex'>{q_k}(x) = {q_k}({x_k} + x - {x_k}) = f({x_k}) + g_k^T(x - {x_k}) + \frac{1}{2}{(x - {x_k})^T}{G_k}(x - {x_k}) + o(x - {x_k})</script> <br />
Here <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_1cd5597a080292208723039cfd7bfd41.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{g_k}" /></span><script type='math/tex'>{g_k}</script> is the first derivative (the gradient) and <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is the second derivative (the Hessian matrix). As usual, we drop the last term, the higher-order remainder (strictly o(||x - x_k||^2)), so that q_k is a quadratic function.<br />
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="http://www.codelast.com/" target="_blank" rel="noopener noreferrer"><span style="color: rgb(255, 255, 255);">http://www.codelast.com/</span></a>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%86%8d%e8%b0%88-%e7%89%9b%e9%a1%bf%e6%b3%95newtons-method-in-optimization/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>
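<a href="http://en.wikipedia.org/wiki/Newton's_method_in_optimization" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">Newton's method</span></a> is a classic algorithm in optimization. It uses second-derivative information about the objective function: at each iterate, the gradient and the second derivative are used to build a quadratic approximation of the objective, the minimizer of that quadratic becomes the next iterate, and the process repeats until the optimum is found.<br />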
<span id="more-8052"></span><br />
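<span style="background-color:#00ff00;">『1』</span>History<br />
So why is Newton's method called Newton's method? The question sounds almost too obvious to ask, but how many of us have actually looked it up?<br />
Wikipedia puts it this way: Newton's method is a method for approximately solving equations; it uses the first few terms of the Taylor series of a function f(x) to find a root of f(x)=0.<br />
It was first described by Isaac Newton in "Method of Fluxions" (completed in 1671 and published in 1736, after Newton's death).<br />
As I understand it, Newton's method originally had nothing to do with optimization (the field probably did not exist as a discipline back then). Once optimization research took off, people applied the idea behind Newton's method to it, and the name stuck.</p>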
<div>
	<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="http://www.codelast.com/" target="_blank" rel="noopener noreferrer"><span style="color: rgb(255, 255, 255);">http://www.codelast.com/</span></a></div>
<p><span style="background-color:#00ff00;">『2』</span>How it works<br />
Let us derive Newton's method.<br />
The first three terms of the Taylor expansion of the objective function <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_50bbd36e1fd2333108437a2ca378be62.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="f(x)" /></span><script type='math/tex'>f(x)</script> at the point <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_550187f469eda08b9e5b55143f19c4ce.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{x_k}" /></span><script type='math/tex'>{x_k}</script> are:<br />
 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_1e6187e61977f3367ae2cfab3166bef4.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{q_k}(x) = {q_k}({x_k} + x - {x_k}) = f({x_k}) + g_k^T(x - {x_k}) + \frac{1}{2}{(x - {x_k})^T}{G_k}(x - {x_k}) + o(x - {x_k})" /></span><script type='math/tex'>{q_k}(x) = {q_k}({x_k} + x - {x_k}) = f({x_k}) + g_k^T(x - {x_k}) + \frac{1}{2}{(x - {x_k})^T}{G_k}(x - {x_k}) + o(x - {x_k})</script> <br />
Here <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_1cd5597a080292208723039cfd7bfd41.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{g_k}" /></span><script type='math/tex'>{g_k}</script> is the first derivative (the gradient) and <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is the second derivative (the Hessian matrix). As usual, we drop the last term, the higher-order remainder (strictly o(||x - x_k||^2)), so that q_k is a quadratic function.<br />
 The first-order necessary condition for <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_9dd4e461268c8034f5c8564e155c67a6.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="x" /></span><script type='math/tex'>x</script> to be a minimizer of this quadratic model is:<br />
 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_d05c810b120192221a64b9c5b09c6137.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\nabla {q_k}(x) = 0 = {g_k} + {G_k}(x - {x_k})" /></span><script type='math/tex'>\nabla {q_k}(x) = 0 = {g_k} + {G_k}(x - {x_k})</script> <br />
This gives the iteration formula: <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8270e2e3b7d900af415458d0bfd0cefa.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{x_{k + 1}} = {x_k} - {G_k}^{ - 1}{g_k}" /></span><script type='math/tex'>{x_{k + 1}} = {x_k} - {G_k}^{ - 1}{g_k}</script> <br />
In line-search methods for optimization, the next point is obtained by moving from the current point along some direction d. In Newton's method this direction is naturally called the "<span style="color:#0000ff;">Newton direction</span>", and from the formula above it equals: <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_70df409d5a96a374522d67b7ec6cb9d9.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k} = - {G_k}^{ - 1}{g_k}" /></span><script type='math/tex'>{d_k} = - {G_k}^{ - 1}{g_k}</script> </p>
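<p>The iteration formula can be tried out with a minimal Python sketch, assuming NumPy is available. The function name newton_minimize and the test objective f(x, y) = exp(x) - x + y^2 (whose minimizer is the origin) are illustrative choices of mine, not taken from a textbook:</p>

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration x_{k+1} = x_k - G_k^{-1} g_k, with no line search."""
    x = np.asarray(x0, dtype=float)
    iters = 0
    for iters in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # gradient (almost) zero: stop
            break
        d = np.linalg.solve(hess(x), -g)  # Newton direction d_k
        x = x + d
    return x, iters

# Illustrative objective (an assumption): f(x, y) = exp(x) - x + y^2
grad = lambda v: np.array([np.exp(v[0]) - 1.0, 2.0 * v[1]])
hess = lambda v: np.array([[np.exp(v[0]), 0.0], [0.0, 2.0]])

x_star, iters = newton_minimize(grad, hess, x0=[1.0, 1.0])
print(x_star, iters)
```

<p>The sketch solves the linear system G_k d = -g_k rather than forming the inverse matrix explicitly, which is usually cheaper and numerically safer.</p>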
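<p><span style="background-color:#00ff00;">『3』</span>Pros and cons<br />
Pro: sufficiently close to a minimizer, Newton's method converges at a second-order (quadratic) rate, which is very nice.<br />
Cons:<br />
① Newton's method is not globally convergent.<br />
② Every iteration has to compute <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> (and its inverse), which is expensive.<br />
③ The linear system <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_70df409d5a96a374522d67b7ec6cb9d9.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k} = - {G_k}^{ - 1}{g_k}" /></span><script type='math/tex'>{d_k} = - {G_k}^{ - 1}{g_k}</script> may be <span style="color:#0000ff;">ill-conditioned</span> and hard to solve accurately.<br />
(Note: in algebraic equations, some polynomials have roots that change a lot when the coefficients are perturbed slightly. This sensitivity of the roots to the coefficients is called instability, and such an equation is an <span style="color:#0000ff;">ill-conditioned</span> polynomial equation. A linear system is ill-conditioned in the same sense: a small change in the data causes a large change in the solution.)<br />
To address these problems of the "original" Newton's method, people have come up with a variety of improvements, which I will go through one by one.<br />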
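To make drawback ③ concrete, here is a small Python sketch (NumPy assumed). The matrix and vectors are made-up numbers chosen only so that the "Hessian" is nearly singular; they do not come from any real problem:<br />

```python
import numpy as np

# Made-up, nearly singular "Hessian" and "gradient" (illustration only).
G = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
g = np.array([2.0, 2.0001])
g_perturbed = g + np.array([0.0, 1e-4])     # tiny change in the gradient

d = np.linalg.solve(G, -g)                  # Newton direction
d_perturbed = np.linalg.solve(G, -g_perturbed)

cond = np.linalg.cond(G)                    # condition number of G
rel_in = np.linalg.norm(g_perturbed - g) / np.linalg.norm(g)
rel_out = np.linalg.norm(d_perturbed - d) / np.linalg.norm(d)
print(cond, rel_in, rel_out)
```

The relative change in the computed direction is several orders of magnitude larger than the relative change in the gradient, which is exactly what a large condition number warns about.<br />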
<span style="background-color:#00ff00;">『4』</span>Improvement 1: the damped Newton method<br />
As noted above, Newton's method is not globally convergent: far from the optimum, the Newton direction <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_70df409d5a96a374522d67b7ec6cb9d9.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k} = - {G_k}^{ - 1}{g_k}" /></span><script type='math/tex'>{d_k} = - {G_k}^{ - 1}{g_k}</script> is not necessarily a <span style="color:#0000ff;">descent direction</span>, and decreasing the objective value is exactly what optimization is after. So people thought of adding a little "resistance" to the Newton iteration:<br />
 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_1a5b2864f2c2ad1481a919c59a5a793c.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{x_{k + 1}} = {x_k} + {\alpha _k}{d_k}" /></span><script type='math/tex'>{x_{k + 1}} = {x_k} + {\alpha _k}{d_k}</script> <br />
I find "resistance" a fairly vivid word: before there was only <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> , and now there is an extra <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_501cb8ba16bc463c7329e28f3ec226a7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{\alpha _k}" /></span><script type='math/tex'>{\alpha _k}</script> , which acts like a brake.<br />
The question is, how do we find <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_501cb8ba16bc463c7329e28f3ec226a7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{\alpha _k}" /></span><script type='math/tex'>{\alpha _k}</script> ?<br />
Once <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> has been determined, use a line search to find <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_501cb8ba16bc463c7329e28f3ec226a7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{\alpha _k}" /></span><script type='math/tex'>{\alpha _k}</script> such that <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_178488525cdfe1026fada662fa2c21f7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="f({x_k} + {\alpha _k}{d_k}) = \mathop {\min }\limits_{\alpha \ge 0} f({x_k} + \alpha {d_k})" /></span><script type='math/tex'>f({x_k} + {\alpha _k}{d_k}) = \mathop {\min }\limits_{\alpha \ge 0} f({x_k} + \alpha {d_k})</script> (there are a great many line-search algorithms; <a href="http://www.codelast.com/?p=7364" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">here</span></a> are a few for reference).<br />
What happens once this condition is satisfied?<br />
Remember the theorems in "<a href="http://www.codelast.com/?p=7514" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">Convergence of algorithms that use a line search</span></a>"? Look closely at the convergence theorem for algorithms that use an exact line search and you will see that, when the condition above holds, global convergence of the (damped) Newton method is guaranteed.<br />
Of course, satisfying that condition presupposes that every <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is positive definite. If <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is not positive definite, a usable <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> cannot be obtained (a singular matrix has no inverse, and an indefinite one may give a direction that is not a descent direction); without <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> there is no <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_501cb8ba16bc463c7329e28f3ec226a7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{\alpha _k}" /></span><script type='math/tex'>{\alpha _k}</script> ; without <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_501cb8ba16bc463c7329e28f3ec226a7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{\alpha _k}" /></span><script type='math/tex'>{\alpha _k}</script> there is no <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_63df53abbcadecae947de65a842e4f86.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{x_{k + 1}}" /></span><script type='math/tex'>{x_{k + 1}}</script> ; so the iteration formula cannot be evaluated and the search cannot proceed.<br />
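Here is a minimal Python sketch of the damped iteration (NumPy assumed). Please note one substitution: instead of the exact line search described above, it uses a simple backtracking (Armijo) search, which is an inexact line search and easier to code. The function name damped_newton, the parameters rho and c, and the 1-D test objective f(x) = sqrt(1 + x^2) are my own illustrative assumptions:<br />

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-8, max_iter=100, rho=0.5, c=1e-4):
    """Newton direction plus a backtracking (Armijo) step length alpha_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)   # assumes G_k is positive definite
        alpha = 1.0
        # Shrink alpha until the sufficient-decrease (Armijo) condition holds.
        while f(x + alpha * d) > f(x) + c * alpha * g.dot(d):
            alpha *= rho
        x = x + alpha * d
    return x

# Illustrative 1-D objective (an assumption): f(x) = sqrt(1 + x^2), minimizer x = 0.
# Here the pure Newton step maps x to -x^3, so it diverges whenever |x0| > 1.
f = lambda v: np.sqrt(1.0 + v[0] ** 2)
grad = lambda v: np.array([v[0] / np.sqrt(1.0 + v[0] ** 2)])
hess = lambda v: np.array([[(1.0 + v[0] ** 2) ** -1.5]])

x_damped = damped_newton(f, grad, hess, x0=[2.0])
print(x_damped)
```

Starting from x0 = 2 the undamped iteration would jump to -8, then -512, and so on, while the damped version shortens the first step and then converges.<br />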
So here is the catch: the damped Newton method does provide global convergence, but it leaves one problem unsolved. What if <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is not positive definite? How does the iteration continue then? That is why another improvement came along; read on.</p>
<p><span style="background-color:#00ff00;">『5』</span>The Goldstein-Price modification<br />
First, Goldstein and Price are the names of two people; I have not looked into their biographies. In 1967 they proposed that if <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is not positive definite (in which case <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_70df409d5a96a374522d67b7ec6cb9d9.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k} = - {G_k}^{ - 1}{g_k}" /></span><script type='math/tex'>{d_k} = - {G_k}^{ - 1}{g_k}</script> is hard to compute), the "<a href="http://www.codelast.com/?p=8006" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">steepest descent direction</span></a>" should be used as the search direction (which shows that the seemingly "outdated" steepest descent method still has its uses):</p>
<div>
	<img decoding="async" alt="Newton's method Goldstein-Price" src="http://www.codelast.com/wp-content/uploads/ckfinder/images/newton_method_goldstein_price.png" style="width: 400px; height: 73px;" /><br />
	where <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_e6885dfcb4c4a8ddb730c59135ffe731.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\delta \in (0,1)" /></span><script type='math/tex'>\delta \in (0,1)</script> <br />
	Under this rule, <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> always satisfies <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_f9d2500ba754546d742143e1e9f0230c.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\cos ({d_k}, - {g_k}) \ge \delta " /></span><script type='math/tex'>\cos ({d_k}, - {g_k}) \ge \delta </script> , so the "search direction condition" in the theorems of "<a href="http://www.codelast.com/?p=7514" target="_blank" rel="noopener noreferrer"><span style="background-color: rgb(255, 160, 122);">Convergence of algorithms that use a line search</span></a>" holds, and therefore the (Goldstein-Price modified) Newton method is globally convergent.<br />
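	A small Python sketch of this rule (NumPy assumed). I am assuming the rule in the figure reads: use the Newton direction when its cosine with -g_k is at least delta, otherwise use -g_k. The function name and the default delta = 0.1 are hypothetical:<br />

```python
import numpy as np

def goldstein_price_direction(g, G, delta=0.1):
    """Newton direction if it is well aligned with -g, else steepest descent."""
    try:
        d = np.linalg.solve(G, -g)
    except np.linalg.LinAlgError:       # G_k singular: no Newton direction
        return -g
    cos_theta = d.dot(-g) / (np.linalg.norm(d) * np.linalg.norm(g))
    return d if cos_theta >= delta else -g

# Illustrative data (assumptions): an indefinite and a positive definite Hessian.
g = np.array([0.1, 1.0])
d_indef = goldstein_price_direction(g, np.diag([1.0, -1.0]))  # falls back to -g
d_pd = goldstein_price_direction(g, np.diag([2.0, 1.0]))      # keeps Newton direction
print(d_indef, d_pd)
```

	The sketch assumes g is nonzero; at a stationary point the cosine is undefined and the iteration should already have stopped.<br />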
	<span style="background-color:#00ff00;">『6』</span>The Goldfeld modification<br />
	Taking a different approach from the Goldstein-Price modification, Goldfeld also proposed a method in 1966. It still works on the search direction <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> , but when <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is not positive definite it does not fall back to the steepest descent direction <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_be9976a20363f7c49bb370084b76dca7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt=" - {g_k}" /></span><script type='math/tex'> - {g_k}</script> ; instead it modifies <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> into a descent direction using the following formula:<br />
	 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_bd5579526a4aeccd9e6784566e80ea59.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k} = - B_k^{ - 1}{g_k}" /></span><script type='math/tex'>{d_k} = - B_k^{ - 1}{g_k}</script> <br />
	Here <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_d6b197395545713e292f056543da88ce.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{B_k} = {G_k} + {E_k}" /></span><script type='math/tex'>{B_k} = {G_k} + {E_k}</script> is a positive definite matrix and <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6117194d806ebd85685939d8d20e4de5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{E_k}" /></span><script type='math/tex'>{E_k}</script> is called the correction matrix. When <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6117194d806ebd85685939d8d20e4de5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{E_k}" /></span><script type='math/tex'>{E_k}</script> <span style="color:#0000ff;">satisfies certain conditions</span>, the (Goldfeld modified) Newton method is globally convergent.<br />
	What conditions exactly? A condition on the "<a href="http://zh.wikipedia.org/zh/%E6%9D%A1%E4%BB%B6%E6%95%B0" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">condition number</span></a>" of the matrix <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_a8065340a6debc5139adfdd3265f2b07.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{B_k}" /></span><script type='math/tex'>{B_k}</script> . To be honest I do not know this part well, and it is not the focus of this article, so I will not reproduce the textbook theorem here.<br />
	What the Goldfeld modification leaves open is that it gives no effective way of choosing <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6117194d806ebd85685939d8d20e4de5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{E_k}" /></span><script type='math/tex'>{E_k}</script> . It is as if I told you that you need a magic wand to enter the enchanted forest, but did not tell you where to find the wand. So other researchers proposed further improvements that help you find this "wand"; read on.<br />
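	To show what a correction of the form B_k = G_k + E_k can look like in code, here is a Python sketch (NumPy assumed) that uses the simplest choice I know of: E_k = tau * I, with tau increased until a Cholesky factorization succeeds. This is a common heuristic and only an illustration; I am not claiming it is the rule from Goldfeld's paper. The function name and the parameters beta and max_tries are hypothetical:<br />

```python
import numpy as np

def shifted_newton_direction(g, G, beta=1e-3, max_tries=60):
    """Solve (G + tau*I) d = -g, growing tau until G + tau*I is positive definite."""
    n = G.shape[0]
    min_diag = np.min(np.diag(G))
    tau = 0.0 if min_diag > 0 else beta - min_diag
    for _ in range(max_tries):
        try:
            # Cholesky succeeds only for (numerically) positive definite matrices.
            L = np.linalg.cholesky(G + tau * np.eye(n))
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, beta)
            continue
        y = np.linalg.solve(L, -g)          # L y = -g
        return np.linalg.solve(L.T, y), tau  # L^T d = y
    raise RuntimeError("could not make G + tau*I positive definite")

# Illustrative indefinite Hessian (eigenvalues 3 and -1) and gradient.
G = np.array([[1.0, 2.0], [2.0, 1.0]])
g = np.array([1.0, 0.0])
d, tau = shifted_newton_direction(g, G)
print(d, tau, g.dot(d))
```

	Because G + tau*I ends up positive definite, the resulting d satisfies g^T d &lt; 0, that is, it is a descent direction.<br />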
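	<span style="background-color:#00ff00;">『7』</span>The Gill-Murray Cholesky factorization method<br />
	This heading may make your head spin, and fair enough: it contains three people's names. The most important one is Cholesky, so here is a short digression with some lighter background (copied from the web; I no longer remember the link):</div>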
<blockquote>
<div>
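		Cholesky was a French mathematician born in the late 19th century. The Cholesky factorization is his most important contribution to the field. He served as an officer in the French army and was killed in action in 1918, shortly before the end of World War I.<br />
		The Cholesky factorization is a way of factoring a matrix that has important applications in linear algebra. It factors a (symmetric or Hermitian, positive definite) matrix into the product of a lower triangular matrix and its conjugate transpose; by analogy with real numbers, the factorization is like taking a square root. Compared with general matrix factorizations used to solve linear systems, the Cholesky factorization is very efficient.</div>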
</blockquote>
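<p>For readers who have never used it, here is a tiny Python sketch of the factorization described in the quote, using NumPy's numpy.linalg.cholesky. The matrix A and vector b are made-up illustrative values:</p>

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])        # symmetric positive definite (illustrative)
b = np.array([2.0, 1.0])

L = np.linalg.cholesky(A)          # lower triangular factor, A = L @ L.T
y = np.linalg.solve(L, b)          # first solve  L y = b
x = np.linalg.solve(L.T, y)        # then solve   L^T x = y
print(L)
print(x)
```

<p>For simplicity the sketch calls the general-purpose numpy.linalg.solve for both triangular systems; a dedicated triangular solver would exploit the structure of L better.</p>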
<div>
	Cholesky really did die young, and given his contribution he deserves to be remembered.<br />
	Gill and Murray used the Cholesky factorization to improve Newton's method. Personally I see their work as an improvement on (or a complement to) the Goldfeld modification, because they supply a way of computing <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6117194d806ebd85685939d8d20e4de5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{E_k}" /></span><script type='math/tex'>{E_k}</script> .</p>
<p>	The Cholesky factorization (Newton method) here works like this: perform a Cholesky factorization of <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> (the Hessian matrix), apply certain corrections to it during the factorization, and end up with an approximation <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cbc036587240f187400a7665837a1611.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\overline {{G_k}} " /></span><script type='math/tex'>\overline {{G_k}} </script> ; then use this <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cbc036587240f187400a7665837a1611.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\overline {{G_k}} " /></span><script type='math/tex'>\overline {{G_k}} </script> in place of <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> to solve for <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> .<br />
	As for the details of this correction procedure, I can only say that I am not entirely clear on them. <span style="color:#800000;">I do not want to mislead anyone, so I will only write down what I understand</span>:<br />
	If <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is positive definite, it always has a Cholesky factorization, namely&nbsp; <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_fb77d6c4e0828b5703126ee6c6b6069a.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k} = {L_k}{D_k}L_k^T" /></span><script type='math/tex'>{G_k} = {L_k}{D_k}L_k^T</script> , where <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_a16319f7563602c7dbf5a2c5ca46ecc0.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{L_k}" /></span><script type='math/tex'>{L_k}</script> is a unit lower triangular matrix and <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_3961878be696f3864a17e9b34591e36e.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{D_k}" /></span><script type='math/tex'>{D_k}</script> is a diagonal matrix (a square matrix whose entries off the main diagonal are all 0) with positive diagonal entries.<br />
	If <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is not positive definite, let the Cholesky factorization satisfy <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_e55103a097cb5f563b30c1461ebe9cb1.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\overline {{G_k}} = {L_k}{D_k}L_k^T = {G_k} + {E_k}" /></span><script type='math/tex'>\overline {{G_k}} = {L_k}{D_k}L_k^T = {G_k} + {E_k}</script> (where <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6117194d806ebd85685939d8d20e4de5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{E_k}" /></span><script type='math/tex'>{E_k}</script> is a diagonal matrix), and during the factorization adjust the diagonal entries of <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_3961878be696f3864a17e9b34591e36e.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{D_k}" /></span><script type='math/tex'>{D_k}</script> (several adjustment rules have been worked out, for example forcing these entries to be greater than some positive constant) so that the Hessian becomes positive definite. The "Hessian" meant here is the <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cbc036587240f187400a7665837a1611.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\overline {{G_k}} " /></span><script type='math/tex'>\overline {{G_k}} </script> introduced above. Once the factorization is done, <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cbc036587240f187400a7665837a1611.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\overline {{G_k}} " /></span><script type='math/tex'>\overline {{G_k}} </script> can be used to solve for <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_66eea6bfeea7fcb327d435f627a2390b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{d_k}" /></span><script type='math/tex'>{d_k}</script> .<br />
	If <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is a <span style="color:#0000ff;">sufficiently positive definite</span> matrix (a term from the textbook; can anyone explain it?), then after this correction procedure <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cbc036587240f187400a7665837a1611.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\overline {{G_k}} " /></span><script type='math/tex'>\overline {{G_k}} </script> is simply the original <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> , and <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6117194d806ebd85685939d8d20e4de5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{E_k}" /></span><script type='math/tex'>{E_k}</script> vanishes, which is a very nice property.<br />
	To me the correction procedure is a bit like makeup: if a girl is already pretty, putting makeup on her (as long as you are not deliberately sabotaging her) leaves her just as pretty; if she is plain, makeup may well make her look better. Either way we get the result we want.<br />
	I have not studied the Cholesky factorization algorithm itself, so I cannot say more about it here.</p>
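	To make my understanding concrete, here is a heavily simplified Python sketch (NumPy assumed) of an LDL^T factorization whose pivots are forced to be at least a positive constant delta. It is NOT the full Gill-Murray algorithm, which as far as I know also keeps the entries of the factors bounded; the function name modified_ldl and the value of delta are hypothetical:<br />

```python
import numpy as np

def modified_ldl(G, delta=1e-3):
    """LDL^T factorization in which every pivot d_j is forced to be >= delta."""
    n = G.shape[0]
    L = np.eye(n)                  # unit lower triangular factor
    d = np.zeros(n)                # diagonal of D
    for j in range(n):
        c_jj = G[j, j] - np.sum(d[:j] * L[j, :j] ** 2)
        d[j] = max(c_jj, delta)    # the correction: raise pivots that are too small
        for i in range(j + 1, n):
            c_ij = G[i, j] - np.sum(d[:j] * L[i, :j] * L[j, :j])
            L[i, j] = c_ij / d[j]
    return L, d

# Illustrative indefinite Hessian (eigenvalues 3 and -1).
G = np.array([[1.0, 2.0], [2.0, 1.0]])
L, d = modified_ldl(G)
G_bar = L @ np.diag(d) @ L.T       # the corrected, positive definite matrix
E = G_bar - G                      # the correction matrix
print(E)
```

	In this sketch E comes out diagonal, and when every pivot of G is already at least delta nothing is changed, so E is zero, which matches the property described above.<br />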
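<p>	One textbook says that the Gill-Murray Cholesky factorization Newton method is "the most thorough and most practically valuable modification of Newton's method".<br />
	So sometimes it really is true that the most complicated approach is the best one, and there are no shortcuts.<br />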
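	<span style="background-color:#00ff00;">『8』</span>The trust-region Newton method<br />
	In <a href="http://www.codelast.com/?p=7488" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">this article</span></a> explaining trust-region algorithms, we said that <span style="color:#0000ff;">trust-region algorithms are globally convergent</span>. Using that, we can combine them with Newton's method to create a globally convergent trust-region Newton method; that is, the problem to solve is:</p>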
<div>
		<img decoding="async" alt="Newton's method trust region" src="http://www.codelast.com/wp-content/uploads/ckfinder/images/newton_method_trust_region.png" style="width: 400px; height: 113px;" /><br />
		Here <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_03c7c0ace395d80182db07ae2c30f034.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="s" /></span><script type='math/tex'>s</script> is the displacement (step), <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> denotes the k-th iteration, <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_1cd5597a080292208723039cfd7bfd41.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{g_k}" /></span><script type='math/tex'>{g_k}</script> is the gradient, <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> is the Hessian matrix (the matrix of second derivatives), and <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_7897c8db80031ff84df5a87ff3761308.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{h_k}" /></span><script type='math/tex'>{h_k}</script> is the trust-region bound (radius) at the k-th iteration.<br />
		Why is it called the trust-region Newton method? First, there is no line search and what is solved for is the displacement s, so it is a trust-region algorithm; second, the subproblem uses the gradient and the second derivative, so it is a Newton-type method. Calling the whole thing a trust-region Newton method therefore makes sense.<br />
		One reassuring feature of the trust-region Newton method is that it does not require <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> (the Hessian) to be positive definite, unlike the algorithms above, which are all entangled in one way or another with the positive definiteness of <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_8ac8fda2ed3a75d5af87815e87a3ebc7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="{G_k}" /></span><script type='math/tex'>{G_k}</script> .<br />
		As for the concrete steps of the trust-region algorithm, I will not go into them here; please refer to <a href="http://www.codelast.com/?p=7488" target="_blank" rel="noopener noreferrer"><span style="background-color:#ffa07a;">this article</span></a>.<br />
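		Here is a rough Python sketch (NumPy assumed) of solving one trust-region subproblem when the Hessian is indefinite. I am assuming the problem in the figure is: minimize g_k^T s + 0.5 s^T G_k s subject to ||s|| &lt;= h_k. The sketch shifts the Hessian by lam*I and finds lam by bisection; it ignores the so-called hard case, and the function name and tolerances are hypothetical:<br />

```python
import numpy as np

def trust_region_step(g, G, h, iters=100):
    """Approximately minimize g^T s + 0.5 s^T G s subject to ||s|| <= h."""
    n = g.size
    lam_min = np.linalg.eigvalsh(G)[0]          # smallest eigenvalue of G
    if lam_min > 0:
        s = np.linalg.solve(G, -g)              # plain Newton step
        if np.linalg.norm(s) <= h:
            return s                            # it already lies inside the region
    # Otherwise look for lam with ||(G + lam I)^{-1} g|| = h by bisection.
    step = lambda lam: np.linalg.solve(G + lam * np.eye(n), -g)
    lo = max(0.0, -lam_min) + 1e-12
    hi = lo + 1.0
    while np.linalg.norm(step(hi)) > h:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > h:
            lo = mid
        else:
            hi = mid
    return step(hi)

# Illustrative indefinite Hessian, gradient and radius (assumptions).
G = np.diag([1.0, -2.0])
g = np.array([1.0, 1.0])
s = trust_region_step(g, G, h=1.0)
model_change = g.dot(s) + 0.5 * s.dot(G).dot(s)
print(s, np.linalg.norm(s), model_change)
```

		Even though this G has a negative eigenvalue, the step is well defined, lies on the boundary of the region, and decreases the quadratic model.<br />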
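		<span style="background-color:#00ff00;">『9』</span>Summary<br />
		That concludes this introduction to Newton's method and its many variants. You will have noticed that many theorems are stated without proof and some derivations may not be rigorous, but the conclusions are essentially correct. If we got bogged down in the details, we would be doing theoretical research rather than applying the methods in engineering practice. So when learning optimization we can, to some extent, "look at the big picture and skip the details", which helps understanding enormously.<br />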
		<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
		Reproduction must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
		Thanks for following my WeChat official account (scan the code with WeChat):</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
			<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="width: 200px; height: 200px;" /></p>
</p></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%86%8d%e8%b0%88-%e7%89%9b%e9%a1%bf%e6%b3%95newtons-method-in-optimization/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
	</channel>
</rss>
