<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>parallelism &#8211; 编码无悔 /  Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/tag/parallelism/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>The Road of Optimization</description>
	<lastBuildDate>Mon, 27 Apr 2020 17:22:29 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>[Original] Reinforcement learning framework rlpyt, source-code analysis: (10) The CPU-based parallel sampler CpuSampler — the worker implementation</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a10-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a10-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Tue, 21 Jan 2020 05:15:53 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[rlpyt]]></category>
		<category><![CDATA[并行]]></category>
		<category><![CDATA[强化学习]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=11674</guid>

					<description><![CDATA[<p>
For more articles about rlpyt,&#160;click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a>&#160;is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I previously wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it.&#160;This post continues the <a href="https://www.codelast.com/?p=11613" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">previous one</span></a>, carrying on with the analysis of the CpuSampler source code.<br />
Here we analyze the worker implementation of the ParallelSamplerBase class in CPU-parallel mode.</p>
<p><span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;Where the worker code lives<br />
<span style="color:#0000ff;">rlpyt/samplers/parallel/worker.py</span><br />
<span id="more-11674"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;What the worker is for<br />
It samples the data produced by the agent interacting with the environment.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;Code analysis<br />
I have added extensive comments directly in the code:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
def initialize_worker(rank, seed=None, cpu=None, torch_threads=None):
    """
    Initialize a sampling worker.

    :param rank: index identifying this sampling process.
    :param seed: random seed, an integer.
    :param cpu: CPU index, e.g. 0, 1, 2, ...
    :param torch_threads: number of threads for CPU parallelism.
    """
    log_str = f"Sampler rank {rank} initialized"
    cpu = [cpu] if isinstance(cpu, int) else cpu
    p = psutil.Process()</pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a10-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<p>
For more articles about rlpyt,&nbsp;click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a>&nbsp;is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I previously wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it.&nbsp;This post continues the <a href="https://www.codelast.com/?p=11613" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">previous one</span></a>, carrying on with the analysis of the CpuSampler source code.<br />
Here we analyze the worker implementation of the ParallelSamplerBase class in CPU-parallel mode.</p>
<p><span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Where the worker code lives<br />
<span style="color:#0000ff;">rlpyt/samplers/parallel/worker.py</span><br />
<span id="more-11674"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;What the worker is for<br />
It samples the data produced by the agent interacting with the environment.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Code analysis<br />
I have added extensive comments directly in the code:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
def initialize_worker(rank, seed=None, cpu=None, torch_threads=None):
    """
    Initialize a sampling worker.

    :param rank: index identifying this sampling process.
    :param seed: random seed, an integer.
    :param cpu: CPU index, e.g. 0, 1, 2, ...
    :param torch_threads: number of threads for CPU parallelism.
    """
    log_str = f"Sampler rank {rank} initialized"
    cpu = [cpu] if isinstance(cpu, int) else cpu
    p = psutil.Process()
    try:
        if cpu is not None:
            p.cpu_affinity(cpu)  # Set CPU affinity (not supported on MacOS).
        cpu_affin = p.cpu_affinity()
    except AttributeError:
        cpu_affin = "UNAVAILABLE MacOS"
    log_str += f", CPU affinity {cpu_affin}"
    torch_threads = (1 if torch_threads is None and cpu is not None else
        torch_threads)  # Default to 1 to avoid possible MKL hang.
    if torch_threads is not None:
        torch.set_num_threads(torch_threads)  # Number of threads for CPU parallelism.
    log_str += f", Torch threads {torch.get_num_threads()}"
    if seed is not None:
        set_seed(seed)
        time.sleep(0.3)  # (so the printing from set_seed is not intermixed)
        log_str += f", Seed {seed}"
    logger.log(log_str)


def sampling_process(common_kwargs, worker_kwargs):
    """
    Arguments fed from the Sampler class in master process.

    The sampling-process function.

    :param common_kwargs: arguments shared by all workers.
    :param worker_kwargs: arguments that may differ per worker.
    """
    c, w = AttrDict(**common_kwargs), AttrDict(**worker_kwargs)
    initialize_worker(w.rank, w.seed, w.cpus, c.torch_threads)
    # Build the environment instances and the collector used for training.
    envs = [c.EnvCls(**c.env_kwargs) for _ in range(w.n_envs)]
    collector = c.CollectorCls(
        rank=w.rank,
        envs=envs,
        samples_np=w.samples_np,
        batch_T=c.batch_T,
        TrajInfoCls=c.TrajInfoCls,
        agent=c.get("agent", None),  # Optional depending on parallel setup.
        sync=w.get("sync", None),
        step_buffer_np=w.get("step_buffer_np", None),
        global_B=c.get("global_B", 1),
        env_ranks=w.get("env_ranks", None),
    )
    agent_inputs, traj_infos = collector.start_envs(c.max_decorrelation_steps)  # Collects (samples) the first batch of data.
    collector.start_agent()  # Collector initialization.

    # Build the environment instances and the collector used for evaluation.
    if c.get("eval_n_envs", 0) &gt; 0:
        eval_envs = [c.EnvCls(**c.eval_env_kwargs) for _ in range(c.eval_n_envs)]
        eval_collector = c.eval_CollectorCls(
            rank=w.rank,
            envs=eval_envs,
            TrajInfoCls=c.TrajInfoCls,
            traj_infos_queue=c.eval_traj_infos_queue,
            max_T=c.eval_max_T,
            agent=c.get("agent", None),
            sync=w.get("sync", None),
            step_buffer_np=w.get("eval_step_buffer_np", None),
        )
    else:
        eval_envs = list()

    ctrl = c.ctrl  # Controller that keeps the concurrent worker processes coordinated.
    ctrl.barrier_out.wait()  # Each worker has one wait(); together with the one in ParallelSamplerBase.initialize(), that makes exactly n_worker+1.
    while True:
        collector.reset_if_needed(agent_inputs)  # Outside barrier?
        ctrl.barrier_in.wait()
        if ctrl.quit.value:  # When the master process sets this to True, all worker processes stop sampling.
            break
        if ctrl.do_eval.value:  # Evaluation data is collected only when the master's evaluate_agent() sets this to True.
            eval_collector.collect_evaluation(ctrl.itr.value)  # Traj_infos to queue inside.
        else:  # Not doing evaluation.
            agent_inputs, traj_infos, completed_infos = collector.collect_batch(
                agent_inputs, traj_infos, ctrl.itr.value)
            for info in completed_infos:
                c.traj_infos_queue.put(info)  # Push this worker's statistics onto the queue shared by all workers.
        ctrl.barrier_out.wait()

    # Clean up the environments.
    for env in envs + eval_envs:
        env.close()
</pre>
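The barrier_in/barrier_out handshake in the loop above can be sketched with standard-library primitives. The following is a minimal stand-in, not rlpyt's implementation: it uses threads instead of processes, and threading.Barrier plus an Event in place of the real ctrl object's fields (only the names ctrl, n_worker, barrier_in, barrier_out, quit mirror the source; the rest is illustrative):

```python
import threading
from types import SimpleNamespace

n_worker = 2
# Stand-in for rlpyt's ctrl: one barrier releasing workers into an iteration,
# one signaling the iteration is done, plus a quit flag. Each barrier has
# n_worker + 1 parties because the master waits on them too.
ctrl = SimpleNamespace(
    barrier_in=threading.Barrier(n_worker + 1),
    barrier_out=threading.Barrier(n_worker + 1),
    quit=threading.Event(),
)
results = []

def worker(rank):
    while True:
        ctrl.barrier_in.wait()   # wait for the master to start an iteration
        if ctrl.quit.is_set():   # master asked everyone to exit
            break
        results.append(rank)     # "collect a batch"
        ctrl.barrier_out.wait()  # report this iteration as done

threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_worker)]
for t in threads:
    t.start()

for itr in range(3):             # master: run 3 sampling iterations
    ctrl.barrier_in.wait()
    ctrl.barrier_out.wait()      # returns once every worker has finished

ctrl.quit.set()                  # set quit BEFORE the final barrier_in release
ctrl.barrier_in.wait()           # so every worker observes it and breaks
for t in threads:
    t.join()
```

Note how quit is set before the final barrier_in release, so all workers observe it on the same iteration — the same ordering the worker loop above relies on when it checks ctrl.quit.value right after barrier_in.wait().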
<p>The trickiest part of the worker code is this: how does the worker put the data it samples back into the replay buffer?<br />
From the <a href="https://www.codelast.com/?p=11613" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">previous</span></a> article we know that the ParallelSamplerBase.initialize() function initializes the replay buffer:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
examples = <span style="color:#94558d;">self</span>._build_buffers(env<span style="color:#cc7832;">, </span>bootstrap_value)</pre>
<p>And:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
<span style="color:#cc7832;">def </span><span style="color:#ffc66d;">_build_buffers</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>env<span style="color:#cc7832;">, </span>bootstrap_value):
    <span style="color:#94558d;">self</span>.samples_pyt<span style="color:#cc7832;">, </span><span style="color:#94558d;">self</span>.samples_np<span style="color:#cc7832;">, </span>examples = build_samples_buffer(
        <span style="color:#94558d;">self</span>.agent<span style="color:#cc7832;">, </span>env<span style="color:#cc7832;">, </span><span style="color:#94558d;">self</span>.batch_spec<span style="color:#cc7832;">, </span>bootstrap_value<span style="color:#cc7832;">,
</span><span style="color:#cc7832;">        </span><span style="color:#aa4926;">agent_shared</span>=<span style="color:#cc7832;">True, </span><span style="color:#aa4926;">env_shared</span>=<span style="color:#cc7832;">True, </span><span style="color:#aa4926;">subprocess</span>=<span style="color:#cc7832;">True</span>)
    <span style="color:#cc7832;">return </span>examples</pre>
<p>Here, self.samples_np is the storage object backing the replay buffer. When the worker arguments&nbsp;workers_kwargs are initialized, self.samples_np is split into multiple slices, which are passed into the workers:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
<span style="color:#aa4926;">samples_np</span>=<span style="color:#94558d;">self</span>.samples_np[:<span style="color:#cc7832;">, </span>slice_B]<span style="color:#cc7832;">,</span></pre>
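The point of samples_np=self.samples_np[:, slice_B] is that the slice is a view, not a copy: each worker writes through its own slice directly into the master's shared buffer. A minimal standard-library stand-in (a memoryview over a bytearray plays the role of the shared numpy array; the two-worker split of 8 columns is hypothetical):

```python
# Shared "replay buffer" for B = 8 environment columns (stand-in for samples_np).
buf = bytearray(8)

# One slice per worker, analogous to samples_np[:, slice_B].
# memoryview slices are views: writes go straight into the parent buffer.
worker_slices = [memoryview(buf)[i * 4:(i + 1) * 4] for i in range(2)]

worker_slices[0][0] = 7   # worker 0 writes a "sample" into its own columns
worker_slices[1][3] = 9   # worker 1 writes into its own columns

# The master sees both writes in the shared buffer, no copying involved.
assert bytes(buf) == bytes([7, 0, 0, 0, 0, 0, 0, 9])
```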
<p>Inside the worker, when the collector object is constructed, this samples_np is passed on to the collector's constructor. That is how the replay buffer gets tied to the collector.<br />
Finally, collector.collect_batch() writes the sampled data into samples_np, which effectively puts it into the replay buffer.<br />
That is all for this installment; to be continued.<br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Reposts must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan the QR code with WeChat):</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="width: 200px; height: 200px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a10-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Reinforcement learning framework rlpyt, source-code analysis: (9) The CPU-based parallel sampler CpuSampler</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a9-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a9-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 20 Jan 2020 09:16:20 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[rlpyt]]></category>
		<category><![CDATA[并行]]></category>
		<category><![CDATA[强化学习]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=11613</guid>

					<description><![CDATA[<p>
查看关于 rlpyt&#160;的更多文章请点击<a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">这里</span></a>。</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a> is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it earlier. This post continues the <a href="https://www.codelast.com/?p=11441" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">previous article</span></a> and keeps analyzing the CpuSampler source code.<br />
We already know that CpuSampler has two parent classes: BaseSampler and ParallelSamplerBase. BaseSampler mostly defines a set of interfaces, so there is little to say about it; this article therefore turns to the other parent class, ParallelSamplerBase. Its initialization function initialize() does so much important work that it deserves a long article of its own, and that is the main subject of this post.<br />
<span id="more-11613"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;What important work does the initialization function initialize() do?<br />
In one sentence, initialize() computes the values of some derived parameters, initializes the agent, creates the <span style="color:#0000ff;">parallel controller</span>, and creates and starts multiple worker processes.<br />
<span style="color:#ff0000;">✍</span> The &#8220;<span style="color: rgb(0, 0, 255);">parallel controller</span>&#8221; (parallel ctrl) refers to the variables needed to coordinate the parallel processes when the Python multiprocessing module is used to implement parallelism; these coordination variables together form the &#8220;parallel controller&#8221;.</p>
<p><span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;Computing derived parameter values<br />
In parallel mode, some parameters (such as the number of sampling workers) are not set directly by the user but computed, and there are quite a few of them, so large chunks of code are devoted to this.<br />
Without comments, the code below would be baffling:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
n_envs_list = <span style="color:#94558d;">self</span>._get_n_envs_list(</pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a9-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<p>
For more articles about rlpyt, please click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a> is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it earlier. This post continues the <a href="https://www.codelast.com/?p=11441" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">previous article</span></a> and keeps analyzing the CpuSampler source code.<br />
We already know that CpuSampler has two parent classes: BaseSampler and ParallelSamplerBase. BaseSampler mostly defines a set of interfaces, so there is little to say about it; this article therefore turns to the other parent class, ParallelSamplerBase. Its initialization function initialize() does so much important work that it deserves a long article of its own, and that is the main subject of this post.<br />
<span id="more-11613"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;What important work does the initialization function initialize() do?<br />
In one sentence, initialize() computes the values of some derived parameters, initializes the agent, creates the <span style="color:#0000ff;">parallel controller</span>, and creates and starts multiple worker processes.<br />
<span style="color:#ff0000;">✍</span> The &ldquo;<span style="color: rgb(0, 0, 255);">parallel controller</span>&rdquo; (parallel ctrl) refers to the variables needed to coordinate the parallel processes when the Python multiprocessing module is used to implement parallelism; these coordination variables together form the &ldquo;parallel controller&rdquo;.</p>
<p><span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Computing derived parameter values<br />
In parallel mode, some parameters (such as the number of sampling workers) are not set directly by the user but computed, and there are quite a few of them, so large chunks of code are devoted to this.<br />
Without comments, the code below would be baffling:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
n_envs_list = self._get_n_envs_list(affinity=affinity)  # the user-specified worker count may not match the environment count; it is re-adjusted here
self.n_worker = n_worker = len(n_envs_list)  # worker count after adjustment
B = self.batch_spec.B  # number of environment instances
global_B = B * world_size  # number of environment instances across all "parallel universes"
env_ranks = list(range(rank * B, (rank + 1) * B))  # for the meaning, see: https://www.codelast.com/?p=10932
self.world_size = world_size
self.rank = rank

if self.eval_n_envs &gt; 0:  # parameter passed in from example_*.py
    self.eval_n_envs_per = max(1, self.eval_n_envs // n_worker)  # evaluation environments each worker hosts (at least 1)
    self.eval_n_envs = eval_n_envs = self.eval_n_envs_per * n_worker  # guarantee at least n_worker eval environment instances
    logger.log(f"Total parallel evaluation envs: {eval_n_envs}.")
    self.eval_max_T = eval_max_T = int(self.eval_max_steps // eval_n_envs)</pre>
<p>
The most &ldquo;magical&rdquo; part is <span style="color:#0000ff;">self._get_n_envs_list()</span>, the function that computes <span style="color:#b22222;">how many environment instances each worker hosts</span>. Does that phrasing sound strange? The reason for it: the user can specify the number of environment instances and the number of workers, but the two may not be equal, so either there are too few workers or too many. In the first case, one worker must host more than one environment instance; in the second case, not all the workers are needed, so the worker count is reduced until each worker hosts exactly one environment instance.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
I have added comments to self._get_n_envs_list(), which should be enough to make its behavior clear:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
def _get_n_envs_list(self, affinity=None, n_worker=None, B=None):
    """
    From the number of environment instances (the so-called "B") and the user-specified
    number of sampling workers (n_worker), compute a list whose length is the final
    number of workers and whose elements are the number of environment instances
    hosted by each worker.

    :param affinity: a dict containing the hardware-affinity definition.
    :param n_worker: the user-specified number of sampling workers.
    :param B: the number of environment instances.
    :return: a list with the meaning described above.
    """
    B = self.batch_spec.B if B is None else B  # see the BatchSpec class; B can be taken as the number of environment instances
    n_worker = len(affinity["workers_cpus"]) if n_worker is None else n_worker  # worker count (must not exceed the physical CPU count, or an error is raised elsewhere)
    # When there are fewer environment instances than workers, e.g. 8 workers (8 physical
    # CPUs) but only 5 environment instances with one environment per CPU, 3 CPUs would
    # sit idle; the worker count is then lowered to the environment count so that each
    # CPU runs exactly one environment instance.
    if B &lt; n_worker:
        logger.log(f"WARNING: requested fewer envs ({B}) than available worker "
            f"processes ({n_worker}). Using fewer workers (but maybe better to "
            "increase sampler's `batch_B`.")
        n_worker = B
    n_envs_list = [B // n_worker] * n_worker
    # When the environment count is not a multiple of the worker count, the workers
    # receive unequal numbers of environment instances.
    if not B % n_worker == 0:
        logger.log("WARNING: unequal number of envs per process, from "
            f"batch_B {self.batch_spec.B} and n_worker {n_worker} "
            "(possible suboptimal speed).")
        for b in range(B % n_worker):
            n_envs_list[b] += 1
    return n_envs_list</pre>
<p><span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Initializing the agent<br />
<span style="color:#0000ff;">There is only one agent object</span>! It is not the case that each worker process has its own agent object. This is an important concept for understanding CpuSampler.<br />
The agent is initialized by the following code (in ParallelSamplerBase.initialize()):</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
env = self.EnvCls(**self.env_kwargs)
self._agent_init(agent, env, global_B=global_B,
    env_ranks=env_ranks)
examples = self._build_buffers(env, bootstrap_value)
env.close()
del env</pre>
<p>Here an environment object is created and passed as an argument to the agent initialization function self._agent_init(). In fact, self._agent_init() only uses the <span style="color:#0000ff;">spaces</span> attribute of env and keeps no reference to the whole env object, so cleaning it up afterwards with env.close() and del env causes no problem.<br />
self._build_buffers() is a rather involved operation whose main job is to create the <span style="color:#0000ff;">replay buffer</span> that reinforcement learning requires. Intuitively one might think a replay buffer is just a list or some similar data structure, but it is not that simple: drilling down through this function reveals quite a lot of code, which even uses Python multiprocessing itself, so the replay-buffer construction is not analyzed in this post.<br />
The implementation of self._agent_init() is simple:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
def _agent_init(self, agent, env, global_B=1, env_ranks=None):
    agent.initialize(env.spaces, share_memory=True,
        global_B=global_B, env_ranks=env_ranks)
    self.agent = agent</pre>
<p>After initialization the agent is assigned to self.agent, and this is the only agent object used in CpuSampler.<br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Creating the parallel controller<br />
The parallel controller (parallel ctrl) coordinates the sampling worker processes.<br />
In initialize(), the parallel controller is created with a single call; its implementation is:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
def _build_parallel_ctrl(self, n_worker):
    """
    Create the data structures that control the parallel training process.

    multiprocessing.RawValue: a value shared across processes, without a lock.
    multiprocessing.Barrier: a simple synchronization primitive for a fixed number
        of processes to wait for one another; once every process has called wait(),
        they all proceed at the same time.
    multiprocessing.Queue: a message queue for passing data between processes.

    :param n_worker: the actual worker count (not necessarily the user-specified value).
    """
    self.ctrl = AttrDict(
        quit=mp.RawValue(ctypes.c_bool, False),
        barrier_in=mp.Barrier(n_worker + 1),
        barrier_out=mp.Barrier(n_worker + 1),
        do_eval=mp.RawValue(ctypes.c_bool, False),
        itr=mp.RawValue(ctypes.c_long, 0),
    )
    self.traj_infos_queue = mp.Queue()
    self.eval_traj_infos_queue = mp.Queue()
    self.sync = AttrDict(stop_eval=mp.RawValue(ctypes.c_bool, False))</pre>
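<p>Before looking at how rlpyt wires these primitives together, here is a minimal, self-contained sketch (not rlpyt code; all names are mine) of the same pattern: a lock-free RawValue quit flag, a Queue carrying trajectory info back to the main process, and a Barrier sized n_worker + 1 so the main process takes part in each synchronization round:</p>

```python
import ctypes
import multiprocessing as mp

def sampling_worker(rank, quit_flag, traj_queue, barrier):
    barrier.wait()                       # synchronized start
    while not quit_flag.value:           # the main process flips this flag to stop us
        traj_queue.put((rank, "traj_info"))
        barrier.wait()                   # this iteration's results are handed over
        barrier.wait()                   # wait for the main process before re-checking the flag

def run_demo(n_worker=2):
    quit_flag = mp.RawValue(ctypes.c_bool, False)  # lock-free shared bool
    traj_queue = mp.Queue()
    barrier = mp.Barrier(n_worker + 1)             # n_worker workers + the main process
    workers = [mp.Process(target=sampling_worker,
                          args=(rank, quit_flag, traj_queue, barrier))
               for rank in range(n_worker)]
    for w in workers:
        w.start()
    barrier.wait()                                 # release all workers together
    barrier.wait()                                 # every worker has queued its result
    infos = sorted(traj_queue.get() for _ in range(n_worker))
    quit_flag.value = True                         # ask the workers to stop
    barrier.wait()                                 # let them re-check the flag and exit
    for w in workers:
        w.join()
    return infos

if __name__ == "__main__":
    print(run_demo())
```

<p>The same division of labor appears in rlpyt: the flag stops the workers, the barriers delimit iterations, and the queue moves results to the main process.</p>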
<p>Here AttrDict is an &ldquo;extended&rdquo; dict, and mp is the Python multiprocessing module. Python multiprocessing is a huge topic that I only know at a basic level, so rather than trying to cover it thoroughly, here are two examples of what these controller members do:<br />
<span style="color:#0000ff;">✔</span>&nbsp;ctrl.quit can be understood as a bool shared between processes. In minibatch_rl.py, when training finishes, shutdown() is executed; it calls sampler.shutdown(), which sets ctrl.quit to True. Meanwhile, in worker.py, a worker exits its sampling loop as soon as it sees that ctrl.quit is True. Every sampling worker process is governed by this variable, which is how the main process controls the workers running in parallel.<br />
<span style="color:#0000ff;">✔</span>&nbsp;multiprocessing.Queue() passes messages between processes. Every sampling worker puts the trajectory info it collects into the same traj_infos_queue; the main process then aggregates it into statistics, logs them, prints them to the screen, and so on.<br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Creating and starting the worker processes<br />
The worker processes sample data (obtained from the agent interacting with the environment).<br />
Before these processes are created, the arguments they need must be assembled:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
common_kwargs = self._assemble_common_kwargs(affinity, global_B)
workers_kwargs = self._assemble_workers_kwargs(affinity, seed, n_envs_list)</pre>
<p>Why are the arguments split into <span style="color:#0000ff;">common_kwargs</span> and <span style="color:#0000ff;">workers_kwargs</span>? Because some arguments are shared by all worker processes while others differ per worker (for example, the CPUs a worker uses and the number of environment instances it hosts), so rlpyt keeps them in two separate objects.</p>
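<p>As an illustration of this split (a sketch with made-up keys and values, not rlpyt's actual arguments), each worker process effectively receives the shared dict plus its own per-worker dict:</p>

```python
# hypothetical argument names and values, for illustration only
common_kwargs = {"batch_T": 256, "bootstrap_value": True}   # identical for every worker
n_envs_list = [3, 3, 2, 2]                                  # e.g. output of _get_n_envs_list
workers_kwargs = [
    {"rank": rank, "cpus": (rank,), "n_envs": n_envs}       # differs per worker
    for rank, n_envs in enumerate(n_envs_list)
]

def assemble(common, per_worker):
    """The kwargs dict each worker process is started with."""
    return dict(common_kwargs=common, worker_kwargs=per_worker)

print(assemble(common_kwargs, workers_kwargs[0])["worker_kwargs"])
```
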
<p>With the arguments ready, the worker processes are created and started:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
# create a batch of child processes
target = sampling_process if worker_process is None else worker_process
self.workers = [mp.Process(target=target,
    kwargs=dict(common_kwargs=common_kwargs, worker_kwargs=w_kwargs))
    for w_kwargs in workers_kwargs]
# start the child processes
for w in self.workers:
    w.start()

self.ctrl.barrier_out.wait()  # Wait for workers ready (e.g. decorrelate).</pre>
<p>The processes are created with multiprocessing.Process(), where target is the process function. The process function can be supplied by the caller, and rlpyt also provides a default implementation: the sampling_process() function in worker.py. Although worker.py is not long, fully understanding it is not easy, so it is left for a later article.<br />
Once a worker process starts, it enters a continuous sampling loop. Note the last line above, <span style="color:#0000ff;">self.ctrl.barrier_out.wait()</span>: a multiprocessing Barrier is used to synchronize the worker processes. Since barrier_out was created as:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono';font-size:13.5pt;">
barrier_out=mp.Barrier(n_worker + 1)</pre>
<p>所以，它需要 <span style="color:#0000ff;">n_worker + 1 </span>个 wait() 才能让所有进程同时&ldquo;解锁&rdquo;（即同时开始执行），在 initialize() 函数里的&nbsp;<span style="color: rgb(0, 0, 255);">self.ctrl.barrier_out.wait()&nbsp;</span>算一个，每个worker函数&mdash;&mdash;即 sampling_process()&mdash;&mdash;里也分别有一个 barrier_out.wait()，所有这些 wait() 加起来刚好是 <span style="color:#0000ff;">n_worker + 1</span> 个，这使得 initialize() 函数执行完，所有 worker 就会&ldquo;跑起来&rdquo;开始采样。<br />
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
That is it for this installment; more in the next article.<br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Reposts must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan with WeChat):</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="width: 200px; height: 200px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a9-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[原创] 强化学习框架 rlpyt 源码分析：(8) 基于CPU的并行采样器CpuSampler</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a8-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a8-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Sun, 12 Jan 2020 09:40:26 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[rlpyt]]></category>
		<category><![CDATA[并行]]></category>
		<category><![CDATA[强化学习]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=11441</guid>

					<description><![CDATA[<p>
<em>While writing this article I revised it to the point of despair: some of my own conclusions were overturned again and again after rereading the source code many times, so it took me a very long time to finish. Even now I cannot guarantee it is absolutely correct, but it is at least the version I "believe" to be correct given my current understanding. Long articles are hard to write; thanks for your understanding.</em></p>
<p>For more articles about rlpyt, click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a> is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I previously wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it.</p>
<p>Rich single-machine parallelism is a distinctive feature that sets rlpyt apart from many other RL frameworks. rlpyt can run training in parallel on pure CPU, or on a mix of CPU and GPU.<br />
<span id="more-11441"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;Overview of rlpyt's sampler module<br />
rlpyt has a kind of module called the "<a href="https://www.codelast.com/?p=10750" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">Sampler</span></a>", which samples/collects the data from the agent's interaction with the environment. For the different training modes (serial, parallel, asynchronous), rlpyt provides different sampler implementations:</p>
<blockquote>
<div>
		├── <span style="color:#0000ff;">async_</span></div>
<div>
		│&#160; &#160;├── action_server.py</div>
<div>
		│&#160; &#160;├── alternating_sampler.py</div>
<div>
		│&#160; &#160;├── base.py</div>
<div>
		│&#160; &#160;├── collectors.py</div>
<div>
		│&#160; &#160;├── cpu_sampler.py</div></blockquote>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a8-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<p>
<em>While writing this article I revised it to the point of despair: some of my own conclusions were overturned again and again after rereading the source code many times, so it took me a very long time to finish. Even now I cannot guarantee it is absolutely correct, but it is at least the version I "believe" to be correct given my current understanding. Long articles are hard to write; thanks for your understanding.</em></p>
<p>For more articles about rlpyt, click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a> is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I previously wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it.</p>
<p>Rich single-machine parallelism is a distinctive feature that sets rlpyt apart from many other RL frameworks. rlpyt can run training in parallel on pure CPU, or on a mix of CPU and GPU.<br />
<span id="more-11441"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Overview of rlpyt's sampler module<br />
rlpyt has a kind of module called the "<a href="https://www.codelast.com/?p=10750" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">Sampler</span></a>", which samples/collects the data from the agent's interaction with the environment. For the different training modes (serial, parallel, asynchronous), rlpyt provides different sampler implementations:</p>
<blockquote>
<div>
		├── <span style="color:#0000ff;">async_</span></div>
<div>
		│&nbsp; &nbsp;├── action_server.py</div>
<div>
		│&nbsp; &nbsp;├── alternating_sampler.py</div>
<div>
		│&nbsp; &nbsp;├── base.py</div>
<div>
		│&nbsp; &nbsp;├── collectors.py</div>
<div>
		│&nbsp; &nbsp;├── cpu_sampler.py</div>
<div>
		│&nbsp; &nbsp;├── gpu_sampler.py</div>
<div>
		│&nbsp; &nbsp;└── serial_sampler.py</div>
<div>
		├── base.py</div>
<div>
		├── buffer.py</div>
<div>
		├── collections.py</div>
<div>
		├── collectors.py</div>
<div>
		├── <span style="color:#0000ff;">parallel</span></div>
<div>
		│&nbsp; &nbsp;├── base.py</div>
<div>
		│&nbsp; &nbsp;├── cpu</div>
<div>
		│&nbsp; &nbsp;│&nbsp; &nbsp;├── collectors.py</div>
<div>
		│&nbsp; &nbsp;│&nbsp; &nbsp;└── sampler.py</div>
<div>
		│&nbsp; &nbsp;├── gpu</div>
<div>
		│&nbsp; &nbsp;│&nbsp; &nbsp;├── action_server.py</div>
<div>
		│&nbsp; &nbsp;│&nbsp; &nbsp;├── alternating_sampler.py</div>
<div>
		│&nbsp; &nbsp;│&nbsp; &nbsp;├── collectors.py</div>
<div>
		│&nbsp; &nbsp;│&nbsp; &nbsp;└── sampler.py</div>
<div>
		│&nbsp; &nbsp;└── worker.py</div>
<div>
		├── <span style="color:#0000ff;">serial</span></div>
<div>
		│&nbsp; &nbsp;├── collectors.py</div>
<div>
		│&nbsp; &nbsp;└── sampler.py</div>
</blockquote>
<p>
At a glance: the serial (<span style="color:#0000ff;">serial</span>) sampler code is the simplest, the CPU implementation under the parallel (<span style="color:#0000ff;">parallel</span>) mode is somewhat simpler than the GPU one, and the asynchronous (<span style="color:#0000ff;">async_</span>) implementation is the most complex.<br />
You may wonder why the asynchronous module is named <span style="color:#0000ff;">async_</span> with a trailing underscore rather than async: async is a keyword in Python 3, and the rlpyt author presumably appended the underscore to sidestep it.<br />
In earlier articles of this series I already analyzed the sampler code for the serial (<span style="color: rgb(0, 0, 255);">serial</span>) mode; this article analyzes the CPU implementation under the parallel (<span style="color: rgb(0, 0, 255);">parallel</span>) mode, i.e. this part of the tree:</p>
<div>
<blockquote>
<div>
			├── cpu</div>
<div>
			│&nbsp; &nbsp;├── collectors.py</div>
<div>
			│&nbsp; &nbsp;└── sampler.py</div>
</blockquote>
<div>
		When sampling/collecting data, the CPU sampler does not use the GPU at all, so it is much simpler than the GPU sampler (relatively speaking). It has only two code files, although since the classes in those files inherit from other parents, the files ultimately involved go well beyond those two. Let's analyze them in detail.<br />
		<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;CPU sampler overview<br />
		The CPU sampler is implemented by the&nbsp;CpuSampler class, which has several parent classes up the hierarchy:</div>
</div>
<p><img decoding="async" alt="rlpyt" src="https://www.codelast.com/wp-content/uploads/2020/01/sampler_class_inheritance.png" style="width: 600px; height: 360px;" /><br />
This BaseSampler is also the topmost parent class of&nbsp;GpuSampler.<br />
<a href="https://www.codelast.com/?p=10932" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">An earlier article</span></a> explained that the sampler is a layer wrapped around the collector; the classes that actually do the data-collection work are the collectors. For&nbsp;CpuSampler, the corresponding collectors live in collectors.py, which contains several collector classes: CpuResetCollector, CpuWaitResetCollector, CpuEvalCollector, etc.<br />
So the sampler classes should be analyzed along two lines: one is&nbsp;<span style="color:#0000ff;">CpuSampler</span>&rarr;<span style="color:#0000ff;">ParallelSamplerBase</span>&rarr;<span style="color:#0000ff;">BaseSampler</span>, the other is the collector classes. To keep this article from growing too long, it covers only the first line, leaving the collector classes to later articles.</p>
<p><span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;BaseSampler: a parent class that mainly defines interfaces<br />
The topmost parent class, BaseSampler, mostly declares interfaces; many of its methods are left unimplemented:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
<span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">initialize</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>*args<span style="color:#cc7832;">, </span>**kwargs):
    <span style="color:#cc7832;font-weight:bold;">raise </span><span style="color:#8888c6;">NotImplementedError
</span>
<span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">obtain_samples</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>itr):
    <span style="color:#cc7832;font-weight:bold;">raise </span><span style="color:#8888c6;">NotImplementedError  </span><span style="color:#808080;"># type: Samples
</span>
<span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">evaluate_agent</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>itr):
    <span style="color:#cc7832;font-weight:bold;">raise </span><span style="color:#8888c6;">NotImplementedError
</span>
<span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">shutdown</span>(<span style="color:#94558d;">self</span>):
    <span style="color:#cc7832;font-weight:bold;">pass</span></pre>
<p>The __init__() function follows the <span style="background-color:#ffa07a;"><a href="https://www.codelast.com/?p=10831" rel="noopener noreferrer" target="_blank">pattern we have seen before</a></span>, using save__init__args() to save the variable arguments as attributes on the object:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
<span style="color:#cc7833;">save__init__args</span>(<span style="color:#8888c6;">locals</span>())</pre>
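The effect of that one-liner can be approximated in a few lines. This is a simplified sketch of what save__init__args(locals()) does, not rlpyt's actual implementation (which has extra options), and DemoSampler is a hypothetical class for illustration:

```python
def save__init__args(values):
    """Simplified sketch of rlpyt's helper: store every __init__ argument
    (taken from locals()) as an attribute on self."""
    self = values["self"]
    for name, value in values.items():
        if name != "self":
            setattr(self, name, value)

class DemoSampler:  # hypothetical class, for illustration only
    def __init__(self, batch_T, batch_B, max_decorrelation_steps=100):
        save__init__args(locals())

s = DemoSampler(batch_T=5, batch_B=8)
print(s.batch_T, s.batch_B, s.max_decorrelation_steps)  # 5 8 100
```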
<p>There is not much else to say about it.<br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;CpuSampler: mainly an entry point<br />
The CpuSampler class contains very little code; it mainly serves as an entry point rather than implementing the core logic:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
<span style="color:#cc7832;font-weight:bold;">class </span><span style="font-weight:bold;">CpuSampler</span>(ParallelSamplerBase):

    <span style="color:#cc7832;font-weight:bold;">def </span><span style="color:#b200b2;">__init__</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>*args<span style="color:#cc7832;">, </span>CollectorCls=CpuResetCollector<span style="color:#cc7832;">,
</span><span style="color:#cc7832;">            </span>eval_CollectorCls=CpuEvalCollector<span style="color:#cc7832;">, </span>**kwargs):
        <span style="color:#808080;"># e.g. or use CpuWaitResetCollector, etc...
</span><span style="color:#808080;">        </span><span style="color:#8888c6;">super</span>().<span style="color:#b200b2;">__init__</span>(*args<span style="color:#cc7832;">, </span><span style="color:#aa4926;">CollectorCls</span>=CollectorCls<span style="color:#cc7832;">,
</span><span style="color:#cc7832;">            </span><span style="color:#aa4926;">eval_CollectorCls</span>=eval_CollectorCls<span style="color:#cc7832;">, </span>**kwargs)

    <span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">obtain_samples</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>itr):
        <span style="color:#94558d;">self</span>.agent.<span style="color:#cc7833;">sync_shared_memory</span>()  <span style="color:#808080;"># New weights in workers, if needed.
</span><span style="color:#808080;">        </span><span style="color:#cc7832;font-weight:bold;">return </span><span style="color:#8888c6;">super</span>().<span style="color:#cc7833;">obtain_samples</span>(itr)

    <span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">evaluate_agent</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>itr):
        <span style="color:#94558d;">self</span>.agent.<span style="color:#cc7833;">sync_shared_memory</span>()
        <span style="color:#cc7832;font-weight:bold;">return </span><span style="color:#8888c6;">super</span>().<span style="color:#cc7833;">evaluate_agent</span>(itr)</pre>
<p>Here, obtain_samples() samples a batch of data, while evaluate_agent() evaluates the agent (or, much the same thing, the model).<br />
Both delegate to the same-named methods of the parent class <span style="color:#0000ff;">ParallelSamplerBase</span>; those will be analyzed in detail in later articles.<br />
At the start of both functions there is a call to&nbsp;self.agent.sync_shared_memory(). What is that for?<br />
Its job: <span style="color:#b22222;">in parallel mode, synchronize the shared model before sampling/evaluation</span>.<br />
The implementation of <span style="color:#0000ff;">sync_shared_memory()</span>&nbsp;is:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="python language-python hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">def</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">sync_shared_memory</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(self)</span>:</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">if</span>&nbsp;self.shared_model&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">is</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">not</span>&nbsp;self.model:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.shared_model.load_state_dict(strip_ddp_state_dict(
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.model.state_dict()))
</code></pre>
</section>
<p>The idea is: once self.model has been trained, it may no longer be the same object as self.shared_model, so the parameters of self.model must be copied into self.shared_model.<br />
<span style="color: rgb(0, 0, 255);">strip_ddp_state_dict()</span> is a rather tricky operation: why can't the state_dict taken from self.model be loaded into self.shared_model directly with load_state_dict()? The comments in the code explain this fairly well; I suggest reading them.<br />
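The short version: a model wrapped in torch.nn.parallel.DistributedDataParallel reports its parameters under keys prefixed with "module.", which would not match the unwrapped shared_model's keys. A torch-free sketch of the key-stripping idea (this is my reconstruction of the intent, not rlpyt's exact code):

```python
def strip_ddp_state_dict(state_dict):
    """Drop the 'module.' prefix that DistributedDataParallel prepends to
    parameter keys, so the keys match an unwrapped model's state_dict."""
    prefix = "module."
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in state_dict.items()}

# Keys as a DDP-wrapped model would report them (values are stand-ins):
ddp_style = {"module.fc.weight": 1, "module.fc.bias": 2}
print(strip_ddp_state_dict(ddp_style))  # {'fc.weight': 1, 'fc.bias': 2}
```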
This raises two questions: <span style="color:#0000ff;">✓</span> <span style="color:#ff0000;">What is the shared model?</span>&nbsp;<span style="color:#0000ff;">✓</span> <span style="color:#ff0000;">Why synchronize the shared model?</span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;What is the shared model<br />
As the name suggests, the shared model is a model that is "shared"; the notion exists because multiple processes all need to use the model.<br />
<span style="color:#ff8c00;">✔</span> In parallel (<span style="color: rgb(0, 0, 255);">parallel</span>) mode, rlpyt spawns multiple "workers" running in multiple processes; each worker samples in its environment, and the sampled data is used to optimize the model.<br />
<span style="color: rgb(255, 140, 0);">✔</span>&nbsp;While sampling, a worker selects actions, and the model is used for that action selection.<br />
<span style="color: rgb(255, 140, 0);">✔</span>&nbsp;All workers are tied to the same agent object (the agent holds the policy network's parameters), and only one process does the model-optimization work (backpropagation and the like). Pay special attention: one process, not all the worker processes!<br />
<span style="color: rgb(255, 140, 0);">✔</span>&nbsp;Inside each agent object there is a self.model of type torch.nn.Module as well as a self.shared_model, as the __init__() of the agent's parent class&nbsp;BaseAgent&nbsp;shows:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
<span style="color:#cc7832;font-weight:bold;">def </span><span style="color:#b200b2;">__init__</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>ModelCls=<span style="color:#cc7832;font-weight:bold;">None</span><span style="color:#cc7832;">, </span>model_kwargs=<span style="color:#cc7832;font-weight:bold;">None</span><span style="color:#cc7832;">, </span>initial_model_state_dict=<span style="color:#cc7832;font-weight:bold;">None</span>):
    <span style="color:#cc7833;">save__init__args</span>(<span style="color:#8888c6;">locals</span>())
    <span style="color:#94558d;">self</span>.model = <span style="color:#cc7832;font-weight:bold;">None  </span><span style="color:#808080;"># type: torch.nn.Module
</span><span style="color:#808080;">    </span><span style="color:#94558d;">self</span>.shared_model = <span style="color:#cc7832;font-weight:bold;">None</span></pre>
<p>When the agent object is initialized, i.e. in BaseAgent.initialize(), self.shared_model&nbsp;is set to be the same as self.model:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
<span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">initialize</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>env_spaces<span style="color:#cc7832;">, </span>share_memory=<span style="color:#cc7832;font-weight:bold;">False</span><span style="color:#cc7832;">, </span>**kwargs):
    <span style="color:#629755;font-style:italic;">&quot;&quot;&quot;In this default setup, self.model is treated as the model needed
</span><span style="color:#629755;font-style:italic;">    for action selection, so it is the only one shared with workers.&quot;&quot;&quot;
</span><span style="color:#629755;font-style:italic;">    </span><span style="color:#94558d;">self</span>.env_model_kwargs = <span style="color:#94558d;">self</span>.<span style="color:#cc7833;">make_env_to_model_kwargs</span>(env_spaces)
    <span style="color:#94558d;">self</span>.model = <span style="color:#94558d;">self</span>.<span style="color:#cc7833;">ModelCls</span>(**<span style="color:#94558d;">self</span>.env_model_kwargs<span style="color:#cc7832;">,
</span><span style="color:#cc7832;">        </span>**<span style="color:#94558d;">self</span>.model_kwargs)
    <span style="color:#cc7832;font-weight:bold;">if </span>share_memory:
        <span style="color:#94558d;">self</span>.model.<span style="color:#cc7833;">share_memory</span>()
        <span style="color:#94558d;">self</span>.shared_model = <span style="color:#94558d;">self</span>.model</pre>
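Note that this assignment makes shared_model the very same object as model, which mechanically explains the conclusion reached later in this article: in the pure-CPU case model is never replaced, so the identity guard in sync_shared_memory() never fires. A minimal sketch of that aliasing (DemoAgent is a hypothetical stand-in, no torch involved):

```python
class DemoAgent:
    """Hypothetical stand-in for BaseAgent; illustrates only the aliasing."""
    def __init__(self):
        self.model = {"w": 0.0}  # stand-in for a torch.nn.Module
        self.shared_model = None

    def initialize(self, share_memory=False):
        if share_memory:
            # As in BaseAgent.initialize(): shared_model IS the model object.
            self.shared_model = self.model

    def sync_shared_memory(self):
        # Identity check, as in rlpyt: a no-op while the two are one object.
        if self.shared_model is not self.model:
            self.shared_model.update(self.model)  # stand-in for load_state_dict

agent = DemoAgent()
agent.initialize(share_memory=True)
print(agent.shared_model is agent.model)  # True
```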
<p>Is the if share_memory&nbsp;condition in the code above actually met? In parallel mode it is: the code of ParallelSamplerBase._agent_init()&nbsp;shows that the agent is initialized with the share_memory&nbsp;parameter set to&nbsp;True:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
agent.<span style="color:#cc7833;">initialize</span>(env.spaces<span style="color:#cc7832;">, </span><span style="color:#aa4926;">share_memory</span>=<span style="color:#cc7832;font-weight:bold;">True</span><span style="color:#cc7832;">,
</span><span style="color:#cc7832;">    </span><span style="color:#aa4926;">global_B</span>=global_B<span style="color:#cc7832;">, </span><span style="color:#aa4926;">env_ranks</span>=env_ranks)</pre>
<p>So the if share_memory&nbsp;condition is satisfied.<br />
If the model is trained on a GPU, rlpyt moves model to the user-specified device, while shared_model must stay on the CPU (<a href="https://towardsdatascience.com/speed-up-your-algorithms-part-1-pytorch-56d8a4ae7051" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">as it turns out</span></a>, PyTorch tensors and model parameters can also be shared on the GPU, but there are error-prone details requiring careful handling, which I suspect is why the author chose to keep shared_model on the CPU). Hence self.shared_model is created to guard against self.model possibly being moved to the GPU later; if that happens, the CPU-resident self.shared_model is what the processes actually share.<br />
So is this shared_model really useful in CpuSampler? Let's dig down layer by layer and see whether it is of any use.<br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Why synchronize the shared model<br />
The conclusion first: in CpuSampler, no synchronization is needed at all.<br />
To confirm this, let's see where self.shared_model&nbsp;in BaseAgent is actually used when the CPU sampler is in play. Searching the code shows that, besides the <span style="color:#0000ff;">sync_shared_memory()</span>&nbsp;function, it is used in only two places:<br />
1. The&nbsp;BaseAgent.initialize()&nbsp;function mentioned above, where self.shared_model&nbsp;is only assigned, never read.<br />
2. The to_device()&nbsp;function:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'Menlo';font-size:12.0pt;">
<span style="color:#cc7832;font-weight:bold;">def </span><span style="font-weight:bold;">to_device</span>(<span style="color:#94558d;">self</span><span style="color:#cc7832;">, </span>cuda_idx=<span style="color:#cc7832;font-weight:bold;">None</span>):
<span style="color:#629755;font-style:italic;">    </span><span style="color:#cc7832;font-weight:bold;">if </span>cuda_idx <span style="color:#cc7832;font-weight:bold;">is None</span>:
        <span style="color:#cc7832;font-weight:bold;">return
</span><span style="color:#cc7832;font-weight:bold;">    if </span><span style="color:#94558d;">self</span>.shared_model <span style="color:#cc7832;font-weight:bold;">is not None</span>:
        <span style="color:#94558d;">self</span>.model = <span style="color:#94558d;">self</span>.<span style="color:#cc7833;">ModelCls</span>(**<span style="color:#94558d;">self</span>.env_model_kwargs<span style="color:#cc7832;">,
</span><span style="color:#cc7832;">            </span>**<span style="color:#94558d;">self</span>.model_kwargs)
        <span style="color:#94558d;">self</span>.model.<span style="color:#cc7833;">load_state_dict</span>(<span style="color:#94558d;">self</span>.shared_model.<span style="color:#cc7833;">state_dict</span>())
    <span style="color:#94558d;">self</span>.device = torch.<span style="color:#cc7833;">device</span>(<span style="color:#008080;">&quot;cuda&quot;</span><span style="color:#cc7832;">, </span><span style="color:#aa4926;">index</span>=cuda_idx)
    <span style="color:#94558d;">self</span>.model.<span style="color:#cc7833;">to</span>(<span style="color:#94558d;">self</span>.device)</pre>
<p>In this code, when the CPU sampler is used, cuda_idx&nbsp;is None, so the function returns immediately and self.shared_model&nbsp;is never reached.<br />
Moreover, every other use of self.shared_model&nbsp;in BaseAgent relates to the asynchronous (<span style="color: rgb(0, 0, 255);">async_</span>) mode, not the parallel (<span style="color:#0000ff;">parallel</span>) mode.<br />
Therefore, for CpuSampler the shared_model is useless, and there is no need to call sync_shared_memory()&nbsp;to synchronize it.<br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;When the shared model does matter<br />
For CpuSampler, the self.model&nbsp;in BaseAgent is updated in real time for every sampling worker, and action selection also uses self.model rather than self.shared_model, so the shared_model&nbsp;is effectively meaningless for CpuSampler.<br />
In other modes, however, the shared model does matter, and the mechanism is more involved.<br />
That is it for this installment; more in the next article.<br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Reposts must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan with WeChat):</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="width: 200px; height: 200px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e6%ba%90%e7%a0%81%e5%88%86%e6%9e%90%ef%bc%9a8-%e5%9f%ba%e4%ba%8ecpu%e7%9a%84%e5%b9%b6%e8%a1%8c%e9%87%87%e6%a0%b7/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[原创] 强化学习框架 rlpyt 并行(parallelism)原理初探</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e5%b9%b6%e8%a1%8cparallelism%e5%8e%9f%e7%90%86%e5%88%9d%e6%8e%a2/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e5%b9%b6%e8%a1%8cparallelism%e5%8e%9f%e7%90%86%e5%88%9d%e6%8e%a2/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 23 Dec 2019 05:26:47 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[rlpyt]]></category>
		<category><![CDATA[并行]]></category>
		<category><![CDATA[强化学习]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=11346</guid>

					<description><![CDATA[<p>
For more articles about rlpyt, click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a> is a reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework open-sourced by <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I previously wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it.</p>
<p>在单机上全面的并行（Parallelism）特性是 rlpyt 有别于很多其他强化学习框架的一个显著特征。在前面的简介文章中，已经介绍了 rlpyt 支持多种场景下的并行训练。而这种&#8220;武功&#8221;是怎么修炼出来的呢？它是站在了巨人的肩膀上&#8212;&#8212;通过PyTorch的多进程(multiprocessing)机制来实现的。<br />
所以你知道为什么 rlpyt 不使用TensorFlow这样的框架来作为后端了吧，因为TensorFlow根本就没有这种功能。TensorFlow只能靠类似于<a href="https://github.com/ray-project/ray" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">Ray</span></a>这样的并行计算框架的帮助，才能支撑起全方位的并行特性。<br />
<span id="more-11346"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&#160;为什么说TensorFlow自身的并行能力并不适用于强化学习场景<br />
限于我掌握的知识，我不保证下面的结论都是正确的，请专家们不吝赐教。<br />
相信很多刚开始学写强化学习程序的人，都是从<a href="https://morvanzhou.github.io/" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">莫凡</span></a>的强化学习教程开始的，莫凡的强化学习教程使用的是TensorFlow来实现的（很久以前看到是这样，后来我没有再去关注过，不知道他有没有发布在其他ML框架下的RL教程）。<br />
看过一部分莫凡RL代码的人都会知道，里面用TensorFlow实现的静态图多进程&#8220;并行&#8221;训练逻辑有多么晦涩（而且并行其实是伪并行，说到底还是串行）。<br />
我个人认为，如果一个初学者从这样的程序入手，其实就相当于&#8220;劝退&#8221;，也就是说：这程序这么难写，你还是别学了吧。如果有与莫凡的RL代码逻辑对等的PyTorch代码，那绝对会是另一番景象。<br />
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
有人会说，明明TensorFlow就支持并行训练的啊！现在很多模型不就是通过多机多卡分布式训练的吗？<br />
然而到了强化学习场景下，就不是这么一回事了：强化学习和监督学习很不一样。在强化学习场景下，如果要并行训练的话，会需要多个agent，与多个environment交互，对应到程序就是多个进程/线程。与environment交互的过程，可以是纯CPU计算，也可以是CPU/GPU混合计算（例如，inference得到action的过程就可以放在GPU上加速），但这个过程不能是纯GPU计算的过程。以Atari游戏模拟器为例，调用<a href="https://github.com/mgbellemare/Arcade-Learning-Environment" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">ALE</span></a>接口得到Atari环境的反馈，这个过程就是CPU计算的，不能在GPU上计算。整个强化学习的流程，数据就是这样不断地在CPU/GPU之间流转，当然你可以使用纯CPU，但假设你使用了GPU的话，也只能在一小部分工作中使用GPU，其实CPU的工作也很重。反观supervised learning，当你把数据预处理好了之后，就可以一次性地喂给GPU，GPU在单机单卡训练的时候，可以把结果全部算完了再吐回给CPU；就算是Distributed TensorFlow，也不适用于强化学习，因为Distributed TensorFlow的并行功能是为了并行地使用GPU对吧？但强化学习的采样过程是使用CPU，按我的理解这部分工作不能使用Distributed TensorFlow来并行，相反PyTorch有<span style="color:#0000ff;">multiprocessing</span>可以做到；而计算梯度之类的工作用Distributed TensorFlow就可以并行了&#8212;&#8212;但别的DL框架例如PyTorch也可以啊。<br />
所以Distributed TensorFlow在RL场景下有什么优势？没看出来。<br />
关于TensorFlow在强化学习场景下的应用，莫凡当时也<a href="https://www.zhihu.com/question/63342728/answer/297818331" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">在知乎向网友提问</span></a>如何能在TF下较好地实现强化学习的并行功能，结论大概就是：还是用PyTorch吧！<br />
另外，知乎上有<a href="https://www.zhihu.com/question/308716947/answer/571637089" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">一个讨论</span></a>，提问者对GPU并行训练DRL模型的并行过程提出了疑问。第一个回答里面说&#8220;采样过程可以并行&#8221;，但作者说的并不是指Distributed TensorFlow支持这个功能。<br />
所以我认为，TensorFlow由于缺少了类似于PyTorch&#160;multiprocessing那样的模块，它只能借助于类似于Ray的并行计算框架，也就是在外面再&#8220;包装一层&#8221;，才能把TF对&#8220;全面的并行强化学习&#8221;的缺陷给修补上。<br />
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e5%b9%b6%e8%a1%8cparallelism%e5%8e%9f%e7%90%86%e5%88%9d%e6%8e%a2/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>
For more articles about rlpyt, click <a href="https://www.codelast.com/?p=10907" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><a href="https://github.com/astooke/rlpyt" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">rlpyt</span></a>&nbsp;is an open-source reinforcement learning (<span style="color: rgb(255, 0, 0);">RL</span>) framework from <span style="color: rgb(0, 0, 255);">BAIR</span> (Berkeley Artificial Intelligence Research). I previously wrote an <a href="https://www.codelast.com/?p=10643" rel="noopener noreferrer" target="_blank"><span style="background-color: rgb(255, 160, 122);">introduction</span></a> to it.&nbsp;</p>
<p>Comprehensive single-machine parallelism is a feature that sets rlpyt apart from many other RL frameworks. The introduction already covered rlpyt's support for parallel training in multiple scenarios. How was this &ldquo;skill&rdquo; acquired? By standing on the shoulders of giants: it is built on PyTorch's multiprocessing mechanism.<br />
So now you know why rlpyt does not use a framework like TensorFlow as its backend: TensorFlow simply lacks this capability, and can only support full parallelism with the help of a parallel computing framework such as <a href="https://github.com/ray-project/ray" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">Ray</span></a>.<br />
<span id="more-11346"></span><br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Why TensorFlow's built-in parallelism is not well suited to reinforcement learning<br />
Given the limits of my knowledge, I cannot guarantee that every conclusion below is correct; corrections from experts are welcome.<br />
Many people who start writing RL programs begin with <a href="https://morvanzhou.github.io/" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">Morvan</span></a>'s RL tutorials, which are implemented in TensorFlow (at least they were when I last looked; I have not followed whether he has since published RL tutorials for other ML frameworks).<br />
Anyone who has read part of Morvan's RL code knows how obscure its static-graph, multi-process &ldquo;parallel&rdquo; training logic in TensorFlow is (and that parallelism is really pseudo-parallelism: at bottom it is still serial).<br />
In my opinion, starting from such a program effectively scares beginners off; the implicit message is that the code is so hard to write that you might as well quit. PyTorch code equivalent to Morvan's RL logic would be a completely different story.<br />
<span style="color: rgb(255, 255, 255);">Source:</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
Some will object: TensorFlow clearly supports parallel training! Aren't many models trained today across multiple machines and GPUs?<br />
In reinforcement learning, however, things are different, because RL is very unlike supervised learning. Parallel RL training needs multiple agents interacting with multiple environments, which in a program means multiple processes or threads. Interacting with an environment can be pure CPU computation or mixed CPU/GPU computation (for example, the inference that produces an action can be accelerated on a GPU), but it cannot be pure GPU computation. Take an Atari simulator: calling the <a href="https://github.com/mgbellemare/Arcade-Learning-Environment" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">ALE</span></a> interface to get the Atari environment's feedback is CPU work that cannot run on a GPU. Throughout the RL pipeline, data keeps shuttling between CPU and GPU like this; you can use the CPU alone, but if you do use a GPU it covers only a small part of the work, and the CPU still carries a heavy load. Contrast supervised learning: once the data is preprocessed it can be fed to the GPU in one shot, and in single-machine, single-GPU training the GPU can finish all the computation before returning results to the CPU. Even Distributed TensorFlow does not fit RL: its parallelism exists to use GPUs in parallel, while RL's sampling runs on the CPU, and as I understand it that work cannot be parallelized with Distributed TensorFlow, whereas PyTorch's <span style="color:#0000ff;">multiprocessing</span> can do it. Gradient computation and the like can be parallelized with Distributed TensorFlow, but other DL frameworks such as PyTorch can do that too.<br />
So what advantage does Distributed TensorFlow have in RL scenarios? I cannot see one.<br />
On applying TensorFlow to RL, Morvan himself <a href="https://www.zhihu.com/question/63342728/answer/297818331" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">asked on Zhihu</span></a> how to implement RL parallelism well in TF; the conclusion was roughly: just use PyTorch!<br />
There is also a <a href="https://www.zhihu.com/question/308716947/answer/571637089" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">Zhihu discussion</span></a> in which the asker questioned how GPU-parallel training of DRL models works. The first answer says that &ldquo;the sampling process can be parallelized&rdquo;, but its author did not mean that Distributed TensorFlow supports this.<br />
So I believe that, lacking a module like PyTorch&nbsp;multiprocessing, TensorFlow can only rely on a parallel computing framework such as Ray, wrapping an extra layer around it, to patch over its gap in &ldquo;fully parallel reinforcement learning&rdquo;.<br />
<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;PyTorch's multiprocessing capability<br />
See <span style="background-color:#ffa07a;"><a href="https://www.jiqizhixin.com/articles/2019-12-02-6" rel="noopener noreferrer" target="_blank">this passage</a></span>:</p>
<blockquote>
<div>
		Because of the global interpreter lock (GIL), Python's default implementation does not allow parallel threads to execute in parallel. To address this, the Python community has established a standard multiprocessing module, containing a large number of utilities that let users easily spawn child processes and that provide basic inter-process communication primitives.</div>
<div>
		&nbsp;</div>
<div>
		However, the implementation of those primitives uses the same serialization format as on-disk persistence, which is inefficient when handling large arrays. PyTorch therefore extends Python's multiprocessing module into torch.multiprocessing, a drop-in replacement for the built-in package that automatically moves tensor data sent to other processes into shared memory instead of sending it over the communication channel.</div>
<div>
		&nbsp;</div>
<div>
		This design greatly improves performance and weakens process isolation, producing a programming model closer to that of an ordinary threaded program.</div>
</blockquote>
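<p>As a concrete illustration of the sampling pattern described above, here is a minimal toy sketch (my own code, not rlpyt's actual implementation; names like <code>toy_env_step</code> and <code>sampling_worker</code> are invented for illustration). Several worker processes each step a private environment copy and push the sampled transitions back to the parent through a queue. It uses Python's standard multiprocessing module; rlpyt itself uses torch.multiprocessing, which additionally moves tensors into shared memory as the quote above explains.</p>

```python
# Toy sketch of CPU-based parallel sampling: each worker process steps
# its own environment copy and returns a batch of transitions via a queue.
import multiprocessing as mp
import random


def toy_env_step(state, action):
    """Stand-in environment: returns (next_state, reward)."""
    return state + action, float(action)


def sampling_worker(worker_id, n_steps, queue):
    """Collect n_steps transitions from a private environment copy."""
    rng = random.Random(worker_id)  # per-worker seed for reproducibility
    state = 0
    batch = []
    for _ in range(n_steps):
        action = rng.choice([0, 1])  # a random stand-in "policy"
        state, reward = toy_env_step(state, action)
        batch.append((worker_id, state, reward))
    queue.put(batch)


def sample_parallel(n_workers=4, n_steps=8):
    """Spawn workers, gather one batch from each, then join them."""
    queue = mp.Queue()
    workers = [
        mp.Process(target=sampling_worker, args=(i, n_steps, queue))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    # Drain the queue before joining to avoid blocking on a full pipe.
    batches = [queue.get() for _ in workers]
    for w in workers:
        w.join()
    return batches


if __name__ == "__main__":
    batches = sample_parallel()
    assert len(batches) == 4 and all(len(b) == 8 for b in batches)
```

<p>The parent collects one batch per worker before joining. rlpyt's CpuSampler follows a similar parent/worker split, though it exchanges data through pre-allocated shared-memory buffers rather than a queue.</p>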
<div>
	This is just a quick overview; to dig deeper, please consult the PyTorch documentation.<br />
	<span style="color: rgb(0, 0, 255);"><span style="background-color: rgb(0, 255, 0);">▶▶</span></span>&nbsp;Limitations of rlpyt's parallelism<br />
	rlpyt aims at squeezing the utmost RL training efficiency out of a single machine; it does not support multi-machine training. Within the limits of a single machine's hardware, rlpyt can make RL training very fast, but if your training data demands far more resources than one machine offers, you will have to turn to frameworks that support distributed training, such as <a href="https://ray.readthedocs.io/en/latest/rllib.html" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">RLlib</span></a>, which is built on <span style="background-color: rgb(255, 160, 122);"><a href="https://github.com/ray-project/ray" rel="noopener noreferrer" target="_blank">Ray</a></span>, or the PaddlePaddle-based <a href="https://github.com/PaddlePaddle/PARL" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">PARL</span></a>.<br />
	It is worth mentioning that PARL claims a <a href="https://www.jiqizhixin.com/articles/2019-04-28-5" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">head-to-head benchmark</span></a> against RLlib on the IMPALA algorithm, in which its data throughput (data-collection speed under equal compute) thoroughly beat RLlib, so PARL looks like a promising framework.<br />
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Reprints must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan the QR code with WeChat):</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
		<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="width: 200px; height: 200px;" /></p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%bc%ba%e5%8c%96%e5%ad%a6%e4%b9%a0%e6%a1%86%e6%9e%b6-rlpyt-%e5%b9%b6%e8%a1%8cparallelism%e5%8e%9f%e7%90%86%e5%88%9d%e6%8e%a2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
