<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>原创 &#8211; 编码无悔 /  Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/category/original/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>最优化之路</description>
	<lastBuildDate>Sat, 20 Jun 2026 06:25:38 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>
	<item>
		<title>[Original] A Quick Hands-On Test of FunctionEvolve, an LLM-Assisted Function Search Tool</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%bb%93%e5%90%88%e5%a4%a7%e6%a8%a1%e5%9e%8bllm%e7%9a%84%e5%87%bd%e6%95%b0%e6%90%9c%e7%b4%a2%e5%99%a8-functionevolve-%e7%ae%80%e5%8d%95%e5%ae%9e%e6%b5%8b/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%bb%93%e5%90%88%e5%a4%a7%e6%a8%a1%e5%9e%8bllm%e7%9a%84%e5%87%bd%e6%95%b0%e6%90%9c%e7%b4%a2%e5%99%a8-functionevolve-%e7%ae%80%e5%8d%95%e5%ae%9e%e6%b5%8b/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Sat, 20 Jun 2026 06:20:54 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[1stOpt]]></category>
		<category><![CDATA[FunctionEvolve]]></category>
		<category><![CDATA[Symbolic Regression]]></category>
		<category><![CDATA[公式拟合]]></category>
		<category><![CDATA[数值优化]]></category>
		<category><![CDATA[最优化]]></category>
		<category><![CDATA[符号回归搜索]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14229</guid>

					<description><![CDATA[<blockquote>
<p>
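		<strong>System</strong>: macOS</p>
<p>
		<strong>Python version</strong>: 3.12</p>
<p>
		<strong>Test date</strong>: 2026-06-19</p>
</blockquote>
<p>
	Have you ever tried to guess which formula y=f(x) a set of data follows? Many years ago, before the AI era, this field was essentially owned by European and American software. Then a groundbreaking Chinese product, 1stOpt ( <a href="http://www.7d-soft.com/">http://www.7d-soft.com/</a> ), broke that monopoly, which at the time counted as a major achievement for Chinese software in this field. As it happens, in the last couple of days I saw tech media coverage of a new Chinese open-source project, <strong>FunctionEvolve</strong>, which tackles exactly this kind of problem, so I looked into it, ran a few simple tests, and wrote this post.<br />
	Related articles on this site: <a href="https://www.codelast.com/?p=7364" target="_blank">Optimization article collection</a></p>
<p>
<span id="more-14229"></span>	<br />
	Before getting to the main content, a quick aside: the LLM used for the tests in this post was DeepSeek V4 Flash. The tests consumed a little over 300,000 tokens in total and cost 0.19 CNY. Because of the nature of the task the cache hit rate was very low, but DeepSeek's pricing is so low that it cost almost nothing.</p>
<p>
	Now for the main content.</p>
<p>
	<strong>FunctionEvolve</strong> ( <a href="https://github.com/Phoinikas03/FunctionEvolve">https://github.com/Phoinikas03/FunctionEvolve</a> ) is a symbolic regression search framework whose core goal is to discover mathematical formulas automatically from numerical data. It uses a hybrid architecture of <strong>LLM + traditional numerical optimization</strong>: the LLM analyzes domain knowledge, generates seed formulas, selects parents, and suggests targeted mutations; traditional optimizers (DE / CMA-ES / L-BFGS-B / TRF) fit the parameters in the formulas.<br />
	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>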
&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%bb%93%e5%90%88%e5%a4%a7%e6%a8%a1%e5%9e%8bllm%e7%9a%84%e5%87%bd%e6%95%b0%e6%90%9c%e7%b4%a2%e5%99%a8-functionevolve-%e7%ae%80%e5%8d%95%e5%ae%9e%e6%b5%8b/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<blockquote>
<p>
		<strong>System</strong>: macOS</p>
<p>
		<strong>Python version</strong>: 3.12</p>
<p>
		<strong>Test date</strong>: 2026-06-19</p>
</blockquote>
<p>
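	Have you ever tried to guess which formula y=f(x) a set of data follows? Many years ago, before the AI era, this field was essentially owned by European and American software. Then a groundbreaking Chinese product, 1stOpt ( <a href="http://www.7d-soft.com/">http://www.7d-soft.com/</a> ), broke that monopoly, which at the time counted as a major achievement for Chinese software in this field. As it happens, in the last couple of days I saw tech media coverage of a new Chinese open-source project, <strong>FunctionEvolve</strong>, which tackles exactly this kind of problem, so I looked into it, ran a few simple tests, and wrote this post.<br />
	Related articles on this site: <a href="https://www.codelast.com/?p=7364" target="_blank">Optimization article collection</a></p>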
<p>
<span id="more-14229"></span>	<br />
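	Before getting to the main content, a quick aside: the LLM used for the tests in this post was DeepSeek V4 Flash. The tests consumed a little over 300,000 tokens in total and cost 0.19 CNY. Because of the nature of the task the cache hit rate was very low, but DeepSeek's pricing is so low that it cost almost nothing.</p>
<p>
	Now for the main content.</p>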
<p>
	<strong>FunctionEvolve</strong> ( <a href="https://github.com/Phoinikas03/FunctionEvolve">https://github.com/Phoinikas03/FunctionEvolve</a> ) is a symbolic regression search framework whose core goal is to discover mathematical formulas automatically from numerical data. It uses a hybrid architecture of <strong>LLM + traditional numerical optimization</strong>: the LLM analyzes domain knowledge, generates seed formulas, selects parents, and suggests targeted mutations; traditional optimizers (DE / CMA-ES / L-BFGS-B / TRF) fit the parameters in the formulas.</p>
<h3 id="-">
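	Comparable software at a glance</h3>
<p>
	Symbolic regression and automated formula discovery already have quite a few mature tools. By underlying principle they fall roughly into three categories:</p>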
<table>
<thead>
<tr>
<th>
				Category</th>
<th>
				Representative software</th>
<th>
				Origin</th>
<th>
				Core technique</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				<strong>Traditional global optimization</strong></td>
<td>
				<strong>1stOpt</strong></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f1e8-1f1f3.png" alt="🇨🇳" class="wp-smiley" style="height: 1em; max-height: 1em;" /> China</td>
<td>
				UGO (Universal Global Optimization), no initial values required</td>
</tr>
<tr>
<td>
				&nbsp;</td>
<td>
				MATLAB Curve Fitting</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td>
				Trust-Region / Levenberg-Marquardt</td>
</tr>
<tr>
<td>
				&nbsp;</td>
<td>
				OriginPro</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td>
				Levenberg-Marquardt / global optimization</td>
</tr>
<tr>
<td>
				&nbsp;</td>
<td>
				DataFit</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td>
				Automatic search across multiple algorithms</td>
</tr>
<tr>
<td>
				<strong>Genetic programming (GP)</strong></td>
<td>
				Eureqa</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f1fa-1f1f8.png" alt="🇺🇸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td>
				Symbolic regression via genetic programming (the earliest commercialized GP tool)</td>
</tr>
<tr>
<td>
				&nbsp;</td>
<td>
				PySR</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f30d.png" alt="🌍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Open source</td>
<td>
				Genetic programming (Python library)</td>
</tr>
<tr>
<td>
				&nbsp;</td>
<td>
				GPTIPS</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f1ec-1f1e7.png" alt="🇬🇧" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td>
				Multigene genetic programming (MATLAB)</td>
</tr>
<tr>
<td>
				&nbsp;</td>
<td>
				gplearn</td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f30d.png" alt="🌍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Open source</td>
<td>
				Genetic programming (Python library)</td>
</tr>
<tr>
<td>
				<strong>LLM + optimization</strong></td>
<td>
				<strong>FunctionEvolve</strong> <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f30d.png" alt="🌍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Open source</td>
<td>
				LLM steers the search direction + numerical optimizers fit the parameters</td>
</tr>
</tbody>
</table>
<p>
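	<strong>A closer look: 1stOpt (7D-Soft, 七维高科)</strong></p>
<p>
	I used 1stOpt a few times many years ago and it left a deep impression on me (I do not know whether it is still maintained), so I describe it separately here.</p>
<p>
	1stOpt is a numerical optimization and analysis package developed by the Chinese company 7D-Soft, with a large user base in engineering and research. Its key characteristics are:</p>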
<ul>
<li>
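		<strong>No manual initial values</strong>: with traditional tools (such as MATLAB), a curve fit diverges or converges to a local optimum when the initial parameter guesses are poor. 1stOpt ships a family of <strong>UGO (Universal Global Optimization)</strong> algorithms and is advertised as handling &quot;any formula, any initial values&quot;.</li>
<li>
		<strong>Driven by formula templates</strong>: the user supplies a formula skeleton (for example <code>y = a*x^b + c*e^(d*x)</code>) and 1stOpt searches for the optimal parameters. Compared with FunctionEvolve, the formula structure in 1stOpt must be <strong>specified by a person first</strong>, whereas FunctionEvolve can <strong>discover the structure automatically</strong> as well.</li>
<li>
		<strong>Typical uses</strong>: curve and surface fitting, nonlinear regression, parameter estimation, solving differential equations, engineering inversion.</li>
<li>
		<strong>Limitations</strong>: commercial, closed source, Windows only. The formula structure depends on the user's experience, and unknown structures cannot be explored automatically.</li>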
</ul>
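<p>
	The template-driven workflow in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: SciPy's <code>differential_evolution</code> stands in for a global optimizer (it is not 1stOpt's UGO algorithm), and the data, the noise level, and the &quot;true&quot; parameter values are invented for the example. The point is that only box bounds are supplied, not a starting guess.</p>

```python
import numpy as np
from scipy.optimize import differential_evolution

# Synthetic data from one instance of the template y = a*x^b + c*e^(d*x).
# The "true" parameter values below are made up for this illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.5, 3.0, 60)
true_params = (2.0, 1.5, 0.5, -0.8)


def template(x, a, b, c, d):
    """User-supplied formula skeleton; only a, b, c, d are unknown."""
    return a * x**b + c * np.exp(d * x)


y = template(x, *true_params) + rng.normal(0.0, 0.01, size=x.size)


def mse(params):
    """Mean squared error of the template for one parameter vector."""
    return float(np.mean((template(x, *params) - y) ** 2))


# Only box bounds are given; no initial guess is needed.
bounds = [(-5.0, 5.0)] * 4
result = differential_evolution(mse, bounds, seed=0, tol=1e-8, maxiter=2000)
fit_mse = result.fun
print("fitted parameters:", np.round(result.x, 3))
print("MSE:", fit_mse)
```

<p>
	A tool that also searches over the structure, as FunctionEvolve aims to, would wrap a fit like this inside an outer loop that proposes different <code>template</code> functions.</p>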
<h2 id="-functionevolve-">
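	What role does the LLM play in FunctionEvolve</h2>
<p>
	<strong>The LLM acts as a &quot;search direction guide&quot;; the mathematical computation is done by traditional optimizers.</strong></p>
<p>
	FunctionEvolve is a <strong>hybrid of LLM + traditional numerical optimization</strong>; the LLM performs no numerical computation itself.</p>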
<hr />
<h2 id="-">
	Full workflow</h2>
<table>
<thead>
<tr>
<th>
				Stage</th>
<th>
				Component</th>
<th>
				Uses the LLM?</th>
<th>
				Role</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				<strong>① Domain analysis</strong></td>
<td>
				<code>Generator</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> LLM</td>
<td>
				Works out which domain the problem belongs to (physics / chemistry / biology...) and lists formula templates common in that domain</td>
</tr>
<tr>
<td>
				<strong>② Seed generation</strong></td>
<td>
				<code>Generator</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> LLM</td>
<td>
				Uses the domain knowledge to generate 20 initial candidate formulas (with parameter placeholders c0, c1...)</td>
</tr>
<tr>
<td>
				<strong>③ Parameter fitting</strong></td>
<td>
				<code>StructureOptimizer</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Purely numerical</strong></td>
<td>
				VARPRO + DE / CMA-ES / TRF + L-BFGS-B fit the optimal values of c0, c1...</td>
</tr>
<tr>
<td>
				<strong>④ Evaluation</strong></td>
<td>
				<code>Evaluator</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Purely numerical</strong></td>
<td>
				Computes the NMSE (normalized mean squared error)</td>
</tr>
<tr>
<td>
				<strong>⑤ Parent selection</strong></td>
<td>
				<code>Selector</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> LLM</td>
<td>
				Looks at the NMSE and structure of every formula in the current evolution tree and decides which formulas the next round should start from</td>
</tr>
<tr>
<td>
				<strong>⑥ Structural mutation</strong></td>
<td>
				<code>ASTMutator</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Rule engine</strong></td>
<td>
				Programmatically deletes subtrees, adds terms, and unwraps function wrappers to generate candidate variants</td>
</tr>
<tr>
<td>
				<strong>⑦ Structural suggestions</strong></td>
<td>
				<code>LLMMutator</code></td>
<td>
				<img src="https://s.w.org/images/core/emoji/17.0.2/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> LLM</td>
<td>
				&quot;Try adding a sin term&quot;, &quot;try replacing x&sup2; with exp(x)&quot;</td>
</tr>
<tr>
<td>
				<strong>⑧ Back to ③</strong></td>
<td>
				Loop</td>
<td>
				-</td>
<td>
				Fit &rarr; evaluate &rarr; select parents &rarr; mutate &rarr; fit again...</td>
</tr>
</tbody>
</table>
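<p>
	Stage ④ scores every candidate by NMSE. A minimal sketch of that metric follows, assuming the common definition &quot;mean squared error divided by the variance of the targets&quot;; the normalization actually used by FunctionEvolve's <code>Evaluator</code> is not shown in this post and may differ. The function name <code>nmse</code> and the sample values are illustrative only.</p>

```python
import numpy as np


def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of the targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    return float(mse / np.var(y_true))


y = np.array([1.0, 2.0, 4.0, 7.0])
perfect = nmse(y, y)                            # an exact fit
baseline = nmse(y, np.full_like(y, y.mean()))   # always predicting the mean
print(perfect, baseline)
```

<p>
	Under this definition an exact fit scores 0 and a model that only predicts the mean scores about 1, which makes values comparable across datasets with different scales.</p>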
<hr />
<h2 id="-">
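	Optimization algorithm details</h2>
<p>
	All parameter fitting (stage ③) is carried out by <code>StructureOptimizer</code>, which is a multi-strategy pipeline.</p>
<h3 id="-4-">
	Base algorithms provided by third-party libraries (4)</h3>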
<table>
<thead>
<tr>
<th>
				Algorithm</th>
<th>
				Third-party source</th>
<th>
				Standalone wrapper file</th>
<th>
				Role in StructureOptimizer</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				<strong>L-BFGS-B</strong></td>
<td>
				<code>scipy.optimize.minimize</code></td>
<td>
				<code>optimizer/lbfgs.py</code></td>
<td>
				Local refinement (polishing after the global search); refitting after Pow exponent alignment</td>
</tr>
<tr>
<td>
				<strong>DE (differential evolution)</strong></td>
<td>
				<code>scipy.optimize.differential_evolution</code></td>
<td>
				<code>optimizer/de.py</code></td>
<td>
				One of the main global search paths (<code>_run_de</code>)</td>
</tr>
<tr>
<td>
				<strong>CMA-ES</strong></td>
<td>
				<code>cmaes</code> library</td>
<td>
				<code>optimizer/cma.py</code></td>
<td>
				One of the main global search paths (<code>_run_cma</code>); parallel fallback for DE</td>
</tr>
<tr>
<td>
				<strong>TRF (Trust Region Reflective)</strong></td>
<td>
				<code>scipy.optimize.least_squares</code></td>
<td>
				<code>optimizer/least_squares.py</code></td>
<td>
				One of the global search paths (<code>_run_trf</code>)</td>
</tr>
</tbody>
</table>
<h3 id="-3-">
	Strategies implemented by the project itself (3)</h3>
<table>
<thead>
<tr>
<th>
				Strategy</th>
<th>
				Method (structure.py)</th>
<th>
				Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
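				<strong>① VARPRO (variable projection)</strong></td>
<td>
				<code>_try_varpro</code></td>
<td>
				Splits the parameters into a <strong>linear</strong> group and a <strong>nonlinear</strong> group: the linear group is solved directly by OLS with <code>np.linalg.lstsq</code>, and the nonlinear group is searched with the base algorithms. In essence this is a dimensionality-reduction strategy</td>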
</tr>
<tr>
<td>
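				<strong>② Compound-Pow pre-search</strong></td>
<td>
				<code>_try_pow_presearch</code></td>
<td>
				Detects <code>x^(c)</code> terms whose base contains other parameters, enumerates candidate rational exponents (1, 2, 1/2, 1/3...), fixes the exponent, and fits the remaining parameters with VARPRO</td>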
</tr>
<tr>
<td>
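				<strong>③ Pow exponent alignment</strong></td>
<td>
				<code>_snap_pow_and_refit</code> + <code>_pow_rational_grid</code></td>
<td>
				Snaps each fitted floating-point exponent to the nearest rational number or integer, then refits the other parameters with L-BFGS-B</td>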
</tr>
</tbody>
</table>
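<p>
	The variable-projection idea in the table above is easier to see on a toy skeleton. The sketch below is not the project's <code>_try_varpro</code>; it assumes a hand-picked model <code>y = c0 + c1*exp(-c2*x)</code> with invented true values, solves the linear pair (c0, c1) with <code>np.linalg.lstsq</code> for each trial c2, and lets SciPy's <code>differential_evolution</code> search over c2 alone. The helper names <code>solve_linear</code> and <code>projected_loss</code> are hypothetical.</p>

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy skeleton: y = c0 + c1 * exp(-c2 * x).
# c0 and c1 enter linearly; c2 is the only nonlinear parameter.
x = np.linspace(0.0, 5.0, 80)
y = 1.0 + 2.0 * np.exp(-0.7 * x)


def solve_linear(c2):
    """For a fixed c2, solve for (c0, c1) by ordinary least squares."""
    design = np.column_stack([np.ones_like(x), np.exp(-c2 * x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = design @ coef - y
    return coef, float(residual @ residual)


def projected_loss(nonlinear):
    """Loss seen by the outer search: it depends on c2 only."""
    return solve_linear(nonlinear[0])[1]


# The outer search runs over one dimension instead of three.
outer = differential_evolution(projected_loss, [(0.01, 5.0)], seed=0)
c2 = float(outer.x[0])
(c0, c1), sse = solve_linear(c2)
print(f"c0={c0:.4f} c1={c1:.4f} c2={c2:.4f} sse={sse:.2e}")
```

<p>
	Because the linear parameters are eliminated in closed form, the global optimizer only has to explore the nonlinear ones, which is the dimensionality reduction the table refers to.</p>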
<h3 id="structureoptimizer-">
	StructureOptimizer execution pipeline</h3>
<pre>
<code>Input: formula skeleton + data
  │
  ├─ ① <span class="hljs-selector-tag">Pow</span> pre-search (Compound-Pow pre-search)
  │   enumerate candidate exponents, fix them, fit the remaining parameters with <span class="hljs-selector-tag">VARPRO</span>
  │
  ├─ ② <span class="hljs-selector-tag">VARPRO</span> decomposition (main path)
  │   linear parameters &rarr; <span class="hljs-selector-tag">np</span><span class="hljs-selector-class">.linalg</span><span class="hljs-selector-class">.lstsq</span> <span class="hljs-selector-tag">OLS</span> solve
  │   nonlinear parameters &rarr; <span class="hljs-selector-tag">DE</span> / <span class="hljs-selector-tag">CMA</span> / <span class="hljs-selector-tag">TRF</span> parallel search
  │
  ├─ ③ parallel fallback (DE / CMA / TRF run concurrently, best result kept)
  │
  ├─ ④ <span class="hljs-selector-tag">L-BFGS-B</span> local refinement
  │
  ├─ ⑤ <span class="hljs-selector-tag">Pow</span> exponent alignment <span class="hljs-selector-tag">&amp;</span> refit
  │
  └─ Output: best parameters + <span class="hljs-selector-tag">NMSE</span>
</code></pre>
<p>
	Every path has <strong>timeout protection</strong> and a <strong>multi-restart</strong> mechanism.</p>
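<p>
	Step ⑤ of the pipeline, snapping a fitted exponent to a nearby simple fraction, can be sketched with the standard library alone. This is an illustration, not the project's <code>_snap_pow_and_refit</code>: the function name <code>snap_exponent</code>, the maximum denominator, and the tolerance are assumptions chosen for the example.</p>

```python
from fractions import Fraction


def snap_exponent(value, max_denominator=4, tol=0.02):
    """Return the nearest simple fraction if it lies within tol, else None."""
    candidate = Fraction(value).limit_denominator(max_denominator)
    if abs(float(candidate) - value) <= tol:
        return candidate
    return None


# Exponents close to 1/2 and 2 are snapped; 0.37 is left alone because
# no fraction with denominator up to 4 is within the tolerance.
for fitted in (0.4999, 2.003, 0.37):
    print(fitted, "->", snap_exponent(fitted))
```

<p>
	After a successful snap the exponent would be held fixed and the remaining parameters refitted, which is what the pipeline describes as &quot;alignment &amp; refit&quot;.</p>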
<h2 id="-">
	Key source locations</h2>
<table>
<thead>
<tr>
<th>
				Component</th>
<th>
				File</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				Main entry &amp; wiring</td>
<td>
				<code>main.py</code></td>
</tr>
<tr>
<td>
				Search loop</td>
<td>
				<code>src/search.py</code> (<code>TreeSearch.run()</code>)</td>
</tr>
<tr>
<td>
				LLM generator</td>
<td>
				<code>src/generator.py</code></td>
</tr>
<tr>
<td>
				LLM selector</td>
<td>
				<code>src/selector.py</code></td>
</tr>
<tr>
<td>
				LLM mutator</td>
<td>
				<code>src/mutator.py</code></td>
</tr>
<tr>
<td>
				AST rule-based mutator</td>
<td>
				<code>src/mutator.py</code> (<code>ASTMutator</code> class)</td>
</tr>
<tr>
<td>
				Parameter optimizer orchestrator</td>
<td>
				<code>src/optimizer/structure.py</code> (<code>StructureOptimizer</code>)</td>
</tr>
<tr>
<td>
				L-BFGS-B optimizer</td>
<td>
				<code>src/optimizer/lbfgs.py</code></td>
</tr>
<tr>
<td>
				DE optimizer</td>
<td>
				<code>src/optimizer/de.py</code></td>
</tr>
<tr>
<td>
				CMA-ES optimizer</td>
<td>
				<code>src/optimizer/cma.py</code></td>
</tr>
<tr>
<td>
				Least-squares optimizer</td>
<td>
				<code>src/optimizer/least_squares.py</code></td>
</tr>
<tr>
<td>
				Optimizer base class &amp; utilities</td>
<td>
				<code>src/optimizer/base.py</code></td>
</tr>
<tr>
<td>
				Evaluator</td>
<td>
				<code>src/evaluator.py</code></td>
</tr>
<tr>
<td>
				Evolution tree</td>
<td>
				<code>src/evolution_tree.py</code></td>
</tr>
<tr>
<td>
				LLM configuration</td>
<td>
				<code>llm_config.yaml</code></td>
</tr>
</tbody>
</table>
<h2 id="-">
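	One-sentence summary</h2>
<blockquote>
<p>
		<strong>The LLM is the &quot;strategist&quot;: it tells you in which direction to look for formulas. The numerical optimizer is the &quot;calculator&quot;: it actually computes the parameter values.</strong></p>
<p>
		The LLM touches no numerical computation. All parameter fitting is done by <code>StructureOptimizer</code> through a multi-stage pipeline: <strong>VARPRO &rarr; DE/CMA/TRF (parallel global search) &rarr; L-BFGS-B (local refinement) &rarr; Pow exponent alignment</strong>.</p>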
</blockquote>
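<p>
	The division of labor summarized above can be mimicked with a toy loop that needs no LLM at all. In this sketch the candidate skeletons are written by hand (in the real tool they would come from the LLM and the AST mutator), SciPy's <code>least_squares</code> (TRF by default) plays the numerical optimizer, and a small set of start values stands in for the restart mechanism. Every name and value here is illustrative and not taken from the FunctionEvolve source.</p>

```python
import numpy as np
from scipy.optimize import least_squares

x = np.linspace(-3.0, 3.0, 200)
y = 1.5 * np.sin(2.0 * x) + 0.3

# Candidate skeletons with placeholders c[0], c[1], ... (hand-written here).
candidates = {
    "c0*x + c1": (lambda c: c[0] * x + c[1], 2),
    "c0*x**2 + c1*x + c2": (lambda c: c[0] * x**2 + c[1] * x + c[2], 3),
    "c0*sin(c1*x) + c2": (lambda c: c[0] * np.sin(c[1] * x) + c[2], 3),
}


def fit(model, n_params, starts=(0.5, 1.0, 2.0, 3.0)):
    """Fit one skeleton with least squares from several start points."""
    best = None
    for s in starts:
        res = least_squares(lambda c: model(c) - y, np.full(n_params, s))
        if best is None or res.cost < best.cost:
            best = res
    return best


def nmse(pred):
    """Mean squared error normalized by the variance of y."""
    return float(np.mean((pred - y) ** 2) / np.var(y))


# "Strategist" proposes structures; "calculator" fits and scores each one.
scores = {}
for name, (model, n_params) in candidates.items():
    scores[name] = nmse(model(fit(model, n_params).x))

best_name = min(scores, key=scores.get)
print(best_name, scores[best_name])
```

<p>
	The loop only ranks a fixed list; the search described in this post additionally feeds the scores back to choose parents and mutate their structure.</p>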
<hr />
<h2 id="-">
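	Full test procedure</h2>
<blockquote>
<p>
		The following is a complete record of the steps, starting from scratch: creating the test data, setting up the environment, running the test, and cleaning up the environment.</p>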
</blockquote>
<h3 id="-">
	Test goal</h3>
<p>
	Check whether FunctionEvolve can automatically discover a complex formula from numerical data:</p>
<p>
	 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_27bbc1f7dc1ef282b12305a505028381.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = \frac{\sin(x_1 x_2)}{\cos(x_3) + 1.5} + e^{-x_2} \cdot \ln(|x_4| + 1) + \frac{x_1 x_3}{1 + x_5^2}" /></span><script type='math/tex'>y = \frac{\sin(x_1 x_2)}{\cos(x_3) + 1.5} + e^{-x_2} \cdot \ln(|x_4| + 1) + \frac{x_1 x_3}{1 + x_5^2}</script> </p>
<h3 id="-">
	Steps at a glance</h3>
<pre>
<code><span class="hljs-keyword">Step</span> <span class="hljs-number">1</span>: Generate the test dataset
<span class="hljs-keyword">Step</span> <span class="hljs-number">2</span>: Create a temporary Python environment and install dependencies
<span class="hljs-keyword">Step</span> <span class="hljs-number">3</span>: Write the test script
<span class="hljs-keyword">Step</span> <span class="hljs-number">4</span>: Run the test
<span class="hljs-keyword">Step</span> <span class="hljs-number">5</span>: Inspect the results
<span class="hljs-keyword">Step</span> <span class="hljs-number">6</span>: Delete the temporary environment
</code></pre>
<hr />
<h3 id="step-1-">
	Step 1: Generate the test dataset</h3>
<pre>
<code class="lang-bash"><span class="hljs-keyword">cd</span> /path/<span class="hljs-keyword">to</span>/FunctionEvolve
<span class="hljs-keyword">python3</span> datasets/<span class="hljs-keyword">z</span>/generate_complex_dataset.<span class="hljs-keyword">py</span> --noise-std <span class="hljs-number">0.02</span>
</code></pre>
<h4 id="-">
	Full source of the generator script</h4>
<p>
	Save as <code>datasets/z/generate_complex_dataset.py</code>:</p>
<pre>
<code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-string">&quot;&quot;</span><span class="hljs-string">&quot;
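Generate a symbolic-regression dataset from a complex formula, to check how well FunctionEvolve works.

Formula (5 input variables, mixing trigonometric, exponential, logarithmic, and rational terms):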

    sin(x1 * x2)                              x1 * x3
y = ──────────── + exp(-x2) * ln(|x4| + 1) + ─────────
    cos(x3) + 1.5                             1 + x5&sup2;


Output:
  datasets/z/
    dataset.npz        - NumPy .npz archive (load directly with from_arrays)
    dataset.csv        - CSV text format (for inspection by eye)
    ground_truth.txt   - description of the ground-truth formula
    preview.txt        - summary statistics of the data
&quot;</span><span class="hljs-string">&quot;&quot;</span>

from __future__ <span class="hljs-built_in">import</span> annotations
<span class="hljs-built_in">import</span> argparse
from pathlib <span class="hljs-built_in">import</span> Path
<span class="hljs-built_in">import</span> numpy as np

<span class="hljs-attr">GROUND_TRUTH_EXPRESSION</span> = (
    <span class="hljs-string">&quot;sin(x1 * x2) / (cos(x3) + 1.5) + exp(-x2) * log(abs(x4) + 1) + x1 * x3 / (1 + x5**2)&quot;</span>
)

<span class="hljs-attr">GROUND_TRUTH_SYMBOLS</span> = [<span class="hljs-string">&quot;x1&quot;</span>, <span class="hljs-string">&quot;x2&quot;</span>, <span class="hljs-string">&quot;x3&quot;</span>, <span class="hljs-string">&quot;x4&quot;</span>, <span class="hljs-string">&quot;x5&quot;</span>]
<span class="hljs-attr">RNG</span> = np.random.RandomState(<span class="hljs-number">42</span>)


def _safe_log_abs(x):
    return np.log(np.abs(x) + <span class="hljs-number">1.0</span>)


def ground_truth_y(X):
    x1, x2, x3, x4, <span class="hljs-attr">x5</span> = X[:, <span class="hljs-number">0</span>], X[:, <span class="hljs-number">1</span>], X[:, <span class="hljs-number">2</span>], X[:, <span class="hljs-number">3</span>], X[:, <span class="hljs-number">4</span>]
    <span class="hljs-attr">term1</span> = np.sin(x1 * x2) / (np.cos(x3) + <span class="hljs-number">1.5</span>)
    <span class="hljs-attr">term2</span> = np.exp(-x2) * _safe_log_abs(x4)
    <span class="hljs-attr">term3</span> = x1 * x3 / (<span class="hljs-number">1.0</span> + x5**<span class="hljs-number">2</span>)
    return term1 + term2 + term3


def main():
    <span class="hljs-attr">parser</span> = argparse.ArgumentParser(<span class="hljs-attr">description=&quot;Generate a complex symbolic-regression dataset&quot;)</span>
    parser.add_argument(<span class="hljs-string">&quot;--n-train&quot;</span>, <span class="hljs-attr">type=int,</span> <span class="hljs-attr">default=2000)</span>
    parser.add_argument(<span class="hljs-string">&quot;--n-test&quot;</span>, <span class="hljs-attr">type=int,</span> <span class="hljs-attr">default=500)</span>
    parser.add_argument(<span class="hljs-string">&quot;--noise-std&quot;</span>, <span class="hljs-attr">type=float,</span> <span class="hljs-attr">default=0.0)</span>
    parser.add_argument(<span class="hljs-string">&quot;--output-dir&quot;</span>, <span class="hljs-attr">type=str,</span> <span class="hljs-attr">default=None)</span>
    <span class="hljs-attr">args</span> = parser.parse_args()

    <span class="hljs-attr">out_dir</span> = Path(args.output_dir) <span class="hljs-keyword">if</span> args.output_dir <span class="hljs-keyword">else</span> Path(__file__).resolve().parent
    out_dir.mkdir(<span class="hljs-attr">parents=True,</span> <span class="hljs-attr">exist_ok=True)</span>

    <span class="hljs-comment"># Generate the training set</span>
    <span class="hljs-attr">X_train</span> = RNG.uniform(<span class="hljs-attr">low=-3.0,</span> <span class="hljs-attr">high=3.0,</span> <span class="hljs-attr">size=(args.n_train,</span> <span class="hljs-number">5</span>))
    <span class="hljs-attr">y_train</span> = ground_truth_y(X_train)
    <span class="hljs-keyword">if</span> args.noise_std &gt; <span class="hljs-number">0</span>:
        y_train += RNG.normal(<span class="hljs-number">0</span>, args.noise_std, <span class="hljs-attr">size=args.n_train)</span>

    <span class="hljs-comment"># Generate the test set (separate RNG with its own seed)</span>
    <span class="hljs-attr">rng_test</span> = np.random.RandomState(<span class="hljs-number">2024</span>)
    <span class="hljs-attr">X_test</span> = rng_test.uniform(<span class="hljs-attr">low=-3.0,</span> <span class="hljs-attr">high=3.0,</span> <span class="hljs-attr">size=(args.n_test,</span> <span class="hljs-number">5</span>))
    <span class="hljs-attr">y_test</span> = ground_truth_y(X_test)
    <span class="hljs-keyword">if</span> args.noise_std &gt; <span class="hljs-number">0</span>:
        y_test += rng_test.normal(<span class="hljs-number">0</span>, args.noise_std, <span class="hljs-attr">size=args.n_test)</span>

    <span class="hljs-comment"># Save in NumPy .npz format</span>
    np.savez(out_dir / <span class="hljs-string">&quot;dataset.npz&quot;</span>,
             <span class="hljs-attr">X_train=X_train,</span> <span class="hljs-attr">y_train=y_train,</span>
             <span class="hljs-attr">X_test=X_test,</span> <span class="hljs-attr">y_test=y_test,</span>
             <span class="hljs-attr">expression=GROUND_TRUTH_EXPRESSION)</span>

    <span class="hljs-comment"># Save as CSV (two '#' comment lines, a header line, then the rows)</span>
    <span class="hljs-attr">header</span> = <span class="hljs-string">&quot;,&quot;</span>.join(GROUND_TRUTH_SYMBOLS) + <span class="hljs-string">&quot;,y&quot;</span>
    <span class="hljs-attr">rows</span> = np.column_stack([np.vstack([X_train, X_test]),
                            np.hstack([y_train, y_test])])
    <span class="hljs-keyword">with</span> open(out_dir / <span class="hljs-string">&quot;dataset.csv&quot;</span>, <span class="hljs-string">&quot;w&quot;</span>) as f:
        f.write(f<span class="hljs-string">&quot;# Ground-truth formula: {GROUND_TRUTH_EXPRESSION}\n&quot;</span>)
        f.write(f<span class="hljs-string">&quot;# Noise std: {args.noise_std}\n&quot;</span>)
        f.write(header + <span class="hljs-string">&quot;\n&quot;</span>)
        for row <span class="hljs-keyword">in</span> rows:
            f.write(<span class="hljs-string">&quot;,&quot;</span>.join(f<span class="hljs-string">&quot;{v:.8f}&quot;</span> for v <span class="hljs-keyword">in</span> row) + <span class="hljs-string">&quot;\n&quot;</span>)

    print(f<span class="hljs-string">&quot;Dataset saved to {out_dir}&quot;</span>)


<span class="hljs-keyword">if</span> <span class="hljs-attr">__name__</span> == <span class="hljs-string">&quot;__main__&quot;</span>:
    main()
</code></pre>
<h4 id="-">
	Sample of the generated data</h4>
<pre>
<code class="lang-csv"><span class="hljs-selector-tag">x1</span>,<span class="hljs-selector-tag">x2</span>,<span class="hljs-selector-tag">x3</span>,<span class="hljs-selector-tag">x4</span>,<span class="hljs-selector-tag">x5</span>,<span class="hljs-selector-tag">y</span>
<span class="hljs-selector-tag">-0</span><span class="hljs-selector-class">.75275929</span>,2<span class="hljs-selector-class">.70428584</span>,1<span class="hljs-selector-class">.39196365</span>,0<span class="hljs-selector-class">.59195091</span>,<span class="hljs-selector-tag">-2</span><span class="hljs-selector-class">.06388816</span>,<span class="hljs-selector-tag">-0</span><span class="hljs-selector-class">.73060157</span>
<span class="hljs-selector-tag">-2</span><span class="hljs-selector-class">.06403288</span>,<span class="hljs-selector-tag">-2</span><span class="hljs-selector-class">.65149833</span>,2<span class="hljs-selector-class">.19705687</span>,0<span class="hljs-selector-class">.60669007</span>,1<span class="hljs-selector-class">.24843547</span>,4<span class="hljs-selector-class">.13384154</span>
<span class="hljs-selector-tag">-2</span><span class="hljs-selector-class">.87649303</span>,2<span class="hljs-selector-class">.81945911</span>,1<span class="hljs-selector-class">.99465584</span>,<span class="hljs-selector-tag">-1</span><span class="hljs-selector-class">.72596534</span>,<span class="hljs-selector-tag">-1</span><span class="hljs-selector-class">.90905020</span>,<span class="hljs-selector-tag">-2</span><span class="hljs-selector-class">.05631804</span>
<span class="hljs-selector-tag">-1</span><span class="hljs-selector-class">.89957294</span>,<span class="hljs-selector-tag">-1</span><span class="hljs-selector-class">.17454654</span>,0<span class="hljs-selector-class">.14853859</span>,<span class="hljs-selector-tag">-0</span><span class="hljs-selector-class">.40832989</span>,<span class="hljs-selector-tag">-1</span><span class="hljs-selector-class">.25262516</span>,1<span class="hljs-selector-class">.29225609</span>
</code></pre>
<p>
	Each row has 6 columns: x₁ to x₅ are the inputs and y is the output.</p>
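<p>
	For a quick look at the CSV from Python, something like the following should work. It is a sketch that assumes the layout produced by the generator listing (two <code>#</code> comment lines, one header line, then numeric rows); to stay self-contained it writes a two-row stand-in file to a temporary directory instead of reading the real <code>datasets/z/dataset.csv</code>.</p>

```python
import os
import tempfile

import numpy as np

# Tiny stand-in file with the same layout: two '#' comment lines,
# one header line, then numeric rows (values copied from the sample above).
sample = (
    "# ground-truth formula: (omitted)\n"
    "# noise std: 0.02\n"
    "x1,x2,x3,x4,x5,y\n"
    "-0.75275929,2.70428584,1.39196365,0.59195091,-2.06388816,-0.73060157\n"
    "-2.06403288,-2.65149833,2.19705687,0.60669007,1.24843547,4.13384154\n"
)
path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# skiprows counts comment lines too, so 3 = two comments + the header.
table = np.loadtxt(path, delimiter=",", skiprows=3, encoding="utf-8")
X, y = table[:, :5], table[:, 5]
print(X.shape, y.shape)
```

<p>
	Note that the CSV written by the generator stacks the training rows and the test rows into one table, so the split has to come from the <code>.npz</code> file.</p>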
<p>
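	Output files (the listing above only writes <code>dataset.npz</code> and <code>dataset.csv</code>; the code that writes the other two files is not shown):</p>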
<table>
<thead>
<tr>
<th>
				File</th>
<th>
				Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				<code>datasets/z/dataset.npz</code></td>
<td>
				NumPy <code>.npz</code> archive (written with <code>np.savez</code>, so not compressed), loaded via <code>from_arrays()</code></td>
</tr>
<tr>
<td>
				<code>datasets/z/dataset.csv</code></td>
<td>
				CSV text format, for inspection by eye</td>
</tr>
<tr>
<td>
				<code>datasets/z/ground_truth.txt</code></td>
<td>
				Description of the ground-truth formula</td>
</tr>
<tr>
<td>
				<code>datasets/z/preview.txt</code></td>
<td>
				Summary statistics of the data</td>
</tr>
</tbody>
</table>
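<p>
	Before spending tokens on a search it can be worth checking that the saved arrays really follow the intended formula. The sketch below rebuilds a stand-in archive in a temporary directory (same seed, ranges, and noise level as the command in Step 1, but not the real <code>datasets/z/dataset.npz</code>), reads it back with <code>np.load</code>, and compares the targets with the noise-free formula. The residual spread should then be close to the noise standard deviation.</p>

```python
import os
import tempfile

import numpy as np


def ground_truth_y(X):
    """Same formula as in generate_complex_dataset.py."""
    x1, x2, x3, x4, x5 = X.T
    return (np.sin(x1 * x2) / (np.cos(x3) + 1.5)
            + np.exp(-x2) * np.log(np.abs(x4) + 1.0)
            + x1 * x3 / (1.0 + x5**2))


# Stand-in for datasets/z/dataset.npz, written to a temporary directory.
rng = np.random.RandomState(42)
noise_std = 0.02
X_train = rng.uniform(-3.0, 3.0, size=(2000, 5))
y_train = ground_truth_y(X_train) + rng.normal(0, noise_std, size=2000)
path = os.path.join(tempfile.mkdtemp(), "dataset.npz")
np.savez(path, X_train=X_train, y_train=y_train, expression="(formula text)")

# Read it back and compare the targets with the noise-free formula.
data = np.load(path)
expression = str(data["expression"].item())
residual = data["y_train"] - ground_truth_y(data["X_train"])
residual_std = float(residual.std())
print(expression, data["X_train"].shape, residual_std)
```

<p>
	A residual spread far from the chosen <code>--noise-std</code> would suggest that the file was generated with different settings or a different formula.</p>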
<h3 id="step-2-">
	Step 2: Create a temporary environment</h3>
<p>
	Use <code>micromamba</code> to create a separate Python 3.12 environment so that existing environments are not polluted:</p>
<pre>
<code class="lang-bash"><span class="hljs-comment"># Create the environment</span>
micromamba create -n py312 python=<span class="hljs-number">3</span>.<span class="hljs-number">12</span> -y

<span class="hljs-comment"># Install dependencies</span>
micromamba run -n py312 pip3 <span class="hljs-keyword">install </span><span class="hljs-keyword">scipy </span>sympy <span class="hljs-keyword">scikit-learn </span>h5py \
    tiktoken <span class="hljs-keyword">json-repair </span>cmaes threadpoolctl
</code></pre>
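<p>
	A small check like the one below can confirm that the environment is usable before running anything expensive. It is a sketch: the import names are assumptions derived from the pip package names above (for example <code>scikit-learn</code> is imported as <code>sklearn</code> and <code>json-repair</code> as <code>json_repair</code>), and the file name <code>check_env.py</code> in the usage note is hypothetical.</p>

```python
import importlib.util

# Import names assumed for the pip packages installed above
# (scikit-learn -> sklearn, json-repair -> json_repair).
modules = ["numpy", "scipy", "sympy", "sklearn", "h5py",
           "tiktoken", "json_repair", "cmaes", "threadpoolctl"]

# find_spec returns None for a missing top-level module instead of raising.
status = {name: importlib.util.find_spec(name) is not None for name in modules}
for name, ok in status.items():
    print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```

<p>
	Saved as <code>check_env.py</code>, it could be run inside the new environment with <code>micromamba run -n py312 python check_env.py</code>.</p>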
<h3 id="step-3-">
	Step 3: Write the test script</h3>
<p>
	Save as <code>datasets/z/run_test.py</code>:</p>
<pre>
<code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-string">&quot;&quot;</span><span class="hljs-string">&quot;
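Test script that runs FunctionEvolve on the custom complex dataset (datasets/z/dataset.npz).

Usage:
  cd /path/to/FunctionEvolve

  # Degenerated mode (no LLM; only exercises programmatic mutation + numerical optimization)
  python datasets/z/run_test.py --degenerated

  # Full mode (uses the LLM; requires llm_config.yaml)
  python datasets/z/run_test.py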
&quot;</span><span class="hljs-string">&quot;&quot;</span>

<span class="hljs-built_in">import</span> argparse
<span class="hljs-built_in">import</span> os
<span class="hljs-built_in">import</span> sys
from pathlib <span class="hljs-built_in">import</span> Path

<span class="hljs-built_in">import</span> numpy as np

<span class="hljs-attr">PROJECT_ROOT</span> = Path(__file__).resolve().parents[<span class="hljs-number">2</span>]
sys.path.insert(<span class="hljs-number">0</span>, str(PROJECT_ROOT))

from src.dataset <span class="hljs-built_in">import</span> SRDataset
from src.evaluator <span class="hljs-built_in">import</span> Evaluator
from src.evolution_tree <span class="hljs-built_in">import</span> EvolutionTree
from src.search <span class="hljs-built_in">import</span> TreeSearch
from src.generator <span class="hljs-built_in">import</span> MockGenerator, create_generator
from src.selector <span class="hljs-built_in">import</span> MockSelector, create_selector
from src.mutator <span class="hljs-built_in">import</span> MockMutator, LLMMutator
from src.llm_client <span class="hljs-built_in">import</span> build_openai_client, LLMUsageLogger


def _load_dataset(data_dir):
    <span class="hljs-attr">npz_path</span> = data_dir / <span class="hljs-string">&quot;dataset.npz&quot;</span>
    <span class="hljs-attr">data</span> = np.load(npz_path)
    <span class="hljs-attr">expr</span> = str(data[<span class="hljs-string">&quot;expression&quot;</span>].item())
    return SRDataset.from_arrays(
        <span class="hljs-attr">X_train=data[&quot;X_train&quot;],</span> <span class="hljs-attr">y_train=data[&quot;y_train&quot;],</span>
        <span class="hljs-attr">X_test=data[&quot;X_test&quot;],</span> <span class="hljs-attr">y_test=data[&quot;y_test&quot;],</span>
        <span class="hljs-attr">symbols=[&quot;x1&quot;,</span> <span class="hljs-string">&quot;x2&quot;</span>, <span class="hljs-string">&quot;x3&quot;</span>, <span class="hljs-string">&quot;x4&quot;</span>, <span class="hljs-string">&quot;x5&quot;</span>],
        <span class="hljs-attr">symbol_descs=[&quot;v1&quot;,</span> <span class="hljs-string">&quot;v2&quot;</span>, <span class="hljs-string">&quot;v3&quot;</span>, <span class="hljs-string">&quot;v4&quot;</span>, <span class="hljs-string">&quot;v5&quot;</span>],
        <span class="hljs-attr">expression=expr,</span> <span class="hljs-attr">equation_name=&quot;complex_z&quot;,</span>
    )


def _evaluate_ground_truth(ds, evaluator):
    <span class="hljs-string">&quot;&quot;</span><span class="hljs-string">&quot;Fit the parameters of the ground-truth formula to obtain the accuracy ceiling.&quot;</span><span class="hljs-string">&quot;&quot;</span>
    <span class="hljs-built_in">import</span> sympy as sp
    <span class="hljs-attr">sympy_expr</span> = sp.sympify(ds.expression)
    <span class="hljs-attr">param_names</span> = sorted(
        {str(s) for s <span class="hljs-keyword">in</span> sympy_expr.free_symbols} - set(ds.feature_names))
    <span class="hljs-attr">result</span> = evaluator.evaluate_skeleton(sympy_expr, param_names)
    print(f<span class="hljs-string">&quot;[Ground Truth] Train NMSE = {result.train_nmse:.2e}&quot;</span>)
    print(f<span class="hljs-string">&quot;[Ground Truth] Test  NMSE = {result.test_nmse:.2e}&quot;</span>)


def main():
    <span class="hljs-attr">parser</span> = argparse.ArgumentParser()
    parser.add_argument(<span class="hljs-string">&quot;--degenerated&quot;</span>, <span class="hljs-attr">action=&quot;store_true&quot;)</span>
    parser.add_argument(<span class="hljs-string">&quot;--max-steps&quot;</span>, <span class="hljs-attr">type=int,</span> <span class="hljs-attr">default=30)</span>
    parser.add_argument(<span class="hljs-string">&quot;--n-seeds&quot;</span>, <span class="hljs-attr">type=int,</span> <span class="hljs-attr">default=20)</span>
    parser.add_argument(<span class="hljs-string">&quot;--candidate-num&quot;</span>, <span class="hljs-attr">type=int,</span> <span class="hljs-attr">default=5)</span>
    parser.add_argument(<span class="hljs-string">&quot;--timeout&quot;</span>, <span class="hljs-attr">type=float,</span> <span class="hljs-attr">default=120.0)</span>
    parser.add_argument(<span class="hljs-string">&quot;--llm-config&quot;</span>, <span class="hljs-attr">type=str,</span> <span class="hljs-attr">default=None)</span>
    parser.add_argument(<span class="hljs-string">&quot;--run-tag&quot;</span>, <span class="hljs-attr">type=str,</span> <span class="hljs-attr">default=&quot;z_test&quot;)</span>
    <span class="hljs-attr">args</span> = parser.parse_args()

    <span class="hljs-attr">data_dir</span> = Path(__file__).resolve().parent
    <span class="hljs-attr">ds</span> = _load_dataset(data_dir)
    print(f<span class="hljs-string">&quot;Train samples: {ds.X_train.shape[0]}, test samples: {ds.X_test.shape[0]}&quot;</span>)

    <span class="hljs-attr">evaluator</span> = Evaluator(
        <span class="hljs-attr">feature_names=ds.feature_names,</span> <span class="hljs-attr">X_train=ds.X_train,</span>
        <span class="hljs-attr">y_train=ds.y_train,</span> <span class="hljs-attr">X_test=ds.X_test,</span> <span class="hljs-attr">y_test=ds.y_test,</span>
        <span class="hljs-attr">timeout=args.timeout,</span>
    )
    _evaluate_ground_truth(ds, evaluator)

    <span class="hljs-comment"># Build the LLM-backed or mock agents</span>
    <span class="hljs-attr">llm_config</span> = None
    <span class="hljs-keyword">if</span> not args.degenerated:
        <span class="hljs-attr">cfg_path</span> = args.llm_config <span class="hljs-literal">or</span> str(PROJECT_ROOT / <span class="hljs-string">&quot;llm_config.yaml&quot;</span>)
        <span class="hljs-keyword">if</span> os.path.exists(cfg_path):
            <span class="hljs-built_in">import</span> yaml
            <span class="hljs-keyword">with</span> open(cfg_path) as f:
                <span class="hljs-attr">llm_config</span> = yaml.safe_load(f) <span class="hljs-literal">or</span> {}

    def _resolve(comp, field, <span class="hljs-attr">fallback=None):</span>
        return llm_config.get(comp, {}).get(field, fallback) <span class="hljs-keyword">if</span> llm_config <span class="hljs-keyword">else</span> fallback

    <span class="hljs-keyword">if</span> args.degenerated:
        <span class="hljs-attr">generator</span> = MockGenerator(<span class="hljs-attr">variables=ds.feature_names)</span>
        <span class="hljs-attr">selector</span> = MockSelector(<span class="hljs-attr">variables=ds.feature_names)</span>
        <span class="hljs-attr">llm_mutator</span> = MockMutator()
    <span class="hljs-keyword">else</span>:
        <span class="hljs-attr">gen</span> = _resolve(<span class="hljs-string">&quot;generator&quot;</span>, <span class="hljs-string">&quot;model&quot;</span>); <span class="hljs-attr">gen_url</span> = _resolve(<span class="hljs-string">&quot;generator&quot;</span>, <span class="hljs-string">&quot;base_url&quot;</span>)
        <span class="hljs-attr">gen_key</span> = _resolve(<span class="hljs-string">&quot;generator&quot;</span>, <span class="hljs-string">&quot;api_key&quot;</span>); <span class="hljs-attr">gen_mode</span> = _resolve(<span class="hljs-string">&quot;generator&quot;</span>, <span class="hljs-string">&quot;mode&quot;</span>, <span class="hljs-string">&quot;openai&quot;</span>)
        <span class="hljs-attr">gen_temp</span> = _resolve(<span class="hljs-string">&quot;generator&quot;</span>, <span class="hljs-string">&quot;temperature&quot;</span>, <span class="hljs-number">0.8</span>)
        <span class="hljs-attr">gen_tok</span> = _resolve(<span class="hljs-string">&quot;generator&quot;</span>, <span class="hljs-string">&quot;max_tokens&quot;</span>, <span class="hljs-number">128000</span>)
        <span class="hljs-attr">logger</span> = LLMUsageLogger(str(Path(<span class="hljs-string">&quot;logs&quot;</span>) / <span class="hljs-string">&quot;z_custom&quot;</span> / <span class="hljs-string">&quot;llm_usage.csv&quot;</span>))
        <span class="hljs-attr">generator</span> = create_generator(gen, gen_url, gen_key, gen_temp, gen_tok,
                                     <span class="hljs-attr">usage_logger=logger,</span> <span class="hljs-attr">llm_mode=gen_mode)</span>

        <span class="hljs-attr">sm</span> = _resolve(<span class="hljs-string">&quot;selector&quot;</span>, <span class="hljs-string">&quot;model&quot;</span>, gen); <span class="hljs-attr">su</span> = _resolve(<span class="hljs-string">&quot;selector&quot;</span>, <span class="hljs-string">&quot;base_url&quot;</span>, gen_url)
        <span class="hljs-attr">sk</span> = _resolve(<span class="hljs-string">&quot;selector&quot;</span>, <span class="hljs-string">&quot;api_key&quot;</span>, gen_key); <span class="hljs-attr">sd</span> = _resolve(<span class="hljs-string">&quot;selector&quot;</span>, <span class="hljs-string">&quot;mode&quot;</span>, gen_mode)
        <span class="hljs-attr">st</span> = _resolve(<span class="hljs-string">&quot;selector&quot;</span>, <span class="hljs-string">&quot;temperature&quot;</span>, <span class="hljs-number">0.3</span>)
        <span class="hljs-attr">selector</span> = create_selector(sm, su, sk, st, <span class="hljs-attr">usage_logger=logger,</span> <span class="hljs-attr">llm_mode=sd)</span>

        <span class="hljs-attr">mm</span> = _resolve(<span class="hljs-string">&quot;mutator&quot;</span>, <span class="hljs-string">&quot;model&quot;</span>, gen); <span class="hljs-attr">mu</span> = _resolve(<span class="hljs-string">&quot;mutator&quot;</span>, <span class="hljs-string">&quot;base_url&quot;</span>, gen_url)
        <span class="hljs-attr">mk</span> = _resolve(<span class="hljs-string">&quot;mutator&quot;</span>, <span class="hljs-string">&quot;api_key&quot;</span>, gen_key); <span class="hljs-attr">md</span> = _resolve(<span class="hljs-string">&quot;mutator&quot;</span>, <span class="hljs-string">&quot;mode&quot;</span>, gen_mode)
        <span class="hljs-attr">mt</span> = _resolve(<span class="hljs-string">&quot;mutator&quot;</span>, <span class="hljs-string">&quot;temperature&quot;</span>, <span class="hljs-number">0.7</span>)
        <span class="hljs-attr">m_tok</span> = _resolve(<span class="hljs-string">&quot;mutator&quot;</span>, <span class="hljs-string">&quot;max_tokens&quot;</span>, <span class="hljs-number">128000</span>)
        <span class="hljs-attr">client</span> = build_openai_client(mm, mu, <span class="hljs-attr">mode=md,</span> <span class="hljs-attr">api_key=mk)</span>
        <span class="hljs-attr">llm_mutator</span> = LLMMutator(client, mm, mt, m_tok, <span class="hljs-attr">usage_logger=logger)</span>

    <span class="hljs-attr">tree</span> = EvolutionTree()
    <span class="hljs-attr">searcher</span> = TreeSearch(
        <span class="hljs-attr">dataset=ds,</span> <span class="hljs-attr">evaluator=evaluator,</span> <span class="hljs-attr">tree=tree,</span>
        <span class="hljs-attr">selector=selector,</span> <span class="hljs-attr">generator=generator,</span> <span class="hljs-attr">llm_mutator=llm_mutator,</span>
        <span class="hljs-attr">max_steps=args.max_steps,</span> <span class="hljs-attr">n_seeds=args.n_seeds,</span>
        <span class="hljs-attr">candidate_num=args.candidate_num,</span> <span class="hljs-attr">timeout=args.timeout,</span> <span class="hljs-attr">verbose=True,</span>
    )
    searcher.initialize_seeds()
    searcher.run()

    <span class="hljs-comment"># Report the best result (lowest train NMSE)</span>
    <span class="hljs-attr">evaluated</span> = [n for n <span class="hljs-keyword">in</span> tree.all_nodes <span class="hljs-keyword">if</span> n.is_evaluated <span class="hljs-literal">and</span> n.train_nmse &lt; float(<span class="hljs-string">&quot;inf&quot;</span>)]
    evaluated.sort(<span class="hljs-attr">key=lambda</span> n: n.train_nmse)
    <span class="hljs-keyword">if</span> evaluated:
        <span class="hljs-attr">best</span> = evaluated[<span class="hljs-number">0</span>]
        print(f<span class="hljs-string">&quot;\nBest formula: {best.skeleton_str}&quot;</span>)
        print(f<span class="hljs-string">&quot;Train NMSE: {best.train_nmse:.4e}&quot;</span>)
        print(f<span class="hljs-string">&quot;Test NMSE: {best.test_nmse:.4e}&quot;</span>)


<span class="hljs-keyword">if</span> <span class="hljs-attr">__name__</span> == <span class="hljs-string">&quot;__main__&quot;</span>:
    main()
</code></pre>
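<p>
	The <code>_resolve</code> helper in the script lets the selector and mutator fall back to the generator's settings when their own entries are missing. Below is a minimal standalone sketch of that lookup chain. The function name <code>resolve</code>, the config contents, and the model name <code>example-model</code> are illustrative assumptions, not values from the project.</p>

```python
def resolve(config, component, field, fallback=None):
    """Return config[component][field], or fallback when any level is missing."""
    if not config:
        return fallback
    return config.get(component, {}).get(field, fallback)


# Hypothetical config: only the generator names a model.
llm_config = {
    "generator": {"model": "example-model", "temperature": 0.8},
    "selector": {"temperature": 0.3},
}

gen_model = resolve(llm_config, "generator", "model")
# The selector has no model of its own, so it inherits the generator's.
sel_model = resolve(llm_config, "selector", "model", gen_model)
sel_temp = resolve(llm_config, "selector", "temperature", 0.3)
# There is no mutator section at all, so the default temperature is used.
mut_temp = resolve(llm_config, "mutator", "temperature", 0.7)

print(gen_model, sel_model, sel_temp, mut_temp)
```

<p>
	With a chain like this, a config file that fills in only the generator section is enough to drive all three components.</p>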
<h3 id="step-4-">
	Step 4: Run the tests</h3>
<p>
	Three experiments are run:</p>
<p>
	<strong>① Degenerated mode (1 step):</strong></p>
<pre>
<code class="lang-bash">cd /path/to/FunctionEvolve
python3 datasets/z/run_test.py --degenerated --max-steps 1 --n-seeds 3 --candidate-num 2 --timeout 120
</code></pre>
<p>
	<strong>② LLM mode (1 step):</strong></p>
<pre>
<code class="lang-bash">cd /path/to/FunctionEvolve
python3 datasets/z/run_test.py --max-steps 1 --n-seeds 3 --candidate-num 2 --timeout 120
</code></pre>
<p>
	<strong>③ LLM mode (3 steps):</strong></p>
<pre>
<code class="lang-bash">cd /path/to/FunctionEvolve
python3 datasets/z/run_test.py --max-steps 3 --n-seeds 3 --candidate-num 2 --timeout 120
</code></pre>
<blockquote>
<p>
		To speed things up, these experiments used a subset of 200 training samples + 100 test samples (the full dataset has 2000 + 500).</p>
</blockquote>
<h3 id="step-5-">
	Step 5: Test output and analysis</h3>
<pre>
<code><span class="hljs-string">[Ground Truth]</span> Train NMSE = <span class="hljs-number">1</span>.77e-<span class="hljs-number">05</span>
<span class="hljs-string">[Ground Truth]</span> Test  NMSE = <span class="hljs-number">2</span>.14e-<span class="hljs-number">05</span>
</code></pre>
<p>
	The ground-truth formula has no parameters to fit, so its NMSE &asymp; 1.8e-5 comes entirely from the Gaussian noise (standard deviation 0.02). This is the accuracy ceiling for the test.<br />
	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>
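<p>
	The NMSE definition is not spelled out here. The sketch below assumes the common one, mean squared error divided by the variance of the target, which matches the way NMSE is read as &quot;1 minus explained variance&quot; in this post. It shows why a perfect model still has a nonzero NMSE floor of roughly noise variance over target variance. The synthetic target (standard deviation 5) is a hypothetical stand-in, not the real dataset.</p>

```python
import numpy as np


def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))


rng = np.random.default_rng(0)
# Hypothetical noise-free target; the real dataset's variance may differ.
clean = rng.normal(0.0, 5.0, size=200_000)
# Observed target = clean signal + Gaussian noise with standard deviation 0.02.
noisy = clean + rng.normal(0.0, 0.02, size=clean.size)

# A model that reproduces the clean signal exactly still misses the noise.
floor = nmse(noisy, clean)
expected = 0.02 ** 2 / np.var(noisy)
print(f"measured floor: {floor:.2e}, sigma^2 / var(y): {expected:.2e}")
```

<p>
	Under this assumed definition, the reported floor of about 1.8e-5 simply reflects how small the noise variance (0.02&sup2; = 4e-4) is relative to the variance of y.</p>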
<h4 id="-a-200-">
	Mode A: degenerated-mode results (200 samples)</h4>
<p>
	Initial seeds (degenerated mode builds simple polynomial/exponential formulas from x1 only):</p>
<table>
<thead>
<tr>
<th>
				Seed formula</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_014672d406888f94e9d3bfd97478d21b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 x_1 + c_1" /></span><script type='math/tex'>y = c_0 x_1 + c_1</script> </td>
<td>
				0.998</td>
<td>
				0.992</td>
<td>
				Linear</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_92f83d155bc22a983410fda0f01ec234.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 x_1^2 + c_1 x_1 + c_2" /></span><script type='math/tex'>y = c_0 x_1^2 + c_1 x_1 + c_2</script> </td>
<td>
				0.998</td>
<td>
				0.991</td>
<td>
				Quadratic polynomial</td>
</tr>
</tbody>
</table>
<p>
	Best formulas after mutation (1 step, 2 parents, 111 candidates):</p>
<table>
<thead>
<tr>
<th>
				Formula</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_946a2a9b94d33b464cf5b290f803a2bb.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = \frac{c_0 x_1 + c_1}{c_2 e^{c_3 x_2} + 1}" /></span><script type='math/tex'>y = \frac{c_0 x_1 + c_1}{c_2 e^{c_3 x_2} + 1}</script> </td>
<td>
				<strong>0.270</strong></td>
<td>
				<strong>0.319</strong></td>
<td>
				Rational + exponential, introduces x₂</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_c018833bc177671a6392dfce09c6e2b1.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = (c_0 + c_1 x_1) e^{c_2 x_2}" /></span><script type='math/tex'>y = (c_0 + c_1 x_1) e^{c_2 x_2}</script> </td>
<td>
				0.271</td>
<td>
				0.303</td>
<td>
				Linear &times; exponential</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_6c484efca940a461ff28273aa68f18b7.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 x_1 + c_1 + c_2 e^{c_3 x_2}" /></span><script type='math/tex'>y = c_0 x_1 + c_1 + c_2 e^{c_3 x_2}</script> </td>
<td>
				0.283</td>
<td>
				0.344</td>
<td>
				Linear + exponential</td>
</tr>
</tbody>
</table>
<p>
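	<strong>Degenerated-mode bottleneck</strong>: the seeds use only x1 and have NMSE &asymp; 1.0. After one round of mutation the best NMSE is 0.270 (about 73% of the variance explained). Without an LLM to point the way, however, mutation enumerates blindly: only a few of the 111 candidates introduce x2, and functions such as sin are never discovered.</p>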
<h4 id="-b-deepseek-200-">
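	Mode B: full-mode results (using DeepSeek, 200 samples)</h4>
<p>
	<strong>1. LLM Generator domain analysis</strong></p>
<p>
	The LLM identified the dataset as belonging to the &quot;general symbolic regression / mathematical modeling&quot; domain and produced 20+ classic formula patterns as references for seed generation.</p>
<p>
	<strong>2. LLM seed generation</strong></p>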
<table>
<thead>
<tr>
<th>
				Seed formula</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_178eecbf529ffa520f98b76afb7a69d3.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 + c_1 x_2 + c_2 x_3 + c_3 x_4 + c_4 x_5" /></span><script type='math/tex'>y = c_0 + c_1 x_2 + c_2 x_3 + c_3 x_4 + c_4 x_5</script> </td>
<td>
				~0.89</td>
<td>
				~0.89</td>
<td>
				LLM tries a linear model over x₂~x₅</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_84bd8311a8509c4584be9e5a1f1537f3.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 x_2^{c_1} x_3^{c_2}" /></span><script type='math/tex'>y = c_0 x_2^{c_1} x_3^{c_2}</script> </td>
<td>
				~0.84</td>
<td>
				~0.84</td>
<td>
				LLM focuses on x₂, x₃</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cf66b26a432f33f6936b314431517e67.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4)" /></span><script type='math/tex'>y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4)</script> </td>
<td>
				<strong>~0.35</strong></td>
<td>
				<strong>~0.34</strong></td>
<td>
				LLM guesses the sin + exp structure directly!</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_b365f6ffe6e05aa663ea8fa1056862fa.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 x_2^2 x_3^{c_1}" /></span><script type='math/tex'>y = c_0 x_2^2 x_3^{c_1}</script> (degenerated fallback)</td>
<td>
				~0.84</td>
<td>
				~0.84</td>
<td>
				&mdash;</td>
</tr>
</tbody>
</table>
<p>
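	The LLM's third seed, <code>c0*exp(c1*x2) + c2*sin(c3*x3 + c4)</code>, already contains <strong>exp + sin</strong>. Its NMSE is only ~0.35, far better than the ~0.998 of the degenerated seeds.</p>
<p>
	<strong>3. LLM Mutator suggestions</strong></p>
<p>
	The LLM generated <strong>20 targeted mutation suggestions</strong> for the selected parents (in degenerated mode, ASTMutator produced about 111 by exhaustive enumeration), for example: &quot;try a polynomial expansion&quot;, &quot;add an exponential decay term exp&quot;, &quot;try a rational structure&quot;, &quot;add sine/cosine terms&quot;.</p>
<p>
	<strong>4. Best formulas at each step</strong></p>
<p>
	After 1 LLM step:</p>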
<table>
<thead>
<tr>
<th>
				Formula</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_a42775af54ff5c60746fda6e0a9c2a63.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4) + c_5 \sin(c_6 x_4 + c_7)" /></span><script type='math/tex'>y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4) + c_5 \sin(c_6 x_4 + c_7)</script> </td>
<td>
				<strong>0.224</strong></td>
<td>
				<strong>0.300</strong></td>
<td>
				Two sin terms + exp, introduces x₂~x₄</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_f7d4e21a85530238af9e2f6b8a39c528.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4) + c_5 x_4^{c_6}" /></span><script type='math/tex'>y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4) + c_5 x_4^{c_6}</script> </td>
<td>
				0.228</td>
<td>
				0.301</td>
<td>
				Power term in place of the second sin</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_3538fef286c5b856a86752e27d2484a6.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 e^{c_1 x_2} + c_2 x_3 (c_3 + x_1)^{c_4} + c_5 \sin(c_6 x_3 + c_7)" /></span><script type='math/tex'>y = c_0 e^{c_1 x_2} + c_2 x_3 (c_3 + x_1)^{c_4} + c_5 \sin(c_6 x_3 + c_7)</script> </td>
<td>
				0.241</td>
<td>
				0.282</td>
<td>
				Introduces x₁</td>
</tr>
</tbody>
</table>
<p>
	After 3 LLM steps:</p>
<table>
<thead>
<tr>
<th>
				Formula</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_cec39a90f642189bb2921c23e4542f78.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\frac{c_0 x_1 + c_1 + (c_2 x_3 + c_3)(c_4 + c_5 x_3 + c_6 \sin(c_7 x_4 + c_8)) e^{c_9 x_2}}{c_2 x_3 + c_3}" /></span><script type='math/tex'>\frac{c_0 x_1 + c_1 + (c_2 x_3 + c_3)(c_4 + c_5 x_3 + c_6 \sin(c_7 x_4 + c_8)) e^{c_9 x_2}}{c_2 x_3 + c_3}</script> </td>
<td>
				<strong>0.081</strong></td>
<td>
				<strong>0.129</strong></td>
<td>
				Compound rational form + sin + exp</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_39db71404ac9251c586c4ac7621b279c.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="c_0 x_3 (c_1 + x_1)^{c_2} + c_3 e^{c_4 x_2} + c_5 x_3 e^{c_4 x_2} + c_6 e^{c_4 x_2} \sin(c_7 x_4 + c_8)" /></span><script type='math/tex'>c_0 x_3 (c_1 + x_1)^{c_2} + c_3 e^{c_4 x_2} + c_5 x_3 e^{c_4 x_2} + c_6 e^{c_4 x_2} \sin(c_7 x_4 + c_8)</script> </td>
<td>
				0.088</td>
<td>
				0.110</td>
<td>
				Sum of separate terms</td>
</tr>
<tr>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_c5f1003e513a65312749530663039ff5.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\frac{c_0 x_2 + c_1 + (c_2 x_3 + c_3)(c_4 + c_5 x_3 + c_6 \sin(c_7 x_4 + c_8)) e^{c_9 x_2}}{c_2 x_3 + c_3}" /></span><script type='math/tex'>\frac{c_0 x_2 + c_1 + (c_2 x_3 + c_3)(c_4 + c_5 x_3 + c_6 \sin(c_7 x_4 + c_8)) e^{c_9 x_2}}{c_2 x_3 + c_3}</script> </td>
<td>
				0.099</td>
<td>
				0.157</td>
<td>
				Similar structure</td>
</tr>
</tbody>
</table>
<hr />
<p>
	<strong>Best functions found by the two modes:</strong></p>
<table>
<thead>
<tr>
<th>
				Experiment</th>
<th>
				Best formula</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				<strong>Ground truth</strong></td>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_957278b1d2df9c3fe6b7a55a527a1787.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = \frac{\sin(x_1 x_2)}{\cos(x_3) + 1.5} + e^{-x_2} \ln(\lvert x_4 \rvert + 1) + \frac{x_1 x_3}{1 + x_5^2}" /></span><script type='math/tex'>y = \frac{\sin(x_1 x_2)}{\cos(x_3) + 1.5} + e^{-x_2} \ln(\lvert x_4 \rvert + 1) + \frac{x_1 x_3}{1 + x_5^2}</script> </td>
<td>
				<strong>1.77e-5</strong></td>
<td>
				<strong>2.14e-5</strong></td>
</tr>
<tr>
<td>
				<strong>Degenerated, 1 step</strong></td>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_946a2a9b94d33b464cf5b290f803a2bb.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = \frac{c_0 x_1 + c_1}{c_2 e^{c_3 x_2} + 1}" /></span><script type='math/tex'>y = \frac{c_0 x_1 + c_1}{c_2 e^{c_3 x_2} + 1}</script> </td>
<td>
				<strong>0.270</strong></td>
<td>
				<strong>0.319</strong></td>
</tr>
<tr>
<td>
				<strong>LLM, 1 step</strong></td>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_a42775af54ff5c60746fda6e0a9c2a63.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4) + c_5 \sin(c_6 x_4 + c_7)" /></span><script type='math/tex'>y = c_0 e^{c_1 x_2} + c_2 \sin(c_3 x_3 + c_4) + c_5 \sin(c_6 x_4 + c_7)</script> </td>
<td>
				<strong>0.224</strong></td>
<td>
				<strong>0.300</strong></td>
</tr>
<tr>
<td>
				<strong>LLM, 3 steps</strong></td>
<td>
				 <span class='MathJax_Preview'><img src='https://www.codelast.com/wp-content/plugins/latex/cache/tex_bbd2672a9dd8f87d3e93658d4a44e781.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="y = \frac{c_0 x_1 + c_1 + (c_2 x_3 + c_3)(c_4 + c_5 x_3 + c_6 \sin(c_7 x_4 + c_8)) e^{c_9 x_2}}{c_2 x_3 + c_3}" /></span><script type='math/tex'>y = \frac{c_0 x_1 + c_1 + (c_2 x_3 + c_3)(c_4 + c_5 x_3 + c_6 \sin(c_7 x_4 + c_8)) e^{c_9 x_2}}{c_2 x_3 + c_3}</script> </td>
<td>
				<strong>0.081</strong></td>
<td>
				<strong>0.129</strong></td>
</tr>
</tbody>
</table>
<p>
	NMSE comparison (lower is better):</p>
<table>
<thead>
<tr>
<th>
				Mode</th>
<th>
				Steps</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Improvement vs. degenerated (train NMSE)</th>
<th>
				Key finding</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				Ground truth</td>
<td>
				--</td>
<td>
				1.77e-5</td>
<td>
				2.14e-5</td>
<td>
				--</td>
<td>
				Accuracy ceiling</td>
</tr>
<tr>
<td>
				Degenerated</td>
<td>
				1 step</td>
<td>
				0.270</td>
<td>
				0.319</td>
<td>
				Baseline</td>
<td>
				Rational + exponential (x₁, x₂)</td>
</tr>
<tr>
<td>
				LLM</td>
<td>
				1 step</td>
<td>
				<strong>0.224</strong></td>
<td>
				<strong>0.300</strong></td>
<td>
				&darr; 17%</td>
<td>
				Finds <strong>sin()</strong>, introduces x₃, x₄</td>
</tr>
<tr>
<td>
				LLM</td>
<td>
				<strong>3 steps</strong></td>
<td>
				<strong>0.081</strong></td>
<td>
				<strong>0.129</strong></td>
<td>
				<strong>&darr; 70%</strong></td>
<td>
				Compound rational form + sin + exp (x₁~x₄)</td>
</tr>
</tbody>
</table>
<blockquote>
<p>
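		Within a single step the LLM found the <strong>sin()</strong> that degenerated mode could not. After 3 steps the NMSE drops to <strong>0.081</strong> (about 92% of the variance explained), but the sin interaction of x₁&times;x₂ and the role of x₅ have not yet been fully identified. LLM mode is estimated (not verified here) to converge to NMSE &lt; 1e-3 in 5~10 steps.</p>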
</blockquote>
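<p>
	The improvement percentages and &quot;variance explained&quot; figures can be reproduced from the train NMSE values in the table. The short sketch below does that arithmetic. It assumes that the improvement is measured relative to the degenerated 1-step train NMSE and that explained variance is 1 minus NMSE. The helper name <code>relative_drop</code> is illustrative.</p>

```python
def relative_drop(baseline, value):
    """Fractional reduction of value relative to baseline."""
    return (baseline - value) / baseline


# Train NMSE values copied from the comparison table.
train_nmse = {
    "degenerated, 1 step": 0.270,
    "LLM, 1 step": 0.224,
    "LLM, 3 steps": 0.081,
}
baseline = train_nmse["degenerated, 1 step"]

for name, value in train_nmse.items():
    drop = relative_drop(baseline, value)
    explained = 1.0 - value  # assumed reading: NMSE = 1 - explained variance
    print(f"{name}: NMSE drop {drop:.0%}, explained variance {explained:.0%}")
```

<p>
	The same calculation on the test NMSE gives much smaller gains for the 1-step LLM run (0.319 to 0.300), so the percentages in the table are best read as training-set figures.</p>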
<h4 id="-">
	Current progress and convergence estimate</h4>
<table>
<thead>
<tr>
<th>
				Aspect</th>
<th>
				Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				Number of variables</td>
<td>
				5 (x₁~x₅)</td>
</tr>
<tr>
<td>
				Function types</td>
<td>
				sin, cos, exp, log, division (5 types)</td>
</tr>
<tr>
<td>
				Interactions</td>
<td>
				x₁&times;x₂, x₁&times;x₃, x₂ (exponential term), x₄ (logarithmic term), x₅ (rational term)</td>
</tr>
<tr>
<td>
				Noise</td>
<td>
				Standard deviation 0.02</td>
</tr>
</tbody>
</table>
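<p>
	For reference, the target summarized in the table above can be written as a NumPy function, following the ground-truth formula listed in the comparison of best functions. This is only a sketch: the uniform sampling range [-2, 2], the seed, and the name <code>target</code> are assumptions for illustration, while the noise standard deviation 0.02 is the value stated in the table.</p>

```python
import numpy as np


def target(x1, x2, x3, x4, x5):
    """Ground-truth formula: three additive terms over five variables."""
    term_sin = np.sin(x1 * x2) / (np.cos(x3) + 1.5)          # sin/cos ratio
    term_exp_log = np.exp(-x2) * np.log(np.abs(x4) + 1.0)    # exp times log
    term_rational = x1 * x3 / (1.0 + x5 ** 2)                # rational term
    return term_sin + term_exp_log + term_rational


rng = np.random.default_rng(42)
# Assumed input range; the article's actual sampling range is not shown here.
X = rng.uniform(-2.0, 2.0, size=(200, 5))
# Add Gaussian noise with standard deviation 0.02, as stated in the table.
y = target(*X.T) + rng.normal(0.0, 0.02, size=X.shape[0])

print(X.shape, y.shape)
```

<p>
	Both denominators are bounded away from zero (cos(x₃) + 1.5 is at least 0.5, and 1 + x₅&sup2; is at least 1), so the formula is finite for any real input.</p>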
<p>
	Current experimental progress (based on 200 training samples + 100 test samples):</p>
<table>
<thead>
<tr>
<th>
				Experiment</th>
<th>
				Train NMSE</th>
<th>
				Test NMSE</th>
<th>
				Key finding</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				Degenerated, 1 step</td>
<td>
				0.270</td>
<td>
				0.319</td>
<td>
				Rational + exponential, only x₁, x₂</td>
</tr>
<tr>
<td>
				LLM, 1 step</td>
<td>
				0.224</td>
<td>
				0.300</td>
<td>
				Finds <strong>sin()</strong>, introduces x₃, x₄</td>
</tr>
<tr>
<td>
				LLM, <strong>3 steps</strong></td>
<td>
				<strong>0.081</strong></td>
<td>
				<strong>0.129</strong></td>
<td>
				Compound rational form + sin + exp, introduces x₁~x₄</td>
</tr>
</tbody>
</table>
<p>
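	After 3 LLM steps the NMSE is 0.081 (about 92% of the variance explained). The search has found the sin() function and brought in the four variables x₁~x₄. However, the sin interaction of x₁&times;x₂, the rational interaction of x₁&times;x₃, and the rational term in x₅ from the ground-truth formula have not yet been fully identified. Estimates:</p>
<ul>
<li>
		<strong>LLM mode</strong>: should converge to NMSE &lt; 1e-3 in 5~10 steps</li>
<li>
		<strong>Degenerated mode</strong>: over the same number of steps NMSE drops by only ~18% per step, far less efficient than LLM mode</li>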
</ul>
<p>
	This is why FunctionEvolve uses a hybrid LLM + numerical-optimization architecture: <strong>the LLM provides direction, and numerical optimization handles the precise fitting</strong>.</p>
<h3 id="step-6-">
	Step 6: Clean up the temporary environment</h3>
<pre>
<code class="lang-bash"><span class="hljs-comment"># Delete the temporary environment once testing is done</span>
<span class="hljs-attribute">micromamba</span> remove -n py312 --<span class="hljs-literal">all</span> -y
</code></pre>
<h3 id="-">
	Summary</h3>
<table>
<thead>
<tr>
<th>
				Item</th>
<th>
				Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				Test dataset generation</td>
<td>
				Done</td>
</tr>
<tr>
<td>
				Test script</td>
<td>
				Done (<code>datasets/z/run_test.py</code>)</td>
</tr>
<tr>
<td>
				Degenerated mode (1 step)</td>
<td>
				Done (NMSE 0.270 / 0.319, no LLM)</td>
</tr>
<tr>
<td>
				LLM mode (1 step)</td>
<td>
				Done (NMSE 0.224 / 0.300, finds sin())</td>
</tr>
<tr>
<td>
				<strong>LLM mode (3 steps)</strong></td>
<td>
				<strong>Done (NMSE 0.081 / 0.129, ~92% of variance explained)</strong></td>
</tr>
<tr>
<td>
				Search pipeline check</td>
<td>
				Whole pipeline ran without errors; all experiment data recorded</td>
</tr>
<tr>
<td>
				Temporary environment cleanup</td>
<td>
				Removed</td>
</tr>
</tbody>
</table>
<p>
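	<strong>Conclusion</strong>: FunctionEvolve's search pipeline runs correctly. LLM mode clearly outperforms degenerated mode: in 1 step it finds the sin() function that degenerated mode could not, and after 3 steps the NMSE drops to <strong>0.081</strong>. These tests did not reach the final correct formula. My computer is not powerful, and just running the tests above already took a long time with the fans spinning hard, so the testing had to stop here. Even so, the results suggest that FunctionEvolve has a considerable advantage over traditional optimization methods.</p>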
<p>	<span style="color: rgb(255, 0, 0);">➤➤</span> Copyright notice <span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan with WeChat):<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%bb%93%e5%90%88%e5%a4%a7%e6%a8%a1%e5%9e%8bllm%e7%9a%84%e5%87%bd%e6%95%b0%e6%90%9c%e7%b4%a2%e5%99%a8-functionevolve-%e7%ae%80%e5%8d%95%e5%ae%9e%e6%b5%8b/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Batch-copying a Hive table's data to historical dates</title>
		<link>https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/</link>
					<comments>https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 25 Mar 2026 09:49:41 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[复制Hive表]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14221</guid>

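					<description><![CDATA[<p>If a Hive table is partitioned by date and each day's data is small, the fastest way to clone one day's data into several other days is probably to copy the underlying files directly and then fix things up with a single command.<br />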
<span id="more-14221"></span><br />
(1) First locate the table's HDFS directory. Suppose we want to use the data for 2026-03-20 to create the data for 2026-03-21:</p>
<blockquote>
<p>
		hadoop fs -cp /path/to/your/hive/table/hdfs/dir/date=2026-03-20&#160;/path/to/your/hive/table/hdfs/dir/date=2026-03-21</p>
</blockquote>
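<p>(2) Copying the directory alone is not enough, because the data still cannot be queried. In the Hive interactive CLI, run the following command so that the copied data takes effect:</p>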
<blockquote>
<p>
		msck repair table table_name;</p>
</blockquote>
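<p>This command repairs the table's metadata.<br />
When partition directories are created directly on HDFS without being registered in the Hive metastore through ALTER TABLE ADD PARTITION, running msck discovers those partitions automatically and adds them to the metadata.&#8230; <a href="https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/" class="read-more">Read More </a></p>]]></description>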
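										<content:encoded><![CDATA[<p>If a Hive table is partitioned by date and each day's data is small, the fastest way to clone one day's data into several other days is probably to copy the underlying files directly and then fix things up with a single command.<br />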
<span id="more-14221"></span><br />
(1) First locate the table's HDFS directory. Suppose we want to use the data for 2026-03-20 to create the data for 2026-03-21:</p>
<blockquote>
<p>
		hadoop fs -cp /path/to/your/hive/table/hdfs/dir/date=2026-03-20&nbsp;/path/to/your/hive/table/hdfs/dir/date=2026-03-21</p>
</blockquote>
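<p>(2) Copying the directory alone is not enough, because the data still cannot be queried. In the Hive interactive CLI, run the following command so that the copied data takes effect:</p>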
<blockquote>
<p>
		msck repair table table_name;</p>
</blockquote>
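<p>This command repairs the table's metadata.<br />
If partition directories were created directly on HDFS without being registered in the Hive metastore through ALTER TABLE ADD PARTITION, running msck discovers them automatically and adds them to the metadata.</p>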
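<p>The steps above cover a single target day; extending them to several days is a matter of repeating the copy and repairing once at the end. A minimal sketch follows, assuming a POSIX shell. The function name print_copy_cmds, the table name my_table, and the target dates are hypothetical, and the function only prints the commands (a dry run) so that they can be reviewed before being run.</p>

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that copy one date
# partition to several target dates, then repair the table metadata.
# Usage: print_copy_cmds TABLE_DIR TABLE_NAME SRC_DATE TARGET_DATE...
print_copy_cmds() {
    table_dir=$1
    table_name=$2
    src_date=$3
    shift 3
    for target_date in "$@"; do
        # One HDFS copy per target partition directory.
        printf 'hadoop fs -cp %s/date=%s %s/date=%s\n' \
            "$table_dir" "$src_date" "$table_dir" "$target_date"
    done
    # A single repair at the end registers every new partition directory.
    printf 'hive -e "msck repair table %s;"\n' "$table_name"
}

# Example: clone 2026-03-20 into the three following days (dry run).
print_copy_cmds /path/to/your/hive/table/hdfs/dir my_table 2026-03-20 \
    2026-03-21 2026-03-22 2026-03-23
```

<p>Once the printed commands look right, the same output can be piped to sh to execute them.</p>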
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Prettifying the Terminal Output of the git diff Command</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 09 Mar 2026 03:20:04 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[git diff]]></category>
		<category><![CDATA[git-delta]]></category>
		<category><![CDATA[左右双屏]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14209</guid>

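					<description><![CDATA[<p>Environments this article applies to:<br />
macOS and Ubuntu (tested only on 20.04.6 LTS)<br />
When git diff runs in a terminal, its output looks like this:<br />
* Grouped by file: each changed file gets its own block, shown one after another from top to bottom.<br />
* Everything scrolls vertically in a single terminal window; by default there is no side-by-side comparison.<br />
Personally, I find this layout less intuitive than a &#34;side-by-side&#34; diff.<br />
So is there a way to turn the output of git diff into something better looking?<br />
<span id="more-14209"></span><br />
On macOS, you can install git-delta and, with a little configuration, make git diff&#160;in the terminal look much better.<br />
Here is the end result first:<br />
<img decoding="async" alt="git diff" src="https://www.codelast.com/wp-content/uploads/2026/03/git_diff_style_change.jpg" style="width: 700px; height: 259px;" /><br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>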
<p>How is this achieved? Follow the steps below.<br />
Taking macOS as the example, install it with:</p>
<blockquote>
<p>
		brew install git-delta</p>
</blockquote>
<p>To configure git globally, edit the ~/.gitconfig&#160;file and add the following:</p>
<blockquote>
<div>
		[core]</div>
<div>
		&#160; &#160; pager = delta</div>
<div>
		[interactive]</div>
<div>
		&#160; &#160; diffFilter = delta --color-only</div>
<div>
		[delta]</div>
<div>
		&#160; &#160; syntax-theme = Monokai Extended</div>
<div>
		&#160; &#160; line-numbers = true</div>
<div>
		&#160; &#160; side-by-side = true</div>
</blockquote>
<div>
	What each setting means:
<div class="document">
<div class="section">
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[core] pager = delta</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Sets Git's pager to delta</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Affected commands: git diff, git log -p, and any other command whose output goes through the pager</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Effect: the output of these commands is no longer piped straight to less; delta formats it first and then displays it</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[interactive] diffFilter = delta --color-only</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Sets the diff filter used by interactive operations such as git add -p</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">When Git shows each hunk interactively, it first passes the raw diff through delta --color-only</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">--color-only: adds color highlighting only, leaving line numbers and text structure untouched so that interactive commands keep working</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] syntax-theme = Monokai Extended</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Sets delta's syntax-highlighting theme to Monokai Extended</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Controls the color scheme applied to code (keywords, strings, comments, and so on)</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] line-numbers = true</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Shows line numbers in delta's output</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">The old and new file line numbers usually appear on the left or in a gutter, which makes changes easier to locate</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] side-by-side = true</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Displays the diff in side-by-side mode</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&#160;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">The old version is usually on the left and the new version on the right, similar to the comparison view in a GitHub PR</span></p>
</div>
</div>
<p>
	You can run&#160;delta --list-syntax-themes&#160;to see all built-in themes and set the one you like in the syntax-theme&#160;option.</p>
<p>	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p></div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/" class="read-more">Read More </a>]]></description>
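										<content:encoded><![CDATA[<p>Environments this article applies to:<br />
macOS and Ubuntu (tested only on 20.04.6 LTS)<br />
When git diff runs in a terminal, its output looks like this:<br />
* Grouped by file: each changed file gets its own block, shown one after another from top to bottom.<br />
* Everything scrolls vertically in a single terminal window; by default there is no side-by-side comparison.<br />
Personally, I find this layout less intuitive than a &quot;side-by-side&quot; diff.<br />
So is there a way to turn the output of git diff into something better looking?<br />
<span id="more-14209"></span><br />
On macOS, you can install git-delta and, with a little configuration, make git diff&nbsp;in the terminal look much better.<br />
Here is the end result first:<br />
<img decoding="async" alt="git diff" src="https://www.codelast.com/wp-content/uploads/2026/03/git_diff_style_change.jpg" style="width: 700px; height: 259px;" /><br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>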
<p>How is this achieved? Follow the steps below.<br />
Taking macOS as the example, install it with:</p>
<blockquote>
<p>
		brew install git-delta</p>
</blockquote>
<p>To configure git globally, edit the ~/.gitconfig&nbsp;file and add the following:</p>
<blockquote>
<div>
		[core]</div>
<div>
		&nbsp; &nbsp; pager = delta</div>
<div>
		[interactive]</div>
<div>
		&nbsp; &nbsp; diffFilter = delta --color-only</div>
<div>
		[delta]</div>
<div>
		&nbsp; &nbsp; syntax-theme = Monokai Extended</div>
<div>
		&nbsp; &nbsp; line-numbers = true</div>
<div>
		&nbsp; &nbsp; side-by-side = true</div>
</blockquote>
<div>
	What each setting means:</p>
<div class="document">
<div class="section">
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[core] pager = delta</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Sets Git's pager to delta</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Affected commands: git diff, git log -p, and any other command whose output goes through the pager</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Effect: the output of these commands is no longer piped straight to less; delta formats it first and then displays it</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[interactive] diffFilter = delta --color-only</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Sets the diff filter used by interactive operations such as git add -p</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">When Git shows each hunk interactively, it first passes the raw diff through delta --color-only</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">--color-only: adds color highlighting only, leaving line numbers and text structure untouched so that interactive commands keep working</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] syntax-theme = Monokai Extended</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Sets delta's syntax-highlighting theme to Monokai Extended</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Controls the color scheme applied to code (keywords, strings, comments, and so on)</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] line-numbers = true</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Shows line numbers in delta's output</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">The old and new file line numbers usually appear on the left or in a gutter, which makes changes easier to locate</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				●&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] side-by-side = true</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Displays the diff in side-by-side mode</span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				○&nbsp;<span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">The old version is usually on the left and the new version on the right, similar to the comparison view in a GitHub PR</span></p>
</p></div>
</p></div>
<p>
	You can run&nbsp;delta --list-syntax-themes&nbsp;to see all built-in themes and set the one you like in the syntax-theme&nbsp;option.</p>
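<p>Editing the file by hand is not the only route: the same keys can be written with git config. A minimal sketch follows, assuming git is on the PATH. The function name write_delta_config and its file argument are hypothetical; pointing it at a scratch file first lets you inspect the result before touching ~/.gitconfig.</p>

```shell
#!/bin/sh
# Hypothetical helper: write the delta settings from this post into the
# git config file given as $1 (pass "$HOME/.gitconfig" for the global one).
write_delta_config() {
    cfg=$1
    git config --file "$cfg" core.pager delta
    git config --file "$cfg" interactive.diffFilter "delta --color-only"
    git config --file "$cfg" delta.syntax-theme "Monokai Extended"
    git config --file "$cfg" delta.line-numbers true
    git config --file "$cfg" delta.side-by-side true
}

# Try it on a scratch file and list what was written.
scratch=$(mktemp)
write_delta_config "$scratch"
git config --file "$scratch" --list
```

<p>Writing the keys through git config also avoids copying indentation characters out of a web page into the config file.</p>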
<p>	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Reposts must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan with WeChat):<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
		<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
<div>
		&nbsp;</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Transcribing MP3 Audio to Chinese Text Locally (Offline) with Whisper.cpp</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 17 Sep 2025 14:29:50 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[mp3转中文]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[whisper.cpp]]></category>
		<category><![CDATA[Whisper模型]]></category>
		<category><![CDATA[语音识别]]></category>
		<category><![CDATA[语音转文字]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14183</guid>

					<description><![CDATA[<p>If you need to turn MP3 audio into Chinese text without calling any cloud API, this article describes a workable approach.<br />
OS: Ubuntu 20.04 LTS (works similarly on macOS)<br />
<span id="more-14183"></span></p>
<div style="text-align: center;">
	<img decoding="async" alt="audio to text" src="https://www.codelast.com/wp-content/uploads/2025/09/audio_to_text.png" style="width: 600px; height: 528px;" /></div>
<p>
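Some background first:<br />
<span style="color:#ff0000;">➤</span> whisper.cpp is an open-source C/C++ implementation for running OpenAI's Whisper model. Whisper is an advanced automatic speech recognition (ASR) neural network model that converts audio to text. It supports many languages, recognizes speech accurately, and can run fully offline with no internet connection. Because the implementation is lightweight and runs efficiently on a CPU, the project is particularly well suited to being embedded in other applications.<br />
<span style="color: rgb(255, 0, 0);">➤</span>&#160;whisper.cpp-cli is a Python wrapper around the whisper.cpp command-line tool.<br />
<span style="color:#ff0000;">➤</span>&#160;OpenAI's Whisper model is an advanced automatic speech recognition system: a neural network based on the Transformer architecture whose main job is converting audio to text. It was developed and open-sourced by OpenAI and trained on a large multilingual dataset (680,000 hours of audio covering 98 languages), which gives it excellent accuracy and robustness across accents, background noise and technical terminology.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/" class="read-more">Read More </a></p>]]></description>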
										<content:encoded><![CDATA[<p>If you need to turn MP3 audio into Chinese text without calling any cloud API, this article describes a workable approach.<br />
OS: Ubuntu 20.04 LTS (works similarly on macOS)<br />
<span id="more-14183"></span></p>
<div style="text-align: center;">
	<img decoding="async" alt="audio to text" src="https://www.codelast.com/wp-content/uploads/2025/09/audio_to_text.png" style="width: 600px; height: 528px;" /></div>
<p>
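Some background first:<br />
<span style="color:#ff0000;">➤</span> whisper.cpp is an open-source C/C++ implementation for running OpenAI's Whisper model. Whisper is an advanced automatic speech recognition (ASR) neural network model that converts audio to text. It supports many languages, recognizes speech accurately, and can run fully offline with no internet connection. Because the implementation is lightweight and runs efficiently on a CPU, the project is particularly well suited to being embedded in other applications.<br />
<span style="color: rgb(255, 0, 0);">➤</span>&nbsp;whisper.cpp-cli is a Python wrapper around the whisper.cpp command-line tool.<br />
<span style="color:#ff0000;">➤</span>&nbsp;OpenAI's Whisper model is an advanced automatic speech recognition system: a neural network based on the Transformer architecture whose main job is converting audio to text. It was developed and open-sourced by OpenAI and trained on a large multilingual dataset (680,000 hours of audio covering 98 languages), which gives it excellent accuracy and robustness across accents, background noise and technical terminology.<br />
So what we need to install is <a href="https://github.com/charliermarsh/whisper.cpp-cli" target="_blank">whisper.cpp-cli</a>, and we also need to download a Whisper model file; whisper.cpp-cli can then load that model file to perform speech recognition.<br />
To avoid interfering with the software installed system-wide, we usually create a fresh env (environment) with a Python package manager such as micromamba or uv and install the other Python packages inside it. Here we use uv, but any other package manager that can create an env works too.</p>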
<blockquote>
<p>
		mkdir whisper<br />
		cd whisper<br />
		uv venv . --python 3.8&nbsp; # create a new environment<br />
		source&nbsp;bin/activate&nbsp; # activate the environment<br />
		uv pip install pip&nbsp; # install pip<br />
		pip install whisper.cpp-cli&nbsp; # install whisper.cpp-cli</p>
</blockquote>
<p>
With that, the required software is installed.<br />
Convert the mp3 to a 16 kHz wav file:</p>
<blockquote>
<p>
		ffmpeg -y -i /home/codelast/st.mp3 -ar 16000 /home/codelast/st.wav</p>
</blockquote>
<p>Note that 16 kHz is the sample rate Whisper requires, hence the -ar 16000 option.</p>
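<p>As a quick sanity check before running recognition, the sample rate of the converted file can be read back from its header. This is only a minimal sketch using Python's standard wave module; the helper names wav_sample_rate and is_whisper_ready are made up for illustration, and it assumes ffmpeg wrote an ordinary PCM WAV file:</p>

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_whisper_ready(path, expected_rate=16000):
    """True if the WAV file uses the 16 kHz sample rate mentioned above."""
    return wav_sample_rate(path) == expected_rate
```

<p>For instance, is_whisper_ready(&quot;/home/codelast/st.wav&quot;) should return True for a file produced by the ffmpeg command above.</p>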
<p>Now we can run speech recognition:</p>
<blockquote>
<p>
		whisper-cpp -m ../whisper.cpp/download/x-ggml-model.zh.bin -f /home/codelast/st.wav -l zh --output-txt</p>
</blockquote>
<p>The options used in this command:<br />
-m: the model file used for recognition, here x-ggml-model.zh.bin (download it from Hugging Face beforehand)<br />
-f: the input file, here the st.wav audio file<br />
-l: the language spoken in the audio (zh for Chinese)<br />
--output-txt: also write the transcript to a txt file.<br />
Sample output:</p>
<blockquote>
<div>
		whisper_init_from_file_with_params_no_state: loading model from &#39;../whisper.cpp/download/x-ggml-model.zh.bin&#39;</div>
<div>
		whisper_model_load: loading model</div>
<div>
		whisper_model_load: n_vocab&nbsp; &nbsp; &nbsp; &nbsp;= 51865</div>
<div>
		whisper_model_load: n_audio_ctx&nbsp; &nbsp;= 1500</div>
<div>
		whisper_model_load: n_audio_state = 768</div>
<div>
		whisper_model_load: n_audio_head&nbsp; = 12</div>
<div>
		whisper_model_load: n_audio_layer = 12</div>
<div>
		whisper_model_load: n_text_ctx&nbsp; &nbsp; = 448</div>
<div>
		whisper_model_load: n_text_state&nbsp; = 768</div>
<div>
		whisper_model_load: n_text_head&nbsp; &nbsp;= 12</div>
<div>
		whisper_model_load: n_text_layer&nbsp; = 12</div>
<div>
		whisper_model_load: n_mels&nbsp; &nbsp; &nbsp; &nbsp; = 80</div>
<div>
		whisper_model_load: ftype&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= 1</div>
<div>
		whisper_model_load: qntvr&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= 0</div>
<div>
		whisper_model_load: type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 3 (small)</div>
<div>
		whisper_model_load: adding 1608 extra tokens</div>
<div>
		whisper_model_load: n_langs&nbsp; &nbsp; &nbsp; &nbsp;= 99</div>
<div>
		whisper_model_load:&nbsp; &nbsp; &nbsp; CPU total size =&nbsp; &nbsp;487.01 MB</div>
<div>
		whisper_model_load: model size&nbsp; &nbsp; =&nbsp; 487.01 MB</div>
<div>
		whisper_init_state: kv self size&nbsp; =&nbsp; &nbsp;49.55 MB</div>
<div>
		whisper_init_state: kv cross size =&nbsp; &nbsp;55.30 MB</div>
<div>
		whisper_init_state: compute buffer (conv)&nbsp; &nbsp;=&nbsp; &nbsp;22.54 MB</div>
<div>
		whisper_init_state: compute buffer (encode) =&nbsp; 280.20 MB</div>
<div>
		whisper_init_state: compute buffer (cross)&nbsp; =&nbsp; &nbsp; 6.31 MB</div>
<div>
		whisper_init_state: compute buffer (decode) =&nbsp; &nbsp;97.40 MB</div>
<div>
		&nbsp;</div>
<div>
		system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0</div>
<div>
		&nbsp;</div>
<div>
		main: processing &#39;/home/codelast/st.wav&#39; (23957094 samples, 1497.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = zh, task = transcribe, timestamps = 1 ...</div>
<div>
		&nbsp;</div>
<div>
		&nbsp;</div>
<div>
		[00:00:00.000 --&gt; 00:00:19.040]&nbsp; (first recognized sentence)</div>
<div>
		[00:00:19.040 --&gt; 00:00:40.280]&nbsp; (second recognized sentence)<br />
		......</div>
</blockquote>
<div>
	<br />
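	As you can see, each line of the output carries a timestamp range.<br />
	On closer inspection, the Chinese text produced by Whisper sometimes contains wrong characters. You can run it through an AI model for a further correction pass, or post-process it however you like.<br />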
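	Before any such post-processing, the timestamped lines can be split into start time, end time and text. The following is a rough sketch, assuming the output format matches the sample above; SEGMENT_RE, to_seconds and parse_segments are illustrative names and not part of whisper.cpp:<br />

```python
import re

# Matches lines such as "[00:00:00.000 --> 00:00:19.040]  some text"
SEGMENT_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def parse_segments(output):
    """Return (start_seconds, end_seconds, text) tuples; log lines are skipped."""
    segments = []
    for line in output.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match is None:
            continue
        fields = match.groups()
        segments.append((to_seconds(*fields[0:4]), to_seconds(*fields[4:8]), fields[8]))
    return segments
```

	Joining the third element of every tuple gives the plain transcript without timestamps.<br />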
	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan with WeChat):<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
		<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
<div>
		&nbsp;</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] How to Tell Whether a Running TF-Serving Service Is Actually in Use</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 16 Sep 2024 04:27:03 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[TF-Serving]]></category>
		<category><![CDATA[TFServing]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14136</guid>

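					<description><![CDATA[<p>When a TF-Serving service is running on a server, we know it occupies resources, but not whether it is sitting idle or <span style="color:#ff0000;">actually being used</span>.<br />
This article describes how to tell whether it is really in use.<br />
<span id="more-14136"></span></p>
<div>
	The nvidia-smi command shows that a TF-Serving service is running:</div>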
<p><img decoding="async" alt="TF-Serving is running" src="https://www.codelast.com/wp-content/uploads/2024/09/tf_serving_running.png" style="width: 700px; height: 149px;" /></p>
<div>
<div>
		Its process ID is 22871, so we look up more information about that process:</div>
<blockquote>
<div>
			ps -ef &#124; grep 22871</div>
</blockquote>
<div>
		The output looks like:</div>
<blockquote>
<div>
			root&#160; &#160; &#160;22871 22729 83 13:42 pts/0&#160; &#160; 00:06:35 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=codelast --model_base_path=/models/codelast</div>
</blockquote>
<div>
		So the REST service listens on port 8501.<br />
<div>
			We can therefore use tcpdump to capture and inspect the traffic. Run the following command (root privileges required):</div>
<blockquote>
<div>
				sudo tcpdump -vv -i any &#39;port 8501&#39;</div>
</blockquote>
<div>
			If a client is sending requests to this TF-Serving service, the command should print a continuous stream of output, similar to:
<div>
				<span style="color:#0000ff;">14:27:59.174425 IP (tos 0x0, ttl 60, id 51707, offset 0, flags [DF], proto TCP (6), length 1500)</span></div>
<div>
				<span style="color:#0000ff;">node.codelast.com.60679</span></div></div></div></div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/" class="read-more">Read More </a>]]></description>
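										<content:encoded><![CDATA[<p>When a TF-Serving service is running on a server, we know it occupies resources, but not whether it is sitting idle or <span style="color:#ff0000;">actually being used</span>.<br />
This article describes how to tell whether it is really in use.<br />
<span id="more-14136"></span></p>
<div>
	The nvidia-smi command shows that a TF-Serving service is running:</div>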
<p><img decoding="async" alt="TF-Serving is running" src="https://www.codelast.com/wp-content/uploads/2024/09/tf_serving_running.png" style="width: 700px; height: 149px;" /></p>
<div>
<div>
		Its process ID is 22871, so we look up more information about that process:</div>
<blockquote>
<div>
			ps -ef | grep 22871</div>
</blockquote>
<div>
		The output looks like:</div>
<blockquote>
<div>
			root&nbsp; &nbsp; &nbsp;22871 22729 83 13:42 pts/0&nbsp; &nbsp; 00:06:35 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=codelast --model_base_path=/models/codelast</div>
</blockquote>
<div>
		So the REST service listens on port 8501.<br />
		<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>
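<div>
		If this check has to be repeated on several machines, the port can be pulled out of the ps output automatically. A minimal sketch, assuming the server was started with the --rest_api_port flag as in the output above; the function name rest_api_port is made up for illustration:</div>

```python
import re


def rest_api_port(ps_line):
    """Return the --rest_api_port value found in a ps output line, or None."""
    match = re.search(r"--rest_api_port[= ](\d+)", ps_line)
    return int(match.group(1)) if match else None
```

<div>
		Called on the ps line shown above, it returns 8501; for a process started without that flag it returns None.</div>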
<div>
			We can therefore use tcpdump to capture and inspect the traffic. Run the following command (root privileges required):</div>
<blockquote>
<div>
				sudo tcpdump -vv -i any &#39;port 8501&#39;</div>
</blockquote>
<div>
			If a client is sending requests to this TF-Serving service, the command should print a continuous stream of output, similar to:</p>
<div>
				<span style="color:#0000ff;">14:27:59.174425 IP (tos 0x0, ttl 60, id 51707, offset 0, flags [DF], proto TCP (6), length 1500)</span></div>
<div>
				<span style="color:#0000ff;">node.codelast.com.60679 &gt; 172.17.0.2.cmtp-mgt: Flags [.], cksum 0x310f (correct), seq 617580:619040, ack 1, win 63, length 1460</span></div>
<div>
				<span style="color:#0000ff;">14:27:59.174453 IP (tos 0x0, ttl 60, id 39347, offset 0, flags [DF], proto TCP (6), length 1500)</span></div>
<div>
				<span style="color:#0000ff;">node.codelast.com.32739 &gt; 172.17.0.2.cmtp-mgt: Flags [.], cksum 0x9354 (correct), seq 44268904:44270364, ack 1, win 86, length 1460</span></div>
<p>			If no requests reach the TF-Serving service, the command above prints nothing, which means the service is not in use.<br />
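			The same judgement can be made on captured text, for example after letting the tcpdump command run for a while with its output redirected to a file. This is a rough sketch that assumes the -vv output format shown above; count_packets and looks_in_use are hypothetical names, and any traffic on the port (health checks or port scans included) counts as a packet:<br />

```python
import re

# With -vv, each packet starts with a timestamp line such as
# "14:27:59.174425 IP (tos 0x0, ...)".
PACKET_START_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ IP6? ")


def count_packets(capture_text):
    """Count the packet header lines in captured tcpdump output."""
    return sum(
        1
        for line in capture_text.splitlines()
        if PACKET_START_RE.match(line.strip())
    )


def looks_in_use(capture_text):
    """True if at least one packet was seen on the watched port."""
    return count_packets(capture_text) > 0
```

			An empty capture therefore yields looks_in_use(...) == False, matching the "no output" case described above.<br />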
			<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
			<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
			Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
			Thanks for following my WeChat official account (scan with WeChat):<br />
			<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
			And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
				<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
</p></div>
<p>
		&nbsp;</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Java MapReduce Job Counter Page Fails to Display (Error 500)</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Tue, 30 Apr 2024 09:11:34 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[counter]]></category>
		<category><![CDATA[error 500]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[map-reduce]]></category>
		<category><![CDATA[RFC 2616]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14101</guid>

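					<description><![CDATA[<p>This is not the first time I have run into this problem, but today I finally made up my mind to spend the time writing it up, as a reference for anyone who hits the same issue.<br />
As we know, after a Java M-R job finishes, its job counters are shown both on the command line and on the job's information web page. On the web page you reach them by clicking the &#34;counter&#34; link on the job info page.<br />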
<span id="more-14101"></span><br />
<img decoding="async" alt="hadoop job info page" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_job_info_page.png" style="width: 339px; height: 361px;" /></p>
<p>Normally this page shows the counter data, but when things go wrong, this is what you see instead:<br />
<img decoding="async" alt="hadoop counter info error" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_counter_error.png" style="width: 561px; height: 144px;" /><br />
At the same time, you will notice that the job prints no counter information on the shell command line either.<br />
The error page gives you no useful information about the failure, and neither does its &#34;Error Details&#34; section.<br />
Although the counters cannot be displayed, the M-R job itself still runs to completion and writes its output normally.<br />
After some testing, the cause in my case turned out to be that the program defined a fairly large number of Hadoop counters.<br />
How many is too many? I don't know. My program failed with a little over 240 counters; after I cut them by half, to a little over 120, the counter statistics worked again.<br />
If you run into a similar problem, first check whether the job defines too many counters.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&#160;Copyright notice&#160;<span style="color: rgb(255, 0, 0);">➤➤</span>&#160;<br />
Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&#160;<br />
Thanks for following my WeChat official account (scan with WeChat):<br />
<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" />&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/" class="read-more">Read More </a></p>]]></description>
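										<content:encoded><![CDATA[<p>This is not the first time I have run into this problem, but today I finally made up my mind to spend the time writing it up, as a reference for anyone who hits the same issue.<br />
As we know, after a Java M-R job finishes, its job counters are shown both on the command line and on the job's information web page. On the web page you reach them by clicking the &quot;counter&quot; link on the job info page.<br />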
<span id="more-14101"></span><br />
<img decoding="async" alt="hadoop job info page" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_job_info_page.png" style="width: 339px; height: 361px;" /></p>
<p>Normally this page shows the counter data, but when things go wrong, this is what you see instead:<br />
<img decoding="async" alt="hadoop counter info error" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_counter_error.png" style="width: 561px; height: 144px;" /><br />
At the same time, you will notice that the job prints no counter information on the shell command line either.<br />
The error page gives you no useful information about the failure, and neither does its &quot;Error Details&quot; section.<br />
Although the counters cannot be displayed, the M-R job itself still runs to completion and writes its output normally.<br />
After some testing, the cause in my case turned out to be that the program defined a fairly large number of Hadoop counters.<br />
How many is too many? I don't know. My program failed with a little over 240 counters; after I cut them by half, to a little over 120, the counter statistics worked again.<br />
If you run into a similar problem, first check whether the job defines too many counters.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan with WeChat):<br />
<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Fixing an IntelliJ IDEA Startup Crash: error occurred during error reporting (), id 0x6, SIGABRT (0x6) at pc=...</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Fri, 15 Mar 2024 09:48:13 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[IntelliJ IDEA]]></category>
		<category><![CDATA[SIGABRT]]></category>
		<category><![CDATA[启动]]></category>
		<category><![CDATA[崩溃]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14081</guid>

					<description><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="" src="https://www.codelast.com/wp-content/uploads/2024/03/intellij_idea_logo.jpeg" style="width: 225px; height: 225px;" /></div>
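<p>Sometimes a method that has worked well for ages suddenly stops working and costs you days of fiddling; it can really drive you crazy.<br />
A few days ago I ran into exactly this kind of mess: after upgrading IntelliJ IDEA to the latest version by hand on my Ubuntu development machine, it would no longer start.<br />
It crashed on every launch. Rebooting, deleting IntelliJ IDEA's local caches, and going back to the old version all failed to bring it back (as if some file had been &#34;contaminated&#34; and there was no way back), which was very annoying. After several days of trying all sorts of things I finally solved it. My fix is not universal, but if you hit a similar problem it may give you some ideas.<br />
<span id="more-14081"></span><br />
OS: <span style="color:#0000ff;">Ubuntu 20.04.6 LTS</span><br />
JDK: <span style="color:#0000ff;">1.8.0_382</span><br />
Previously installed IntelliJ IDEA version: <span style="color:#b22222;">idea-IC-232.8660.185</span><br />
New IntelliJ IDEA version downloaded from the JetBrains website: <span style="color:#b22222;">idea-IC-233.14808.21</span><br />
I did not use the IDE's built-in updater. Instead I downloaded the new release archive, extracted it into an&#160;idea-IC-233.14808.21 directory, and started the new version by running idea.sh from its bin directory. Started this way, the new version automatically imports the old version's settings, so when nothing goes wrong you switch over seamlessly without reconfiguring anything.<br />
However, the new version died on startup, and the last thing printed on the command line was:</p>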
<blockquote>
<div>
		[error occurred during error reporting (), id 0x6, SIGABRT (0x6) at pc=0x00007fed3c5cf00b]</div>
<div>
		Aborted (core dumped)</div>
</blockquote>
<div>
	I could not get to the IDE main window. A very long error report file, java_error_in_idea_xxx.log, was also generated under the /home directory.</div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="" src="https://www.codelast.com/wp-content/uploads/2024/03/intellij_idea_logo.jpeg" style="width: 225px; height: 225px;" /></div>
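<p>Sometimes a method that has worked well for ages suddenly stops working and costs you days of fiddling; it can really drive you crazy.<br />
A few days ago I ran into exactly this kind of mess: after upgrading IntelliJ IDEA to the latest version by hand on my Ubuntu development machine, it would no longer start.<br />
It crashed on every launch. Rebooting, deleting IntelliJ IDEA's local caches, and going back to the old version all failed to bring it back (as if some file had been &quot;contaminated&quot; and there was no way back), which was very annoying. After several days of trying all sorts of things I finally solved it. My fix is not universal, but if you hit a similar problem it may give you some ideas.<br />
<span id="more-14081"></span><br />
OS: <span style="color:#0000ff;">Ubuntu 20.04.6 LTS</span><br />
JDK: <span style="color:#0000ff;">1.8.0_382</span><br />
Previously installed IntelliJ IDEA version: <span style="color:#b22222;">idea-IC-232.8660.185</span><br />
New IntelliJ IDEA version downloaded from the JetBrains website: <span style="color:#b22222;">idea-IC-233.14808.21</span><br />
I did not use the IDE's built-in updater. Instead I downloaded the new release archive, extracted it into an&nbsp;idea-IC-233.14808.21 directory, and started the new version by running idea.sh from its bin directory. Started this way, the new version automatically imports the old version's settings, so when nothing goes wrong you switch over seamlessly without reconfiguring anything.<br />
However, the new version died on startup, and the last thing printed on the command line was:</p>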
<blockquote>
<div>
		[error occurred during error reporting (), id 0x6, SIGABRT (0x6) at pc=0x00007fed3c5cf00b]</div>
<div>
		Aborted (core dumped)</div>
</blockquote>
<div>
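	I could not get to the IDE main window. A very long error report file, java_error_in_idea_xxx.log, was also generated under the /home directory.<br />
	At first I did not look at this log file. Instead I tried the fixes I found online, one after another:<br />
	1. Rebooting<br />
	2. Deleting IntelliJ IDEA's caches<br />
	3. Going back to the old IntelliJ IDEA version<br />
	4. Following <a href="https://youtrack.jetbrains.com/issue/IDEA-315192/IntelliJ-would-not-open-after-being-closed-once-on-Ubuntu-22.04-LTS.-The-only-solution-is-rebooting." rel="noopener" target="_blank">this</a> similar issue, uninstalling snap and reinstalling it<br />
	None of them worked.<br />
	Out of options, I forced myself to read the crash log&nbsp;java_error_in_idea_xxx.log, and to my surprise a clue turned up right away.<br />
	Near the top there is this section:</p>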
<div>
<blockquote>
<div>
				# Problematic frame:</div>
<div>
				# C&nbsp; [x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09]&nbsp; tree_sitter_cpp_external_scanner_deserialize+0x179</div>
</blockquote>
<div>
<div>
				I don't know exactly what it means, but it is labeled the &quot;problematic frame&quot;, so the crash must be related to it.<br />
				<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></div>
<div>
				Further down in the log you will see:</p>
<blockquote>
<div>
						---------------&nbsp; T H R E A D&nbsp; ---------------</div>
<div>
						&nbsp;</div>
<div>
						Current thread (0x00007fec7c02b370):&nbsp; JavaThread &quot;AWT-EventQueue-0&quot; [_thread_in_native, id=352672, stack(0x00007feb95ae5000,0x00007feb95be6000)]</div>
<div>
						&nbsp;</div>
<div>
						Stack: [0x00007feb95ae5000,0x00007feb95be6000],&nbsp; sp=0x00007feb95be0310,&nbsp; free space=1004k</div>
<div>
						Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09]&nbsp; tree_sitter_cpp_external_scanner_deserialize+0x179</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter.so+0x30b3e]&nbsp; ts_parser_reset+0x30e</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter.so+0x2e329]&nbsp; ts_parser_set_language+0x399</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter.so+0xb4875]&nbsp; Java_org_treesitter_TSParser_ts_1parser_1set_1language+0x25</div>
<div>
						j&nbsp; org.treesitter.TSParser.ts_parser_set_language(JJ)Z+0</div>
<div>
						j&nbsp; org.treesitter.TSParser.setLanguage(Lorg/treesitter/TSLanguage;)Z+10</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.d.a()V+64</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.d.&lt;init&gt;()V+365</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.d.a()Lai/codegeex/plugin/lang/agent/d;+10</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.CodegeexAgentCompletionService.e()V+0</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.CodegeexAgentCompletionService.&lt;init&gt;()V+266</div>
</blockquote>
<div>
<div>
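						So the first plugin that shows up in connection with the faulting &quot;x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09&quot; is &quot;ai.codegeex.plugin&quot;, which corresponds to the CodeGeeX plugin I had installed.</div>
<div>
						I therefore suspected that removing this plugin would fix the IntelliJ IDEA startup crash.</div>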
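<div>
						The same reading of the log can be sketched in a few lines of Python. This is only an illustration under the assumption that the log looks like the excerpts above; the function names are made up, only interpreted frames (lines starting with a lowercase j) are handled, and org.treesitter is skipped because it is the binding library rather than a plugin:</div>

```python
import re

# Interpreted Java frames look like "j  ai.codegeex.plugin.lang.agent.d.a()V+64".
JAVA_FRAME_RE = re.compile(r"^j\s+([\w.$]+)\(")


def problematic_frame(log_text):
    """Return the line after '# Problematic frame:' without its '#', or None."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines[:-1]):
        if line.strip() == "# Problematic frame:":
            return lines[index + 1].strip().lstrip("#").strip()
    return None


def first_java_frame_outside(log_text, ignored_prefixes=("org.treesitter.",)):
    """Return the first Java frame whose name does not start with an ignored prefix."""
    for line in log_text.splitlines():
        match = JAVA_FRAME_RE.match(line.strip())
        if match and not match.group(1).startswith(tuple(ignored_prefixes)):
            return match.group(1)
    return None
```

<div>
						On the excerpts above, first_java_frame_outside returns a name beginning with ai.codegeex.plugin, which points at the same plugin found by reading the stack manually.</div>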
<div>
						On Ubuntu, plugins are installed under this directory: <span style="color:#0000ff;">~/.local/share/JetBrains/IdeaIC2023.3</span></div>
<div>
						Here IdeaIC2023.3 identifies the IntelliJ IDEA version; each version upgrade creates a new directory under ~/.local/share/JetBrains.</div>
<div>
						Inside it there is a directory named &quot;CodeGeeX&quot;, which is where the CodeGeeX plugin is installed. Delete it.</div>
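<div>
						A more cautious variant of this step is to move the plugin directory out of the plugins folder instead of deleting it, so that it can be put back later. A minimal sketch, not something from the original troubleshooting session; move_plugin_aside and the backup location are hypothetical, and the directory has to leave the plugins folder entirely because a renamed copy left inside it might still be picked up by the IDE:</div>

```python
import shutil
from pathlib import Path


def move_plugin_aside(plugins_root, backup_root, plugin_name="CodeGeeX"):
    """Move plugins_root/plugin_name into backup_root; return the new path or None."""
    source = Path(plugins_root).expanduser() / plugin_name
    if not source.is_dir():
        return None  # nothing to move
    backup_dir = Path(backup_root).expanduser()
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / plugin_name
    if target.exists():
        raise FileExistsError(str(target))  # do not overwrite an earlier backup
    shutil.move(str(source), str(target))
    return target
```

<div>
						For example, move_plugin_aside(&quot;~/.local/share/JetBrains/IdeaIC2023.3&quot;, &quot;~/idea-plugin-backup&quot;), where the second path is just a placeholder.</div>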
<div>
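						Then I tried starting IntelliJ IDEA again, and it launched normally.<br />
						I still don't know why the CodeGeeX plugin causes this problem, but if, like me, you cannot find the cause of an IDE crash, removing a suspicious plugin may be one way out.<br />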
						<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
						<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
						Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
						Thanks for following my WeChat official account (scan with WeChat):<br />
						<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
						And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
							<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
</p></div>
</p></div>
</p></div>
</p></div>
</p></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Saying Goodbye to GitHub Copilot After More Than a Year as a Paying User</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/#comments</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Fri, 01 Mar 2024 19:16:53 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[AI辅助编程]]></category>
		<category><![CDATA[CodeGeeX]]></category>
		<category><![CDATA[GitHub Copilot]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14064</guid>

					<description><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="GitHub Copilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_1.png" style="width: 800px; height: 213px;" /></div>
<div>
	&#160;</div>
<div>
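	Yesterday my GitHub Copilot subscription expired. After paying for it for more than a year, I have decided not to renew, and I have some thoughts to share.<br />
	&#160;</div>
<div>
	From eager anticipation before subscribing, to gradually taking it for granted while using it, to a &#34;calm breakup&#34; when the subscription ended, I finally gave in to reality and chose the frugal way of living.<br />
	&#160;</div>
<div>
	After all, you can find good reasons to call $10 a month either worth it or not. For me, GitHub Copilot is simply no longer worth $10 a month.<br />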
	<span id="more-14064"></span></div>
<div style="text-align: center;">
	<img decoding="async" alt="GitHub Copilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_2.png" style="width: 800px; height: 309px;" /></div>
<div>
<!--more--></div>
<div>
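	GitHub Copilot, the world's first AI coding assistant with truly first-rate results, became generally available in June 2022. Before that, like many of my peers, I was often stuck in the loop of &#34;write code &#8594; forget how to write some snippet &#8594; search Google &#8594; copy and paste code from the web and test it &#8594; continue writing code&#34;. Over time, that familiar, repetitive routine wears you down.<br />
	&#160;</div>
<div>
	Then GitHub Copilot arrived. Hyped by the tech media and backed by hands-on reports from grassroots users, it earned a resounding reputation: awesome!<br />
	&#160;</div>
<div>
	So I was tempted. After a one-month trial followed by one more paid month, GitHub Copilot impressed me enough to convince me that it would save me a huge amount of development time. So in early 2023 I made up my mind to pay for a full year.<br />
	&#160;</div>
<div>
	$10 a month is a price many developers may have to think hard about. At the time my account had a discount, so I renewed for a year at a little over $90, which is less than 700 RMB per year.<br />
	&#160;</div>
<div>
	The accuracy of GitHub Copilot's code completion is impressive. What I enjoyed most was how well it completed Chinese comments. Whether it was a long comment at the top of a class or a one-line comment in the middle of the code, I felt it could &#34;think what I think and write what I want to write&#34;.<br />
	&#160;</div>
<div>
	Of course there was also a biggest annoyance: its connection to the server stalled from time to time. The servers are overseas, so that is understandable.<br />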
	&#160;</div>
<div style="text-align: center;">
	<img decoding="async" alt="alternatives" src="https://www.codelast.com/wp-content/uploads/2024/03/alternative.jpg" style="width: 750px; height: 320px;" /></div>
<div>
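	Back in early 2023, if you were looking for a free &#34;budget substitute&#34; for GitHub Copilot, there were not many choices. aiXcoder and CodeGeeX were two of the better-known ones in China.</div>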
<div style="text-align: center;">
	<img decoding="async" alt="aiXcoder" src="https://www.codelast.com/wp-content/uploads/2024/03/aixcoder.jpg" style="width: 360px; height: 147px;" /></div>
<div>
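	I always thought the design of aiXcoder's first few versions was truly &#34;sick&#34;: it required installing a backend program locally to do inference. Because that program depends on the operating system, it obviously would not work on many Linux distributions. For example, I once tried to install aiXcoder's local inference software on Ubuntu 16.04, but it failed because of dependency library problems. When I reported the issue in the official QQ group, the developers only confirmed the problem and offered no solution. Presumably nobody is going to work on something that counts toward no KPI and earns no money!<br />
	&#160;</div>
<div>
	So I abandoned aiXcoder without hesitation.</div>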
<div>
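	GitHub Copilot's approach of &#34;one plugin solves everything&#34; and &#34;inference runs in the cloud&#34; largely sidesteps differences between system versions and eliminates dependency library problems.<br />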
	&#160;</div>
<div style="text-align: center;">
	<img decoding="async" alt="CodeGeeX" src="https://www.codelast.com/wp-content/uploads/2024/03/codegeex.png" style="width: 727px; height: 153px;" /></div>
<div>
	<br />
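	In early 2023, the Chinese-made CodeGeeX was another bright star in the coding-assistant field. Like GitHub Copilot, it runs inference in the cloud, and installing a single plugin takes care of everything. That is what a proper coding assistant should look like.<br />
	&#160;</div>
<div>
	At the time I compared GitHub Copilot and CodeGeeX in detail on about 10 cases, and the conclusion was no surprise: GitHub Copilot beat CodeGeeX across the board. This was not a comparison on technical metrics (such as benchmark test sets for evaluating code generation quality), but purely my subjective impression of whose output was better.<br />
	&#160;</div>
<div>
	So, to be honest, in early 2023, based on my own testing, I would rather spend as much as 700 RMB on GitHub Copilot than use the free CodeGeeX every day, because its code completion at the time was really not good enough and its support for some programming languages (Apache Pig, for example) was poor, which would have affected my development work.</div>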
&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="GitHub Copilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_1.png" style="width: 800px; height: 213px;" /></div>
<div>
	&nbsp;</div>
<div>
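	Yesterday my GitHub Copilot subscription expired. After paying for it for more than a year, I have decided not to renew, and I have some thoughts to share.<br />
	&nbsp;</div>
<div>
	From eager anticipation before subscribing, to gradually taking it for granted while using it, to a &quot;calm breakup&quot; when the subscription ended, I finally gave in to reality and chose the frugal way of living.<br />
	&nbsp;</div>
<div>
	After all, you can find good reasons to call $10 a month either worth it or not. For me, GitHub Copilot is simply no longer worth $10 a month.<br />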
	<span id="more-14064"></span></div>
<div style="text-align: center;">
	<img decoding="async" alt="GitHub Copilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_2.png" style="width: 800px; height: 309px;" /></div>
<div>
<!--more--></div>
<div>
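	GitHub Copilot, the world's first AI coding assistant with truly first-rate results, became generally available in June 2022. Before that, like many of my peers, I was often stuck in the loop of &quot;write code &rarr; forget how to write some snippet &rarr; search Google &rarr; copy and paste code from the web and test it &rarr; continue writing code&quot;. Over time, that familiar, repetitive routine wears you down.<br />
	&nbsp;</div>
<div>
	Then GitHub Copilot arrived. Hyped by the tech media and backed by hands-on reports from grassroots users, it earned a resounding reputation: awesome!<br />
	&nbsp;</div>
<div>
	So I was tempted. After a one-month trial followed by one more paid month, GitHub Copilot impressed me enough to convince me that it would save me a huge amount of development time. So in early 2023 I made up my mind to pay for a full year.<br />
	&nbsp;</div>
<div>
	$10 a month is a price many developers may have to think hard about. At the time my account had a discount, so I renewed for a year at a little over $90, which is less than 700 RMB per year.<br />
	&nbsp;</div>
<div>
	The accuracy of GitHub Copilot's code completion is impressive. What I enjoyed most was how well it completed Chinese comments. Whether it was a long comment at the top of a class or a one-line comment in the middle of the code, I felt it could &quot;think what I think and write what I want to write&quot;.<br />
	&nbsp;</div>
<div>
	Of course there was also a biggest annoyance: its connection to the server stalled from time to time. The servers are overseas, so that is understandable.<br />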
	&nbsp;</div>
<div style="text-align: center;">
	<img decoding="async" alt="alternatives" src="https://www.codelast.com/wp-content/uploads/2024/03/alternative.jpg" style="width: 750px; height: 320px;" /></div>
<div>
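	Back in early 2023, if you were looking for a free &quot;budget substitute&quot; for GitHub Copilot, there were not many choices. aiXcoder and CodeGeeX were two of the better-known ones in China.</div>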
<div style="text-align: center;">
	<img decoding="async" alt="aiXcoder" src="https://www.codelast.com/wp-content/uploads/2024/03/aixcoder.jpg" style="width: 360px; height: 147px;" /></div>
<div>
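	I always thought the design of aiXcoder's first few versions was truly &quot;sick&quot;: it required installing a backend program locally to do inference. Because that program depends on the operating system, it obviously would not work on many Linux distributions. For example, I once tried to install aiXcoder's local inference software on Ubuntu 16.04, but it failed because of dependency library problems. When I reported the issue in the official QQ group, the developers only confirmed the problem and offered no solution. Presumably nobody is going to work on something that counts toward no KPI and earns no money!<br />
	&nbsp;</div>
<div>
	So I abandoned aiXcoder without hesitation.</div>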
<div>
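	GitHub Copilot's approach of &quot;one plugin solves everything&quot; and &quot;inference runs in the cloud&quot; largely sidesteps differences between system versions and eliminates dependency library problems.<br />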
	&nbsp;</div>
<div style="text-align: center;">
	<img decoding="async" alt="CodeGeeX" src="https://www.codelast.com/wp-content/uploads/2024/03/codegeex.png" style="width: 727px; height: 153px;" /></div>
<div>
	<br />
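	In early 2023, the Chinese-made CodeGeeX was another bright star in the coding-assistant field. Like GitHub Copilot, it runs inference in the cloud, and installing a single plugin takes care of everything. That is what a proper coding assistant should look like.<br />
	&nbsp;</div>
<div>
	At the time I compared GitHub Copilot and CodeGeeX in detail on about 10 cases, and the conclusion was no surprise: GitHub Copilot beat CodeGeeX across the board. This was not a comparison on technical metrics (such as benchmark test sets for evaluating code generation quality), but purely my subjective impression of whose output was better.<br />
	&nbsp;</div>
<div>
	So, to be honest, in early 2023, based on my own testing, I would rather spend as much as 700 RMB on GitHub Copilot than use the free CodeGeeX every day, because its code completion at the time was really not good enough and its support for some programming languages (Apache Pig, for example) was poor, which would have affected my development work.</div>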
<div>
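	However, over the whole of 2023 CodeGeeX made impressive, major improvements: not only did its code completion quality improve a lot, it also gained a large number of new features, too many to cover in a few words. So today, in 2024, when my GitHub Copilot subscription is due for payment again, I will not renew unless the price drops to 10% of what it was (which I know will not happen). I will use the free Chinese-made substitute instead: CodeGeeX.<br />
	&nbsp;</div>
<div>
	Since 2023, besides CodeGeeX's huge progress, a large number of free competitors have appeared on the market, including Codeium (from abroad) and Fitten Code (Chinese-made). They may still lag behind GitHub Copilot, but unless you are especially picky, they are more than enough for everyday use.</div>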
<div>
	&nbsp;</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Grouping Data by a Field in Apache Pig and Keeping the Most Recent Record in Each Group</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 15 Nov 2023 08:15:25 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[apache pig]]></category>
		<category><![CDATA[GROUP]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13967</guid>

					<description><![CDATA[<p>查看更多Apache Pig的教程请点击<a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">这里</span></a>。</p>
<p>用Apache Pig处理大数据时，经常会有这种需求：把输入数据按指定的字段group，并且每个group内只输出时间最新的一条记录。<br />
<span id="more-13967"></span><br />
举个例子。有数据文件 input.txt ：</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">10&#160;&#160;&#160;&#160;&#160;&#160;a&#160;&#160;&#160;&#160;&#160;&#160;&#160;1,2,3
9&#160;&#160;&#160;&#160;&#160;&#160;&#160;b&#160;&#160;&#160;&#160;&#160;&#160;&#160;1,2
8&#160;&#160;&#160;&#160;&#160;&#160;&#160;a&#160;&#160;&#160;&#160;&#160;&#160;&#160;2,3,4
13&#160;&#160;&#160;&#160;&#160;&#160;a&#160;&#160;&#160;&#160;&#160;&#160;&#160;1,2,3,4
6&#160;&#160;&#160;&#160;&#160;&#160;&#160;b&#160;&#160;&#160;&#160;&#160;&#160;&#160;1
</code></pre>
</section>
<p>The three fields are: <span style="background-color: rgb(255, 255, 255); color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre;">time (timestamp), userId (user ID), userInterest (user interest IDs)<br />
Now, how do we find each user's most recent userInterest?</span><br />
<span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">That is: for user a, the latest timestamp is 13 and userInterest is 1,2,3,4; for user b, the latest timestamp is 9 and userInterest is 1,2.</span><br />
<span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">Here is the code:</span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;input.txt&#39;</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&#160;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">time</span>:&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">long</span>,&#160;userId:&#160;chararray,&#160;userInterest:&#160;chararray);
A&#160;=&#160;FOREACH&#160;A&#160;GENERATE&#160;time,&#160;userId,&#160;userInterest;
B&#160;=&#160;GROUP&#160;A&#160;BY&#160;userId;
<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">--&#160;Keep&#160;the&#160;most&#160;recent&#160;record&#160;for&#160;each&#160;userId</span>
C&#160;=&#160;FOREACH&#160;B&#160;{
&#160;&#160;&#160;&#160;SORTED&#160;=&#160;ORDER&#160;A&#160;BY&#160;time&#160;DESC;
&#160;&#160;&#160;&#160;ONE_RECORD&#160;=&#160;LIMIT&#160;SORTED&#160;1;
&#160;&#160;&#160;&#160;GENERATE&#160;FLATTEN(ONE_RECORD);
};
DUMP&#160;C;
</code></pre>
</section>
<p>
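In the nested FOREACH statement, ORDER BY first sorts the records within each group by time in descending order, and LIMIT then takes one record. Because the sort is descending by time, LIMIT 1 returns the record with the largest timestamp, that is, the most recent one.<br />
<span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">Output:</span></p>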
<blockquote>
<div>
		(13,a,1,2,3,4)</div>
<div>
		(9,b,1,2)</div>
</blockquote>
<p><span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">This matches the correct result we worked out by hand earlier.</span></p>
<p>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>查看更多Apache Pig的教程请点击<a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">这里</span></a>。</p>
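<p>When processing big data with Apache Pig, a common requirement is to group the input by a given field and output only the most recent record in each group.<br />
<span id="more-13967"></span><br />
For example, suppose we have a data file input.txt:</p>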
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,2,3
9&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,2
8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2,3,4
13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,2,3,4
6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1
</code></pre>
</section>
<p>The three fields are: <span style="background-color: rgb(255, 255, 255); color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre;">time (timestamp), userId (user ID), userInterest (user interest IDs)<br />
Now, how do we find each user's most recent userInterest?</span><br />
<span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">That is: for user a, the latest timestamp is 13 and userInterest is 1,2,3,4; for user b, the latest timestamp is 9 and userInterest is 1,2.</span><br />
<span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">Here is the code:</span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;input.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">time</span>:&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">long</span>,&nbsp;userId:&nbsp;chararray,&nbsp;userInterest:&nbsp;chararray);
A&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;time,&nbsp;userId,&nbsp;userInterest;
B&nbsp;=&nbsp;GROUP&nbsp;A&nbsp;BY&nbsp;userId;
<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">--&nbsp;Keep&nbsp;the&nbsp;most&nbsp;recent&nbsp;record&nbsp;for&nbsp;each&nbsp;userId</span>
C&nbsp;=&nbsp;FOREACH&nbsp;B&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;SORTED&nbsp;=&nbsp;ORDER&nbsp;A&nbsp;BY&nbsp;time&nbsp;DESC;
&nbsp;&nbsp;&nbsp;&nbsp;ONE_RECORD&nbsp;=&nbsp;LIMIT&nbsp;SORTED&nbsp;1;
&nbsp;&nbsp;&nbsp;&nbsp;GENERATE&nbsp;FLATTEN(ONE_RECORD);
};
DUMP&nbsp;C;
</code></pre>
</section>
<p>
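In the nested FOREACH statement, ORDER BY first sorts the records within each group by time in descending order, and LIMIT then takes one record. Because the sort is descending by time, LIMIT 1 returns the record with the largest timestamp, that is, the most recent one.<br />
<span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">Output:</span></p>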
<blockquote>
<div>
		(13,a,1,2,3,4)</div>
<div>
		(9,b,1,2)</div>
</blockquote>
<p><span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">This matches the correct result we worked out by hand earlier.</span></p>
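<p>To cross-check the logic outside of Pig, here is a minimal Python sketch of the same idea: group by userId and keep the record with the largest time. It is not part of the Pig script; the names Record and latest_per_user are made up for illustration, and the sample rows are copied from input.txt above.</p>

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One row of input.txt (hypothetical helper type for this sketch)."""
    time: int
    user_id: str
    user_interest: str


def latest_per_user(records):
    """Return a dict mapping each user_id to its record with the largest time."""
    latest = {}
    for rec in records:
        current = latest.get(rec.user_id)
        # Keep the record if this user is new or the record is more recent.
        if current is None or rec.time > current.time:
            latest[rec.user_id] = rec
    return latest


# The same five rows as input.txt.
records = [
    Record(10, "a", "1,2,3"),
    Record(9, "b", "1,2"),
    Record(8, "a", "2,3,4"),
    Record(13, "a", "1,2,3,4"),
    Record(6, "b", "1"),
]

result = latest_per_user(records)
for user_id in sorted(result):
    print(tuple(result[user_id]))
```

<p>This sketch makes a single pass and keeps a running maximum per user instead of sorting each group, which is a different technique from ORDER BY plus LIMIT 1 but should select the same records for this data.</p>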
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Converting a Time String to a Timestamp in Apache Pig</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Thu, 12 Oct 2023 09:37:25 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[apache pig]]></category>
		<category><![CDATA[时间字符串]]></category>
		<category><![CDATA[时间戳]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13959</guid>

					<description><![CDATA[<p>查看更多Apache Pig的教程请点击<a href="https://www.codelast.com/?p=4550" target="_blank" rel="noopener"><span style="background-color:#ffa07a;">这里</span></a>。</p>
<p>在Apache Pig中，怎样把 <span style="color:#ff0000;">2023-10-11_10:57:56</span> 这种格式的时间字符串，转成整型的时间戳？<br />
话不多说，直接上代码。<br />
假设输入数据文件 1.txt，其格式是一行一个时间字符串。<br />
<span id="more-13959"></span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&#160;(dt:&#160;chararray);
A&#160;=&#160;FOREACH&#160;A&#160;GENERATE&#160;ToDate(dt,&#160;&#39;yyyy-MM-dd_HH:mm:ss&#39;)&#160;AS&#160;date;
B&#160;=&#160;FOREACH&#160;A&#160;GENERATE&#160;ToUnixTime(date)&#160;AS&#160;ts;
DUMP&#160;B;
</code></pre>
</section>
<p>
The output looks like this:</p>
<blockquote>
<p>
		1696993076</p>
</blockquote>
<p>
As you can see, the timestamp obtained this way is in seconds.</p>
<p>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" target="_blank" rel="noopener"><span style="background-color:#ffa07a;">here</span></a>.</p>
<p>In Apache Pig, how do you convert a time string in a format like <span style="color:#ff0000;">2023-10-11_10:57:56</span> into an integer timestamp?<br />
Without further ado, here is the code.<br />
Assume the input data file is 1.txt, with one time string per line.<br />
<span id="more-13959"></span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(dt:&nbsp;chararray);
A&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;ToDate(dt,&nbsp;&#39;yyyy-MM-dd_HH:mm:ss&#39;)&nbsp;AS&nbsp;date;
B&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;ToUnixTime(date)&nbsp;AS&nbsp;ts;
DUMP&nbsp;B;
</code></pre>
</section>
<p>
The output looks like this:</p>
<blockquote>
<p>
		1696993076</p>
</blockquote>
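<p>As a cross-check outside Pig, the same conversion can be sketched in plain Java with java.time. This is only an illustrative sketch: the class name PigTimeToEpoch and the explicit time-zone argument are assumptions that do not appear in the Pig script, and the zone Asia/Shanghai is used because the sample output 1696993076 corresponds to the sample string read as UTC+8 wall-clock time.</p>

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class PigTimeToEpoch {
    // Same pattern string as the ToDate() call in the Pig script.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH:mm:ss");

    // Reads the string as wall-clock time in the given zone (an assumed
    // parameter) and returns the number of seconds since the Unix epoch.
    public static long toEpochSeconds(String text, String zoneId) {
        return LocalDateTime.parse(text, FORMAT)
                .atZone(ZoneId.of(zoneId))
                .toEpochSecond();
    }

    public static void main(String[] args) {
        // Prints 1696993076, the value shown in the sample output.
        System.out.println(toEpochSeconds("2023-10-11_10:57:56", "Asia/Shanghai"));
        // Prints 1697021876: the same string read as UTC.
        System.out.println(toEpochSeconds("2023-10-11_10:57:56", "UTC"));
    }
}
```

<p>The same string read as UTC gives a value 28800 seconds larger, so it is worth checking which time zone is in effect before comparing timestamps from different systems.</p>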
<p>
As you can see, the resulting timestamp is in seconds.<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan with WeChat):<br />
<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] A Problem Caused by Omitting the @Override Annotation on reduce() in a Java map-reduce Job</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Sun, 06 Aug 2023 12:12:10 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[map-reduce job]]></category>
		<category><![CDATA[类型错误]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13934</guid>

					<description><![CDATA[<p>Consider a map-reduce&#160;job written in Java whose mapper emits keys of type Text and values of type NullWritable. The reducer should therefore be written like this:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">static</span>&#160;<span class="hljs-class" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">class</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">QuerySegmentResultFromKVReducer</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">extends</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Reducer</span>&#60;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Text</span>,&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); 
word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>&#62;&#160;</span>{

&#160;&#160;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&#160;&#160;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">setup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&#160;context)</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&#160;IOException,&#160;InterruptedException&#160;</span>{
&#160;&#160;}

&#160;&#160;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&#160;&#160;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">cleanup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&#160;context)</span></span></code></pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/" class="read-more">Read More </a></section>]]></description>
										<content:encoded><![CDATA[<p>Consider a map-reduce&nbsp;job written in Java whose mapper emits keys of type Text and values of type NullWritable. The reducer should therefore be written like this:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">static</span>&nbsp;<span class="hljs-class" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">class</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">QuerySegmentResultFromKVReducer</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">extends</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Reducer</span>&lt;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Text</span>,&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); 
word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>&gt;&nbsp;</span>{

&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">setup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&nbsp;context)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException,&nbsp;InterruptedException&nbsp;</span>{
&nbsp;&nbsp;}

&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">cleanup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&nbsp;context)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException,&nbsp;InterruptedException&nbsp;</span>{
&nbsp;&nbsp;}

&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); overflow-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">reduce</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Text&nbsp;key,&nbsp;Iterable&lt;NullWritable&gt;&nbsp;values,&nbsp;Context&nbsp;context)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException,&nbsp;InterruptedException&nbsp;</span>{
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//<span class="hljs-doctag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">TODO:</span></span>
&nbsp;&nbsp;}
}
</code></pre>
</section>
<p>Here the reducer's output key and value types are both NullWritable. That detail is not the focus of this article and can be ignored.<br />
<span id="more-13934"></span><br />
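If the reduce() method is missing the&nbsp;<span style="color:#ff0000;">@Override</span>&nbsp;annotation, and&nbsp;Reducer&lt;Text, NullWritable, NullWritable, NullWritable&gt;&nbsp;is mistakenly written as&nbsp;Reducer&lt;Text, Text, NullWritable, NullWritable&gt;, the code still compiles without errors.<br />
But when you run the job, something strange happens. The reduce logic you wrote at the &ldquo;TODO:&rdquo; marker never executes, and even though you never call context.write()&nbsp;to emit any data to HDFS, the Hadoop counters still show that the job output as many records as the reducer received.<br />
It looks as if a default Reducer ran and passed the reducer's input straight through to the output. That is in fact what happens: with the wrong type parameters, your reduce() no longer matches the signature inherited from Reducer, so it becomes an overload rather than an override, and the framework calls the inherited reduce(), whose default implementation writes every input record out unchanged.<br />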
So this bears repeating: never omit the&nbsp;<span style="color: rgb(255, 0, 0);">@Override</span>&nbsp;annotation! With the annotation in place, the IDE flags the error and compilation fails.</p>
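<div>
	The @Override annotation is optional. Removing it does not cause a compiler error, because Java allows a method to be overridden without it.</div>
<div>
	&nbsp;</div>
<div>
	Still, it is recommended to use @Override whenever you override a method from a superclass or interface. Doing so has several benefits:</div>
<div>
	➤ Better readability: @Override tells other developers at a glance that the method overrides one from a superclass or interface, which makes the code easier to understand.</div>
<div>
	➤ Error prevention: if you misspell the name of the method you meant to override, or get its signature wrong, the compiler reports an error and helps you catch the problem.</div>
<div>
	➤ Robustness: if the method in the superclass or interface changes, the annotated method fails to compile, reminding you to update your override.</div>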
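<p>The pitfall can be reproduced without Hadoop. The sketch below is a simplified analogy, not Hadoop code: BaseReducer, MistypedReducer, CorrectReducer and run() are hypothetical names, and a plain parameter-type mismatch stands in for the mismatched generic type arguments.</p>

```java
public class OverrideDemo {
    // Hypothetical stand-in for the framework base class: reduce() has a
    // pass-through default, and run() is how the "framework" invokes it.
    public static class BaseReducer {
        public String reduce(Object key) {
            return "default pass-through: " + key;
        }

        // The caller only knows the base-class signature reduce(Object).
        public String run(Object key) {
            return reduce(key);
        }
    }

    // No @Override and a different parameter type: this method is an
    // overload, so run() never reaches it. The code still compiles.
    public static class MistypedReducer extends BaseReducer {
        public String reduce(String key) {
            return "custom logic: " + key;
        }
    }

    // Matching signature plus @Override: a real override. Adding @Override
    // to MistypedReducer.reduce(String) would be a compile-time error.
    public static class CorrectReducer extends BaseReducer {
        @Override
        public String reduce(Object key) {
            return "custom logic: " + key;
        }
    }

    public static void main(String[] args) {
        // Prints "default pass-through: k1": the custom logic is skipped.
        System.out.println(new MistypedReducer().run("k1"));
        // Prints "custom logic: k1".
        System.out.println(new CorrectReducer().run("k1"));
    }
}
```

<p>In the sketch, the mistyped class compiles and runs, but only the inherited default executes, which mirrors the symptom seen in the job.</p>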
<p>
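	In this article's example, if reduce() has no @Override annotation, then when the reducer class is mistakenly declared as extends Reducer&lt;Text, Text, NullWritable, NullWritable&gt;, the IDE does not flag anything wrong with reduce(), and you are led to believe everything is fine.<br />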
	</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] One Way to Fix Map-Reduce Job OOM (Java Heap Space) Errors: Tune the Memory Parameters</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 19 Jun 2023 05:21:18 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[Java Heap Space]]></category>
		<category><![CDATA[M-R job]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[调大内存]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13923</guid>

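					<description><![CDATA[<p>Whether a Java M-R job or a Pig M-R job hits a Java Heap Space error, the usual approach is to track down the anomaly in the input data and then work out a fix. For example, your program may GROUP by some key for which the input contains a huge number of records, which can cause the job to run out of memory (OOM).<br />
That kind of fix depends on the specifics of the data and on the program logic, so it is not covered here.<br />
This article covers a different case: sometimes the problem in the implementation or the input data is &#8220;not that severe&#8221;, and it can be solved by tuning the M-R job's memory parameters.<br />
<span id="more-13923"></span><br />
For a Java M-R job, set the following parameters with -D:</p>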
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="bash language-bash hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.map.memory.mb=8192&#34;</span>&#160;\
&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.reduce.memory.mb=8192&#34;</span>&#160;\
&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.map.java.opts=-Xmx6144m&#34;</span>&#160;\
&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.reduce.java.opts=-Xmx6144m&#34;</span>&#160;\
</code></pre>
</section>
<p>
For an Apache Pig M-R job, add the following statements to the Pig script:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&#160;mapreduce.map.memory.mb&#160;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&#160;mapreduce.reduce.memory.mb&#160;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&#160;mapreduce.map.java.opts&#160;-Xmx6144m;</code></pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/" class="read-more">Read More </a></section>]]></description>
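										<content:encoded><![CDATA[<p>Whether a Java M-R job or a Pig M-R job hits a Java Heap Space error, the usual approach is to track down the anomaly in the input data and then work out a fix. For example, your program may GROUP by some key for which the input contains a huge number of records, which can cause the job to run out of memory (OOM).<br />
That kind of fix depends on the specifics of the data and on the program logic, so it is not covered here.<br />
This article covers a different case: sometimes the problem in the implementation or the input data is &ldquo;not that severe&rdquo;, and it can be solved by tuning the M-R job's memory parameters.<br />
<span id="more-13923"></span><br />
For a Java M-R job, set the following parameters with -D:</p>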
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="bash language-bash hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.map.memory.mb=8192&quot;</span>&nbsp;\
&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.reduce.memory.mb=8192&quot;</span>&nbsp;\
&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.map.java.opts=-Xmx6144m&quot;</span>&nbsp;\
&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.reduce.java.opts=-Xmx6144m&quot;</span>&nbsp;\
</code></pre>
</section>
<p>
For an Apache Pig M-R job, add the following statements to the Pig script:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.map.memory.mb&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.reduce.memory.mb&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.map.java.opts&nbsp;-Xmx6144m;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.reduce.java.opts&nbsp;-Xmx6144m;
</code></pre>
</section>
<p>
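The first and second parameters (the container memory sizes) should be adjusted to suit your Hadoop cluster. Set the third and fourth parameters (the JVM heap sizes) to 70% to 80% of the first and second, which leaves headroom inside the container for the JVM's non-heap memory.</p>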
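<p>The sizing rule can be written down as a small helper. This is a minimal sketch under stated assumptions: the class HeapSizing and the method javaOpts are hypothetical names, not part of Hadoop or Pig, and the fraction 0.75 is simply the ratio between the 6144 and 8192 values used in the examples.</p>

```java
public class HeapSizing {
    // Builds the value for mapreduce.map.java.opts or
    // mapreduce.reduce.java.opts from the container size in MB and the
    // fraction of it to hand to the JVM heap. Hypothetical helper.
    public static String javaOpts(int containerMb, double heapFraction) {
        if (0 >= containerMb || 0.0 >= heapFraction || heapFraction > 1.0) {
            throw new IllegalArgumentException(
                    "containerMb must be positive and heapFraction must be in (0, 1]");
        }
        long heapMb = Math.round(containerMb * heapFraction);
        return "-Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        // Prints -Xmx6144m, matching the settings shown in the article.
        System.out.println(javaOpts(8192, 0.75));
        // Prints -Xmx3072m for a 4096 MB container.
        System.out.println(javaOpts(4096, 0.75));
    }
}
```

<p>Deriving the heap size from the container size keeps the two settings in step when the container memory is changed later.</p>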
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] How to Check Which .pb Model a Running TensorFlow model-serving Service Has Loaded</title>
		<link>https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/</link>
					<comments>https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 24 May 2023 09:33:49 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Original]]></category>
		<category><![CDATA[model-serving]]></category>
		<category><![CDATA[pb model]]></category>
		<category><![CDATA[TensorFlow]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13912</guid>

					<description><![CDATA[<p>Once a TensorFlow model-serving service is up and running, it is easy to forget which .pb model it loaded. The following method confirms it.<br />
<span id="more-13912"></span><br />
Open this URL:<br />
http://&#60;your_model_serving_host&#62;:18501/v1/models/&#60;your_model_name&#62;<br />
where:<br />
&#60;your_model_serving_host&#62; is the hostname or IP address of your model-serving server.<br />
&#60;your_model_name&#62; is the name of your model.<br />
The page shows output similar to the following:</p>
<blockquote>
<div>
		{</div>
<div>
		&#160;&#34;model_version_status&#34;: [</div>
<div>
		&#160; {</div>
<div>
		&#160; &#160;&#34;version&#34;: &#34;1684833957&#34;,</div>
<div>
		&#160; &#160;&#34;state&#34;: &#34;AVAILABLE&#34;,</div>
<div>
		&#160; &#160;&#34;status&#34;: {</div>
<div>
		&#160; &#160; &#34;error_code&#34;: &#34;OK&#34;,</div>
<div>
		&#160; &#160; &#34;error_message&#34;: &#34;&#34;</div>
<div>
		&#160; &#160;}</div>
<div>
		&#160; }</div>
<div>
		&#160;]</div>
<div>
		}</div>
</blockquote>
<p>Here, version is what we are looking for.<br />
Go to the parent directory where your .pb models are stored (on HDFS or local disk) and simply search for the version value&#8230; <a href="https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>Once a TensorFlow model-serving service is up and running, it is easy to forget which .pb model it loaded. The following method confirms it.<br />
<span id="more-13912"></span><br />
Open this URL:<br />
http://&lt;your_model_serving_host&gt;:18501/v1/models/&lt;your_model_name&gt;<br />
where:<br />
&lt;your_model_serving_host&gt; is the hostname or IP address of your model-serving server.<br />
&lt;your_model_name&gt; is the name of your model.<br />
18501 is the REST API port used in this deployment; substitute the port your own server was started with.<br />
The page shows output similar to the following:</p>
<blockquote>
<div>
		{</div>
<div>
		&nbsp;&quot;model_version_status&quot;: [</div>
<div>
		&nbsp; {</div>
<div>
		&nbsp; &nbsp;&quot;version&quot;: &quot;1684833957&quot;,</div>
<div>
		&nbsp; &nbsp;&quot;state&quot;: &quot;AVAILABLE&quot;,</div>
<div>
		&nbsp; &nbsp;&quot;status&quot;: {</div>
<div>
		&nbsp; &nbsp; &quot;error_code&quot;: &quot;OK&quot;,</div>
<div>
		&nbsp; &nbsp; &quot;error_message&quot;: &quot;&quot;</div>
<div>
		&nbsp; &nbsp;}</div>
<div>
		&nbsp; }</div>
<div>
		&nbsp;]</div>
<div>
		}</div>
</blockquote>
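<p>The version field can also be extracted programmatically. The following is a minimal sketch, not part of the original setup: the class name ServingVersion and its methods are hypothetical, the JSON is matched with a simple regular expression instead of a JSON library, and the host, port and model name passed to fetchStatus are placeholders for your own values. It assumes Java 11 or later for java.net.http.</p>
<pre>
```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the class and method names are illustrative only.
class ServingVersion {
    // Matches "version": "1684833957" in the model status JSON.
    private static final Pattern VERSION = Pattern.compile("\"version\"\\s*:\\s*\"(\\d+)\"");

    // Fetches the raw status JSON from the model-serving REST endpoint.
    static String fetchStatus(String host, int port, String modelName) throws Exception {
        String url = "http://" + host + ":" + port + "/v1/models/" + modelName;
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    // Returns the first version found in the status JSON, or null if there is none.
    static String extractVersion(String statusJson) {
        Matcher m = VERSION.matcher(statusJson);
        return m.find() ? m.group(1) : null;
    }

    // Interprets the version as a Unix timestamp in seconds.
    static Instant exportTime(String version) {
        return Instant.ofEpochSecond(Long.parseLong(version));
    }

    public static void main(String[] args) {
        // Uses a canned response so the example runs without a server.
        String sample = "{ \"model_version_status\": [ { \"version\": \"1684833957\", \"state\": \"AVAILABLE\" } ] }";
        String version = extractVersion(sample);
        System.out.println(version + " exported at " + exportTime(version));
    }
}
```
</pre>
<p>Running main prints 1684833957 exported at 2023-05-23T09:25:57Z, which shows that the version in the sample response reads naturally as a Unix timestamp in seconds.</p>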
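<p>Here, version is what we are looking for.<br />
Go to the parent directory where your .pb models are stored (on HDFS or local disk) and simply search for the version value 1684833957. The directory with that name is the one that holds the .pb model in question.<br />
That directory usually contains a&nbsp;saved_model.pb file and a&nbsp;variables subdirectory.<br />
Why does this work? The version is the name of the model's version directory, and here that name is the Unix timestamp, in seconds, of the moment the .pb model was exported. Two models are very unlikely to be exported in the same second, so the timestamp is effectively unique: once you find the directory with that name, the .pb model inside it is almost certainly the one being served.</p>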
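<p>For models stored on local disk, the directory search can be sketched as follows. This is an illustration under assumptions: ModelDirFinder and its methods are hypothetical names, only the local file system is covered (HDFS would need the Hadoop FileSystem API instead), and Java 10 or later is assumed for var.</p>
<pre>
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: the class and method names are illustrative only.
class ModelDirFinder {
    // Walks the tree under root and returns every directory whose name equals the version.
    static Path[] findVersionDirs(Path root, String version) throws IOException {
        try (var paths = Files.walk(root)) {
            return paths
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName() != null)
                    .filter(p -> p.getFileName().toString().equals(version))
                    .toArray(Path[]::new);
        }
    }

    // An exported model directory normally contains saved_model.pb.
    static boolean looksLikeSavedModel(Path dir) {
        return Files.isRegularFile(dir.resolve("saved_model.pb"));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: parent directory of the exported models; args[1]: version to look for.
        for (Path dir : findVersionDirs(Paths.get(args[0]), args[1])) {
            System.out.println(dir + (looksLikeSavedModel(dir) ? "  (contains saved_model.pb)" : ""));
        }
    }
}
```
</pre>
<p>Run it with the parent directory and the version as arguments; each match is printed with a note when saved_model.pb is present in it.</p>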
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] How to download HLS stream video files</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 03 May 2023 10:12:39 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Mac]]></category>
		<category><![CDATA[Original]]></category>
		<category><![CDATA[Download HLS]]></category>
		<category><![CDATA[Download m3u8]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13899</guid>

					<description><![CDATA[<p>Some videos on the internet are delivered as HLS streams. When you capture the playback address with a tool, you will find that it is a URL ending in .m3u8.<br />
So what are HLS&#160;and&#160;m3u8?</p>
<blockquote>
<p>
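		HLS (HTTP Live Streaming) is an HTTP-based streaming media protocol, while M3U8 is a text-based playlist file format. In HLS, the media data is split into many small files for transmission, and an M3U8 file serves as the index that points to those media files. The M3U8 file contains the URLs of all the media files together with related information such as bitrate, resolution and codec. So when a client asks to play an HLS stream, it first downloads the M3U8 index file and then downloads the media files at the addresses listed in it. In short, HLS and M3U8 are two distinct but closely linked concepts: M3U8 is the part of the HLS protocol that indexes and locates the resources.</p>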
</blockquote>
<p>So the question is: how do you download a video delivered as an HLS stream?<br />
<span id="more-13899"></span><br />
There are several ways; here are a couple.</p>
<p><span style="background-color: rgb(0, 255, 0);">➤</span>&#160;Use a Chrome extension: <span style="color:#0000ff;">Video DownloadHelper</span><br />
This extension can capture video addresses and can also download directly. However, direct downloads of HLS streams were limited to a certain number per day (that was the case long ago; I do not know whether it still is), so downloading directly with the extension is not a good option.<br />
Instead, use it to obtain the video address, then download the video at that address with a desktop application such as <a href="https://github.com/HeiSir2014/M3U8-Downloader" rel="noopener" target="_blank">M3U8-Downloader</a>.<br />
<span style="background-color: rgb(0, 255, 0);">➤</span>&#160;Use the cross-platform HLS download tool&#160;<span style="color:#0000ff;">N_m3u8DL-RE</span><br />
<a href="https://github.com/nilaoda/N_m3u8DL-RE" rel="noopener" target="_blank">N_m3u8DL-RE</a>&#160;is a powerful cross-platform download tool for DASH/HLS/MSS.<br />
On Ubuntu Linux, for example, download its release package, extract it to get the executable&#160;N_m3u8DL-RE, and run it like this to download an HLS stream:</p>
<blockquote>
<p>
		./N_m3u8DL-RE &#60;m3u8_url&#62;</p>
</blockquote>
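<p>N_m3u8DL-RE&#160;supports a large number of options; see its documentation.<br />
If the first run reports that ffmpeg is not installed, install it with apt install ffmpeg&#160;and run the command again.</p>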
<p>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>Some videos on the internet are delivered as HLS streams. When you capture the playback address with a tool, you will find that it is a URL ending in .m3u8.<br />
So what are HLS&nbsp;and&nbsp;m3u8?</p>
<blockquote>
<p>
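		HLS (HTTP Live Streaming) is an HTTP-based streaming media protocol, while M3U8 is a text-based playlist file format. In HLS, the media data is split into many small files for transmission, and an M3U8 file serves as the index that points to those media files. The M3U8 file contains the URLs of all the media files together with related information such as bitrate, resolution and codec. So when a client asks to play an HLS stream, it first downloads the M3U8 index file and then downloads the media files at the addresses listed in it. In short, HLS and M3U8 are two distinct but closely linked concepts: M3U8 is the part of the HLS protocol that indexes and locates the resources.</p>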
</blockquote>
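<p>To make the index role of an M3U8 file concrete, here is a minimal sketch that lists the URLs a playlist points to. It is an illustration under assumptions: M3u8Segments is a hypothetical class name, the example.com URLs are placeholders, and the parsing relies only on the playlist convention that lines starting with # are tags or comments while every other non-empty line is a URI. In a master playlist those URIs point to further playlists instead of media segments. Java 11 or later is assumed for String.lines().</p>
<pre>
```java
import java.net.URI;

// Hypothetical helper: the class and method names are illustrative only.
class M3u8Segments {
    // Returns the absolute URL of every entry in the playlist.
    // Lines starting with '#' are tags or comments; every other non-empty line is a URI.
    static String[] segmentUrls(String playlistUrl, String playlistText) {
        URI base = URI.create(playlistUrl);
        return playlistText.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .filter(line -> !line.startsWith("#"))
                .map(line -> base.resolve(line).toString()) // relative entries resolve against the playlist URL
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String playlist = String.join("\n",
                "#EXTM3U",
                "#EXT-X-VERSION:3",
                "#EXT-X-TARGETDURATION:10",
                "#EXTINF:10.0,",
                "seg0.ts",
                "#EXTINF:8.5,",
                "https://cdn.example.com/video/seg1.ts",
                "#EXT-X-ENDLIST");
        for (String url : segmentUrls("https://example.com/video/index.m3u8", playlist)) {
            System.out.println(url);
        }
    }
}
```
</pre>
<p>For the sample playlist this prints https://example.com/video/seg0.ts followed by https://cdn.example.com/video/seg1.ts. A real downloader would then fetch each URL in order and concatenate or remux the segments.</p>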
<p>So the question is: how do you download a video delivered as an HLS stream?<br />
<span id="more-13899"></span><br />
There are several ways; here are a couple.</p>
<p><span style="background-color: rgb(0, 255, 0);">➤</span>&nbsp;Use a Chrome extension: <span style="color:#0000ff;">Video DownloadHelper</span><br />
This extension can capture video addresses and can also download directly. However, direct downloads of HLS streams were limited to a certain number per day (that was the case long ago; I do not know whether it still is), so downloading directly with the extension is not a good option.<br />
Instead, use it to obtain the video address, then download the video at that address with a desktop application such as <a href="https://github.com/HeiSir2014/M3U8-Downloader" rel="noopener" target="_blank">M3U8-Downloader</a>.<br />
<span style="background-color: rgb(0, 255, 0);">➤</span>&nbsp;Use the cross-platform HLS download tool&nbsp;<span style="color:#0000ff;">N_m3u8DL-RE</span><br />
<a href="https://github.com/nilaoda/N_m3u8DL-RE" rel="noopener" target="_blank">N_m3u8DL-RE</a>&nbsp;is a powerful cross-platform download tool for DASH/HLS/MSS.<br />
On Ubuntu Linux, for example, download its release package, extract it to get the executable&nbsp;N_m3u8DL-RE, and run it like this to download an HLS stream:</p>
<blockquote>
<p>
		./N_m3u8DL-RE &lt;m3u8_url&gt;</p>
</blockquote>
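<p>N_m3u8DL-RE&nbsp;supports a large number of options; see its documentation.<br />
If the first run reports that ffmpeg is not installed, install it with apt install ffmpeg&nbsp;and run the command again.</p>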
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Reading a local TFRecord file with Java</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 24 Apr 2023 18:09:06 +0000</pubDate>
				<category><![CDATA[Original]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[TensorFlow]]></category>
		<category><![CDATA[TFRecord]]></category>
		<category><![CDATA[Local]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13895</guid>

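					<description><![CDATA[<div>
	TFRecord is a binary data format used by TensorFlow for storing and reading large datasets efficiently. A TFRecord file holds a sequence of records; each record is an opaque byte string, typically a serialized Example or SequenceExample protocol buffer.</div>
<div>
	Unlike text files, TFRecord files are binary-encoded, which generally makes them more compact to store and to transfer over a network. TFRecord also lets us split a large dataset into multiple files that can be read and processed in parallel.</div>
<div>
	In TensorFlow, TFRecord files are commonly used to store and load a model's training, validation and test data. Creating a TFRecord file requires serializing the data first, but this is easy to do because TensorFlow provides API support for it.</div>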
<p><span id="more-13895"></span><br />
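In big-data pipelines, TFRecord files are usually produced by a map-reduce&#160;job, and the data volume is usually large. To verify that the contents are correct, we sometimes need to inspect a small sample: for example, take one of the N TFRecord files produced by the map-reduce job, parse it locally, and print its contents to check them.<br />
Below is an example Java program that reads a TFRecord file and prints one of its Examples:</p>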
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="javascript language-javascript hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&#160;&#160;&#160;&#160;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">String</span>&#160;localTfRecordFile&#160;=&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;/path/to/your/tfrecord/file&#34;</span>;
&#160;&#160;&#160;&#160;InputStream&#160;inputStream&#160;=&#160;Files.newInputStream(Paths.get(localTfRecordFile));
&#160;&#160;&#160;&#160;DataInput&#160;dataInput&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;DataInputStream(inputStream);
&#160;&#160;&#160;&#160;TFRecordReader&#160;reader&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;TFRecordReader(dataInput,&#160;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">true</span>);

&#160;&#160;&#160;&#160;byte[]&#160;recordBytes&#160;=&#160;reader.read();
&#160;&#160;&#160;&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&#160;(recordBytes&#160;!=&#160;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">null</span>)&#160;{
&#160;&#160;&#160;&#160;&#160;&#160;Example&#160;example&#160;=&#160;Example.parseFrom(recordBytes);
&#160;&#160;&#160;&#160;&#160;&#160;System.out.println(example.toString());
&#160;&#160;&#160;&#160;&#160;&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;&#160;&#160;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&#160;print&#160;only&#160;one&#160;Example</span>
&#160;&#160;&#160;&#160;}
&#160;&#160;&#160;&#160;inputStream.close();
</code></pre>
</section>
<p>The one import worth calling out is import java.nio.file.Paths; the snippet also relies on java.nio.file.Files, java.io.InputStream, java.io.DataInput and java.io.DataInputStream, plus the TFRecordReader and Example classes from your TensorFlow dependencies.<br />
In more detail:</p>
<div>
	TFRecord files and Example are two TensorFlow concepts for serializing and storing data, and they are closely related.</div>
<div>
	TFRecord is a binary file format that TensorFlow uses to store large amounts of data efficiently. A file is usually a sequence of serialized Examples. Example is TensorFlow's standard format for serialized data: it contains a set of named Features, and each Feature holds a list of values of one type: bytes (BytesList, which also covers strings), float (FloatList) or int64 (Int64List). When writing data to a TFRecord file, you wrap it in Examples; when reading a TFRecord file, you parse each Example back out.</div>
<div>
	In short, a TFRecord file is like a container, and Example is the concrete format of each element in that container. When using TFRecord, we usually first decide which data to store and how to divide it into Features, package it into one or more Examples, and then write those Examples to the TFRecord file.</div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/" class="read-more">Read More </a>]]></description>
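										<content:encoded><![CDATA[<div>
	TFRecord is a binary data format used by TensorFlow for storing and reading large datasets efficiently. A TFRecord file holds a sequence of records; each record is an opaque byte string, typically a serialized Example or SequenceExample protocol buffer.</div>
<div>
	Unlike text files, TFRecord files are binary-encoded, which generally makes them more compact to store and to transfer over a network. TFRecord also lets us split a large dataset into multiple files that can be read and processed in parallel.</div>
<div>
	In TensorFlow, TFRecord files are commonly used to store and load a model's training, validation and test data. Creating a TFRecord file requires serializing the data first, but this is easy to do because TensorFlow provides API support for it.</div>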
<p><span id="more-13895"></span><br />
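In big-data pipelines, TFRecord files are usually produced by a map-reduce&nbsp;job, and the data volume is usually large. To verify that the contents are correct, we sometimes need to inspect a small sample: for example, take one of the N TFRecord files produced by the map-reduce job, parse it locally, and print its contents to check them.<br />
Below is an example Java program that reads a TFRecord file and prints one of its Examples:</p>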
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="javascript language-javascript hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">String</span>&nbsp;localTfRecordFile&nbsp;=&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;/path/to/your/tfrecord/file&quot;</span>;
&nbsp;&nbsp;&nbsp;&nbsp;InputStream&nbsp;inputStream&nbsp;=&nbsp;Files.newInputStream(Paths.get(localTfRecordFile));
&nbsp;&nbsp;&nbsp;&nbsp;DataInput&nbsp;dataInput&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;DataInputStream(inputStream);
&nbsp;&nbsp;&nbsp;&nbsp;TFRecordReader&nbsp;reader&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;TFRecordReader(dataInput,&nbsp;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">true</span>);

&nbsp;&nbsp;&nbsp;&nbsp;byte[]&nbsp;recordBytes&nbsp;=&nbsp;reader.read();
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&nbsp;(recordBytes&nbsp;!=&nbsp;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">null</span>)&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Example&nbsp;example&nbsp;=&nbsp;Example.parseFrom(recordBytes);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(example.toString());
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;print&nbsp;only&nbsp;one&nbsp;Example</span>
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;inputStream.close();
</code></pre>
</section>
<p>The one import worth calling out is import java.nio.file.Paths; the snippet also relies on java.nio.file.Files, java.io.InputStream, java.io.DataInput and java.io.DataInputStream, plus the TFRecordReader and Example classes from your TensorFlow dependencies.<br />
In more detail:</p>
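<div>
	TFRecord files and Example are two TensorFlow concepts for serializing and storing data, and they are closely related.</div>
<div>
	TFRecord is a binary file format that TensorFlow uses to store large amounts of data efficiently. A file is usually a sequence of serialized Examples. Example is TensorFlow's standard format for serialized data: it contains a set of named Features, and each Feature holds a list of values of one type: bytes (BytesList, which also covers strings), float (FloatList) or int64 (Int64List). When writing data to a TFRecord file, you wrap it in Examples; when reading a TFRecord file, you parse each Example back out.</div>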
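<p>The container idea can be shown with the JDK alone. On disk, each TFRecord record is framed as a little-endian uint64 payload length, a uint32 CRC of that length, the payload bytes, and a uint32 CRC of the payload. The following is a minimal sketch of that framing, not the TFRecordReader used earlier: TfRecordFraming and its methods are hypothetical names, the CRC fields are skipped instead of verified, and the frame helper writes zeros in place of real CRCs, so its output is for this demo only.</p>
<pre>
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the class and method names are illustrative only.
class TfRecordFraming {
    // Reads the next record payload, or returns null at a clean end of stream.
    // The two CRC fields are skipped here, not verified.
    static byte[] readRecord(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int first = data.read();
        if (first == -1) {
            return null; // no more records
        }
        byte[] header = new byte[8];
        header[0] = (byte) first;
        data.readFully(header, 1, 7);
        long length = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getLong();
        data.readFully(new byte[4]); // skip the CRC of the length
        byte[] payload = new byte[Math.toIntExact(length)];
        data.readFully(payload);
        data.readFully(new byte[4]); // skip the CRC of the payload
        return payload;
    }

    // Frames a payload with the same layout, writing zeros in place of real CRCs (demo only).
    static byte[] frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + payload.length + 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(payload.length).putInt(0).put(payload).putInt(0);
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(frame("first".getBytes(StandardCharsets.UTF_8)));
        out.write(frame("second".getBytes(StandardCharsets.UTF_8)));
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readRecord(in)) != null) {
            System.out.println(record.length + " bytes: " + new String(record, StandardCharsets.UTF_8));
        }
    }
}
```
</pre>
<p>Running main prints 5 bytes: first and then 6 bytes: second. With a real TFRecord file, each payload returned by readRecord would be the serialized bytes that Example.parseFrom turns back into an Example.</p>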
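<div>
	In short, a TFRecord file is like a container, and Example is the concrete format of each element in that container. When using TFRecord, we usually first decide which data to store and how to divide it into Features, package it into one or more Examples, and then write those Examples to the TFRecord file.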
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
