<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Original &#8211; 编码无悔 / Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/category/original/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>The Road to Optimization</description>
	<lastBuildDate>Wed, 25 Mar 2026 09:49:41 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Batch-Copying Hive Table Partitions for Historical Dates</title>
		<link>https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/</link>
					<comments>https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 25 Mar 2026 09:49:41 +0000</pubDate>
				<category><![CDATA[Original]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Copying Hive tables]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14221</guid>

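					<description><![CDATA[<p>If a Hive table is partitioned by date and each day's data is small, the fastest way to clone one day's data into several other days is probably to copy the underlying files directly on HDFS and then run a single command to register the new partitions.<br />
<span id="more-14221"></span><br />
(1) First locate the table's HDFS directory. Suppose we want to use the data for 2026-03-20 to create the data for 2026-03-21:</p>
<blockquote>
<p>
		hadoop fs -cp /path/to/your/hive/table/hdfs/dir/date=2026-03-20 /path/to/your/hive/table/hdfs/dir/date=2026-03-21</p>
</blockquote>
<p>(2) Copying the directory is not enough: the data still cannot be queried. In the Hive interactive command line, run the following command to make the copied data take effect:</p>
<blockquote>
<p>
		msck repair table table_name;</p>
</blockquote>
<p>This command repairs the table's metadata.<br />
When a partition directory has been created directly on HDFS without being registered in the Hive metadata through ALTER TABLE ADD PARTITION, running msck discovers such partitions and adds them to the metadata automatically.&#8230; <a href="https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/" class="read-more">Read More </a></p>]]></description>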
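										<content:encoded><![CDATA[<p>If a Hive table is partitioned by date and each day's data is small, the fastest way to clone one day's data into several other days is probably to copy the underlying files directly on HDFS and then run a single command to register the new partitions.<br />
<span id="more-14221"></span><br />
(1) First locate the table's HDFS directory. Suppose we want to use the data for 2026-03-20 to create the data for 2026-03-21:</p>
<blockquote>
<p>
		hadoop fs -cp /path/to/your/hive/table/hdfs/dir/date=2026-03-20 /path/to/your/hive/table/hdfs/dir/date=2026-03-21</p>
</blockquote>
<p>(2) Copying the directory is not enough: the data still cannot be queried. In the Hive interactive command line, run the following command to make the copied data take effect:</p>
<blockquote>
<p>
		msck repair table table_name;</p>
</blockquote>
<p>This command repairs the table's metadata.<br />
When a partition directory has been created directly on HDFS without being registered in the Hive metadata through ALTER TABLE ADD PARTITION, running msck discovers such partitions and adds them to the metadata automatically.</p>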
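<p>To copy one day into several days at once, the copy command from step (1) can be wrapped in a loop. The following is only a sketch: the function name gen_copy_cmds and the variables TABLE_DIR and SRC_DATE are made up for illustration, the path is the same placeholder used above, and the script merely prints the commands (a dry run) so they can be reviewed before anything touches HDFS.</p>

```shell
#!/bin/sh
# Placeholder table directory (same as in step 1); replace with the real HDFS path.
TABLE_DIR="/path/to/your/hive/table/hdfs/dir"
# The existing partition whose data will be cloned.
SRC_DATE="2026-03-20"

# Print one "hadoop fs -cp" command per target date passed as an argument.
# Nothing is executed here; this is a dry run.
gen_copy_cmds() {
    for target in "$@"; do
        echo "hadoop fs -cp ${TABLE_DIR}/date=${SRC_DATE} ${TABLE_DIR}/date=${target}"
    done
}

gen_copy_cmds 2026-03-21 2026-03-22 2026-03-23
```

<p>Once the printed commands look right, the output can be piped to sh to run them, after which a single msck repair table in Hive, as in step (2), should register all the new partitions.</p>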
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e6%89%b9%e9%87%8f%e5%a4%8d%e5%88%b6%e5%8e%86%e5%8f%b2%e6%97%a5%e6%9c%9f%e7%9a%84hive%e8%a1%a8/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Prettifying git diff Output in the Terminal</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 09 Mar 2026 03:20:04 +0000</pubDate>
				<category><![CDATA[Original]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[git diff]]></category>
		<category><![CDATA[git-delta]]></category>
		<category><![CDATA[side-by-side view]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14209</guid>

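					<description><![CDATA[<p>Environments this article applies to:<br />
macOS and Ubuntu (tested only on 20.04.6 LTS)<br />
When run in a terminal, git diff displays its output as follows:<br />
* Grouped by file: each changed file gets its own block, shown one after another from top to bottom.<br />
* Everything scrolls vertically in a single terminal window; by default there is no side-by-side comparison.<br />
Personally, I find this less intuitive than a side-by-side diff.<br />
So is there a way to turn the output of git diff into something easier on the eyes?<br />
<span id="more-14209"></span><br />
On macOS, you can install git-delta and, with a little configuration, make git diff in the terminal look much better.<br />
Here is the end result first:<br />
<img decoding="async" alt="git diff" src="https://www.codelast.com/wp-content/uploads/2026/03/git_diff_style_change.jpg" style="width: 700px; height: 259px;" /><br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>
<p>How is this achieved? Follow the steps below.<br />
Taking macOS as an example, install it with:</p>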
<blockquote>
<p>
		brew install git-delta</p>
</blockquote>
<p>To configure git globally, edit the ~/.gitconfig file and add the following:</p>
<blockquote>
<div>
		[core]</div>
<div>
		&#160; &#160; pager = delta</div>
<div>
		[interactive]</div>
<div>
		&#160; &#160; diffFilter = delta --color-only</div>
<div>
		[delta]</div>
<div>
		&#160; &#160; syntax-theme = Monokai Extended</div>
<div>
		&#160; &#160; line-numbers = true</div>
<div>
		&#160; &#160; side-by-side = true</div>
</blockquote>
<div>
	The configuration options mean the following:
<div class="document">
<div class="section">
<ul>
<li><strong>[core] pager = delta</strong>
<ul>
<li>Sets Git's pager to delta.</li>
<li>Commands affected: git diff, git log -p, and other commands whose output is paged.</li>
<li>Effect: instead of going straight to less, the output of these commands first passes through delta, which formats it before it is displayed.</li>
</ul>
</li>
<li><strong>[interactive] diffFilter = delta --color-only</strong>
<ul>
<li>Sets the diff filter used specifically for interactive operations such as git add -p.</li>
<li>When Git shows each hunk interactively, it first hands the raw diff to delta --color-only.</li>
<li>--color-only: adds color highlighting only, leaving line numbers and text structure untouched so that interactive commands keep working.</li>
</ul>
</li>
<li><strong>[delta] syntax-theme = Monokai Extended</strong>
<ul>
<li>Sets delta's syntax highlighting theme to Monokai Extended.</li>
<li>Controls the color scheme of the code itself (keywords, strings, comments, and so on).</li>
</ul>
</li>
<li><strong>[delta] line-numbers = true</strong>
<ul>
<li>Shows line numbers in delta's output.</li>
<li>The line numbers of the old and new file are typically shown on the left or in a gutter, which makes changes easier to locate.</li>
</ul>
</li>
<li><strong>[delta] side-by-side = true</strong>
<ul>
<li>Displays the diff in side-by-side mode.</li>
<li>The old version is usually on the left and the new version on the right, similar to the comparison view in a GitHub PR.</li>
</ul>
</li>
</ul>
</div>
</div>
<p>
	You can run delta --list-syntax-themes to see all built-in themes, and set the one you like in the syntax-theme option.</p>
</div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/" class="read-more">Read More </a>]]></description>
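										<content:encoded><![CDATA[<p>Environments this article applies to:<br />
macOS and Ubuntu (tested only on 20.04.6 LTS)<br />
When run in a terminal, git diff displays its output as follows:<br />
* Grouped by file: each changed file gets its own block, shown one after another from top to bottom.<br />
* Everything scrolls vertically in a single terminal window; by default there is no side-by-side comparison.<br />
Personally, I find this less intuitive than a side-by-side diff.<br />
So is there a way to turn the output of git diff into something easier on the eyes?<br />
<span id="more-14209"></span><br />
On macOS, you can install git-delta and, with a little configuration, make git diff in the terminal look much better.<br />
Here is the end result first:<br />
<img decoding="async" alt="git diff" src="https://www.codelast.com/wp-content/uploads/2026/03/git_diff_style_change.jpg" style="width: 700px; height: 259px;" /><br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>
<p>How is this achieved? Follow the steps below.<br />
Taking macOS as an example, install it with:</p>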
<blockquote>
<p>
		brew install git-delta</p>
</blockquote>
<p>To configure git globally, edit the ~/.gitconfig file and add the following:</p>
<blockquote>
<div>
		[core]</div>
<div>
		&nbsp; &nbsp; pager = delta</div>
<div>
		[interactive]</div>
<div>
		&nbsp; &nbsp; diffFilter = delta --color-only</div>
<div>
		[delta]</div>
<div>
		&nbsp; &nbsp; syntax-theme = Monokai Extended</div>
<div>
		&nbsp; &nbsp; line-numbers = true</div>
<div>
		&nbsp; &nbsp; side-by-side = true</div>
</blockquote>
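<p>As an alternative to editing the file by hand, the same keys can be written with git config --global. This is only a sketch and not part of the original steps: it assumes git and delta are already installed, and it modifies your global git configuration.</p>

```shell
# Write the same keys as the ~/.gitconfig snippet above into the global git config.
git config --global core.pager delta
git config --global interactive.diffFilter "delta --color-only"
git config --global delta.syntax-theme "Monokai Extended"
git config --global delta.line-numbers true
git config --global delta.side-by-side true
```

<p>Afterwards, git config --global --get core.pager should print delta, which is a quick way to confirm the settings were saved.</p>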
<div>
	The configuration options mean the following:
<div class="document">
<div class="section">
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">●<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[core] pager = delta</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">把 Git 的&ldquo;分页器&rdquo;改成 </span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">delta</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">影响的命令：如 </span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">git diff</span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">、</span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">git log -p</span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;"> 等需要分页显示的输出</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">作用：这些命令的输出不再通过 </span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">less</span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">，而是先经过 delta 进行美化后再显示</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">●<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[interactive] diffFilter = delta --color-only</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">专门给交互式操作（如 </span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">git add -p</span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">）设置 diff 过滤器</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Git 在交互式展示每一块 diff 时，先把原始 diff 丢给 </span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">delta --color-only</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">-color-only</span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">：只加颜色高亮，不改行号、不改文本结构，确保交互命令正常工作</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">●<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] syntax-theme = Monokai Extended</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">设置 delta 的语法高亮主题为 </span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Monokai Extended</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">影响代码内容的配色风格（关键字、字符串、注释等的颜色方案）</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">●<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] line-numbers = true</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">在 delta 输出中展示行号</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">一般会在左侧或边栏显示老/新文件的行号，方便定位</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt; font-size: 12pt; text-align: justify; font-family: 等线;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">●<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; font-weight: bold; letter-spacing: 0pt; vertical-align: baseline;">[delta] side-by-side = true</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">Display the diff in side-by-side mode</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
<p class="paragraph text-align-type-justify MsoNormal" style="margin: 3pt 0pt 3pt 3.52727em; font-size: 12pt; text-align: justify; font-family: 等线; text-indent: -16.8pt;">
				<span style="font-family: Wingdings;"><span style="font-size: 11pt; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;">○<span style="font-family: &quot;Times New Roman&quot;;">&nbsp;</span></span></span><span data-font-family="微软雅黑" style="font-size: 11pt; font-family: 微软雅黑; letter-spacing: 0pt; vertical-align: baseline;">The old version is usually on the left and the new version on the right, much like the split diff view of a GitHub PR</span><span lang="EN-US" style="font-size: 11pt; font-family: 微软雅黑; color: rgb(51, 51, 51); letter-spacing: 0pt; vertical-align: baseline;"><o:p></o:p></span></p>
</p></div>
</p></div>
<p>
	You can list all built-in themes with the&nbsp;delta --list-syntax-themes&nbsp;command and then set the one you like in the syntax-theme&nbsp;option.</p>
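<p>As a minimal sketch, the two [delta] options described above (line-numbers and side-by-side) can also be written with git config instead of editing the file by hand. The function name enable_delta_layout and its config-file argument are illustrative assumptions, not part of delta or git:</p>

```shell
# Hypothetical helper: write the two delta options into a git config file.
# Pass a scratch file to experiment, or ~/.gitconfig to change your real setup.
enable_delta_layout() {
    config_file="$1"
    # equivalent to "[delta] line-numbers = true"
    git config --file "$config_file" delta.line-numbers true
    # equivalent to "[delta] side-by-side = true"
    git config --file "$config_file" delta.side-by-side true
}
```

<p>Calling enable_delta_layout ~/.gitconfig would apply the same two settings to your own configuration.</p>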
<p>	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan the code with WeChat):<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	and my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
		<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
<div>
		&nbsp;</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%be%8e%e5%8c%96git-diff%e5%91%bd%e4%bb%a4%e5%9c%a8%e7%bb%88%e7%ab%af%e7%9a%84%e6%98%be%e7%a4%ba%e6%95%88%e6%9e%9c/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Transcribing mp3 audio to Chinese text locally (offline) with Whisper.cpp</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 17 Sep 2025 14:29:50 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[mp3转中文]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[whisper.cpp]]></category>
		<category><![CDATA[Whisper模型]]></category>
		<category><![CDATA[语音识别]]></category>
		<category><![CDATA[语音转文字]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14183</guid>

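					<description><![CDATA[<p>If you need to turn mp3 audio into Chinese text without calling any cloud API, this post describes one workable approach.<br />
OS: Ubuntu 20.04 LTS (the steps should work similarly on macOS)<br />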
<span id="more-14183"></span></p>
<div style="text-align: center;">
	<img decoding="async" alt="audio to text" src="https://www.codelast.com/wp-content/uploads/2025/09/audio_to_text.png" style="width: 600px; height: 528px;" /></div>
<p>
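First, some background:<br />
<span style="color:#ff0000;">➤</span> whisper.cpp is an open-source C/C++ implementation for running OpenAI's Whisper model. Whisper is an automatic speech recognition (ASR) neural network that converts audio to text. whisper.cpp supports many languages, recognizes speech accurately, and runs fully offline with no internet connection. Because the implementation is lightweight and runs efficiently on a CPU, it is well suited to embedding in other applications.<br />
<span style="color: rgb(255, 0, 0);">➤</span>&#160;whisper.cpp-cli is a Python package that wraps the whisper.cpp command-line tool.<br />
<span style="color:#ff0000;">➤</span>&#160;OpenAI's Whisper model is a Transformer-based neural network for automatic speech recognition. OpenAI released it as open source and trained it on a large multilingual dataset (680,000 hours of audio covering nearly 100 languages), so it is accurate and robust across accents, background noise, and technical terminology.<br />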
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/" class="read-more">Read More </a></p>]]></description>
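										<content:encoded><![CDATA[<p>If you need to turn mp3 audio into Chinese text without calling any cloud API, this post describes one workable approach.<br />
OS: Ubuntu 20.04 LTS (the steps should work similarly on macOS)<br />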
<span id="more-14183"></span></p>
<div style="text-align: center;">
	<img decoding="async" alt="audio to text" src="https://www.codelast.com/wp-content/uploads/2025/09/audio_to_text.png" style="width: 600px; height: 528px;" /></div>
<p>
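First, some background:<br />
<span style="color:#ff0000;">➤</span> whisper.cpp is an open-source C/C++ implementation for running OpenAI's Whisper model. Whisper is an automatic speech recognition (ASR) neural network that converts audio to text. whisper.cpp supports many languages, recognizes speech accurately, and runs fully offline with no internet connection. Because the implementation is lightweight and runs efficiently on a CPU, it is well suited to embedding in other applications.<br />
<span style="color: rgb(255, 0, 0);">➤</span>&nbsp;whisper.cpp-cli is a Python package that wraps the whisper.cpp command-line tool.<br />
<span style="color:#ff0000;">➤</span>&nbsp;OpenAI's Whisper model is a Transformer-based neural network for automatic speech recognition. OpenAI released it as open source and trained it on a large multilingual dataset (680,000 hours of audio covering nearly 100 languages), so it is accurate and robust across accents, background noise, and technical terminology.<br />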
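So what we need to install is <a href="https://github.com/charliermarsh/whisper.cpp-cli" target="_blank">whisper.cpp-cli</a>, and we also need to download a Whisper model file; whisper.cpp-cli can then load that model file to run speech recognition.<br />
To avoid interfering with software already installed on the system, it is common to create a fresh env (environment) with a Python package manager such as micromamba or uv and install the other Python packages inside it. We use uv here, but any other package manager that can create an env will do.</p>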
<blockquote>
<p>
		mkdir whisper<br />
		cd whisper<br />
		uv venv . --python 3.8&nbsp; # create a new environment<br />
		source&nbsp;bin/activate&nbsp; # activate the environment<br />
		uv pip install pip&nbsp; # install pip<br />
		pip install whisper.cpp-cli&nbsp; # install whisper.cpp-cli</p>
</blockquote>
<p>
The required software is now installed.<br />
Convert the mp3 to a 16 kHz wav file:</p>
<blockquote>
<p>
		ffmpeg -y -i /home/codelast/st.mp3 -ar 16000 /home/codelast/st.wav</p>
</blockquote>
<p>Note that the 16 kHz sample rate is a Whisper requirement; the -ar 16000 option sets it.</p>
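<p>If there are several recordings to convert, the same ffmpeg call can be wrapped in a loop. This is only a sketch: the function names wav_path_for and convert_dir and the directory argument are assumptions made for illustration, and the ffmpeg options are exactly those of the command above.</p>

```shell
# Hypothetical helper: derive the .wav path that sits next to a given .mp3.
wav_path_for() {
    printf '%s\n' "${1%.mp3}.wav"
}

# Hypothetical helper: convert every mp3 in the given directory to 16 kHz wav.
convert_dir() {
    for mp3 in "$1"/*.mp3; do
        # skip the literal pattern when the directory holds no mp3 files
        [ -e "$mp3" ] || continue
        # -y overwrites existing output, -ar 16000 sets the sample rate
        ffmpeg -y -i "$mp3" -ar 16000 "$(wav_path_for "$mp3")"
    done
}
```

<p>For example, convert_dir /home/codelast would process st.mp3 together with any other mp3 files in that directory.</p>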
<p>Now we can run speech recognition:</p>
<blockquote>
<p>
		&nbsp;whisper-cpp -m ../whisper.cpp/download/x-ggml-model.zh.bin -f /home/codelast/st.wav -l zh --output-txt</p>
</blockquote>
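<p>The options used in this command:<br />
-m: the model to use for recognition, here x-ggml-model.zh.bin (download the model file from Hugging Face beforehand)<br />
-f: the input file, here the st.wav audio file to transcribe<br />
-l: the language spoken in the audio, here zh (Chinese); with the default transcribe task the text comes out in that same language<br />
--output-txt: write the transcript to a plain-text file.<br />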
Sample output:</p>
<blockquote>
<div>
		whisper_init_from_file_with_params_no_state: loading model from &#39;../whisper.cpp/download/x-ggml-model.zh.bin&#39;</div>
<div>
		whisper_model_load: loading model</div>
<div>
		whisper_model_load: n_vocab&nbsp; &nbsp; &nbsp; &nbsp;= 51865</div>
<div>
		whisper_model_load: n_audio_ctx&nbsp; &nbsp;= 1500</div>
<div>
		whisper_model_load: n_audio_state = 768</div>
<div>
		whisper_model_load: n_audio_head&nbsp; = 12</div>
<div>
		whisper_model_load: n_audio_layer = 12</div>
<div>
		whisper_model_load: n_text_ctx&nbsp; &nbsp; = 448</div>
<div>
		whisper_model_load: n_text_state&nbsp; = 768</div>
<div>
		whisper_model_load: n_text_head&nbsp; &nbsp;= 12</div>
<div>
		whisper_model_load: n_text_layer&nbsp; = 12</div>
<div>
		whisper_model_load: n_mels&nbsp; &nbsp; &nbsp; &nbsp; = 80</div>
<div>
		whisper_model_load: ftype&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= 1</div>
<div>
		whisper_model_load: qntvr&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= 0</div>
<div>
		whisper_model_load: type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 3 (small)</div>
<div>
		whisper_model_load: adding 1608 extra tokens</div>
<div>
		whisper_model_load: n_langs&nbsp; &nbsp; &nbsp; &nbsp;= 99</div>
<div>
		whisper_model_load:&nbsp; &nbsp; &nbsp; CPU total size =&nbsp; &nbsp;487.01 MB</div>
<div>
		whisper_model_load: model size&nbsp; &nbsp; =&nbsp; 487.01 MB</div>
<div>
		whisper_init_state: kv self size&nbsp; =&nbsp; &nbsp;49.55 MB</div>
<div>
		whisper_init_state: kv cross size =&nbsp; &nbsp;55.30 MB</div>
<div>
		whisper_init_state: compute buffer (conv)&nbsp; &nbsp;=&nbsp; &nbsp;22.54 MB</div>
<div>
		whisper_init_state: compute buffer (encode) =&nbsp; 280.20 MB</div>
<div>
		whisper_init_state: compute buffer (cross)&nbsp; =&nbsp; &nbsp; 6.31 MB</div>
<div>
		whisper_init_state: compute buffer (decode) =&nbsp; &nbsp;97.40 MB</div>
<div>
		&nbsp;</div>
<div>
		system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0</div>
<div>
		&nbsp;</div>
<div>
		main: processing &#39;/home/codelast/st.wav&#39; (23957094 samples, 1497.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = zh, task = transcribe, timestamps = 1 ...</div>
<div>
		&nbsp;</div>
<div>
		&nbsp;</div>
<div>
		[00:00:00.000 --&gt; 00:00:19.040]&nbsp; (first recognized sentence)</div>
<div>
		[00:00:19.040 --&gt; 00:00:40.280]&nbsp; (second recognized sentence)<br />
		......</div>
</blockquote>
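<p>When only the text is needed, the [start --&gt; end] prefix on each transcript line can be stripped with a small filter. A sketch, assuming the line format shown in the sample output above; strip_timestamps is a hypothetical helper name:</p>

```shell
# Hypothetical helper: read transcript lines on stdin and drop the leading
# "[hh:mm:ss.mmm --> hh:mm:ss.mmm]" stamp, leaving only the recognized text.
strip_timestamps() {
    sed -E 's/^[[:space:]]*\[[0-9:.]+ --> [0-9:.]+\][[:space:]]*//'
}

# Example (not run here): cat saved_output.txt | strip_timestamps
```

<p>Lines that do not start with a timestamp, such as the model-loading log, pass through unchanged.</p>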
<div>
	<br />
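	As you can see, each output line is prefixed with a timestamp range.<br />
	On closer inspection, the Chinese text that Whisper produces sometimes contains wrong characters; you can pass it through an AI model for further correction, or post-process it however you like.<br />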
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan the code with WeChat):<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	and my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
		<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
<div>
		&nbsp;</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8whisper-cpp%e5%9c%a8%e6%9c%ac%e5%9c%b0%e7%a6%bb%e7%ba%bf%e6%8a%8amp3%e9%9f%b3%e9%a2%91%e8%bd%ac%e6%88%90%e4%b8%ad%e6%96%87/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] How to tell whether a running TF-Serving service is actually in use</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 16 Sep 2024 04:27:03 +0000</pubDate>
				<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[TF-Serving]]></category>
		<category><![CDATA[TFServing]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14136</guid>

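					<description><![CDATA[<p>When a TF-Serving service has been started on a server, we know it is holding resources, but not whether it is sitting idle or <span style="color:#ff0000;">actually in use</span>.<br />
This post describes how to tell whether it is really being used.<br />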
<span id="more-14136"></span></p>
<div>
	The nvidia-smi command shows that the TF-Serving service is running:</div>
<p><img decoding="async" alt="TF-Serving is running" src="https://www.codelast.com/wp-content/uploads/2024/09/tf_serving_running.png" style="width: 700px; height: 149px;" /></p>
<div>
<div>
		Its process ID is 22871, so we look up more details about that process:</div>
<blockquote>
<div>
			ps -ef &#124; grep 22871</div>
</blockquote>
<div>
		The output looks like this:</div>
<blockquote>
<div>
			root&#160; &#160; &#160;22871 22729 83 13:42 pts/0&#160; &#160; 00:06:35 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=codelast --model_base_path=/models/codelast</div>
</blockquote>
<div>
		This shows that its REST API listens on port 8501.<br />
<div>
			So we can use tcpdump to capture and inspect the traffic. Run the following command (root privileges required):</div>
<blockquote>
<div>
				sudo tcpdump -vv -i any &#39;port 8501&#39;</div>
</blockquote>
<div>
			If clients are sending requests to this TF-Serving service, the command prints a continuous stream of output similar to:
<div>
				<span style="color:#0000ff;">14:27:59.174425 IP (tos 0x0, ttl 60, id 51707, offset 0, flags [DF], proto TCP (6), length 1500)</span></div>
<div>
				<span style="color:#0000ff;">node.codelast.com.60679</span></div></div></div></div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/" class="read-more">Read More </a>]]></description>
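										<content:encoded><![CDATA[<p>When a TF-Serving service has been started on a server, we know it is holding resources, but not whether it is sitting idle or <span style="color:#ff0000;">actually in use</span>.<br />
This post describes how to tell whether it is really being used.<br />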
<span id="more-14136"></span></p>
<div>
	The nvidia-smi command shows that the TF-Serving service is running:</div>
<p><img decoding="async" alt="TF-Serving is running" src="https://www.codelast.com/wp-content/uploads/2024/09/tf_serving_running.png" style="width: 700px; height: 149px;" /></p>
<div>
<div>
		Its process ID is 22871, so we look up more details about that process:</div>
<blockquote>
<div>
			ps -ef | grep 22871</div>
</blockquote>
<div>
		The output looks like this:</div>
<blockquote>
<div>
			root&nbsp; &nbsp; &nbsp;22871 22729 83 13:42 pts/0&nbsp; &nbsp; 00:06:35 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=codelast --model_base_path=/models/codelast</div>
</blockquote>
<div>
		This shows that its REST API listens on port 8501.<br />
		<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p>
<div>
			So we can use tcpdump to capture and inspect the traffic. Run the following command (root privileges required):</div>
<blockquote>
<div>
				sudo tcpdump -vv -i any &#39;port 8501&#39;</div>
</blockquote>
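<p>The command above keeps running until it is interrupted. For a quick yes/no check, the capture can be bounded in time and the packets counted. A sketch, assuming GNU timeout is installed and tcpdump can be run through sudo; count_packets and verdict are hypothetical helper names:</p>

```shell
# Hypothetical helper: count the packets seen on a port within a time window.
# Assumes GNU coreutils "timeout" and permission to run tcpdump through sudo.
count_packets() {
    port="$1"
    seconds="$2"
    # -n skips name lookups, -q prints one short line per packet,
    # -l line-buffers the output; tcpdump's status messages go to stderr
    sudo timeout "$seconds" tcpdump -n -q -l -i any "port $port" 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: turn a packet count into a verdict.
verdict() {
    if [ "$1" -gt 0 ]; then
        echo "in use"
    else
        echo "idle"
    fi
}

# Example (not run here): verdict "$(count_packets 8501 10)"
```

<p>A short window can miss infrequent callers, so a longer capture gives a more trustworthy answer.</p>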
<div>
			If clients are sending requests to this TF-Serving service, the command prints a continuous stream of output similar to:</p>
<div>
				<span style="color:#0000ff;">14:27:59.174425 IP (tos 0x0, ttl 60, id 51707, offset 0, flags [DF], proto TCP (6), length 1500)</span></div>
<div>
				<span style="color:#0000ff;">node.codelast.com.60679 &gt; 172.17.0.2.cmtp-mgt: Flags [.], cksum 0x310f (correct), seq 617580:619040, ack 1, win 63, length 1460</span></div>
<div>
				<span style="color:#0000ff;">14:27:59.174453 IP (tos 0x0, ttl 60, id 39347, offset 0, flags [DF], proto TCP (6), length 1500)</span></div>
<div>
				<span style="color:#0000ff;">node.codelast.com.32739 &gt; 172.17.0.2.cmtp-mgt: Flags [.], cksum 0x9354 (correct), seq 44268904:44270364, ack 1, win 86, length 1460</span></div>
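<p>			If no requests are reaching the TF-Serving service, the command above prints nothing, which indicates that the REST endpoint is idle. gRPC clients connect to the port given by --port (8500 here), so repeat the check on that port before concluding that the service is unused.<br />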
			<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
			Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
			Thanks for following my WeChat official account (scan the code with WeChat):<br />
			<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
			and my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
				<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
</p></div>
<p>
		&nbsp;</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%88%a4%e6%96%ad%e5%b7%b2%e7%bb%8f%e5%90%af%e5%8a%a8%e7%9a%84tf-serving%e6%9c%8d%e5%8a%a1%e6%98%af%e5%90%a6%e6%ad%a3%e5%9c%a8%e4%bd%bf%e7%94%a8/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] The counter page of a Java map-reduce job fails to load (error 500)</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Tue, 30 Apr 2024 09:11:34 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[counter]]></category>
		<category><![CDATA[error 500]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[map-reduce]]></category>
		<category><![CDATA[RFC 2616]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14101</guid>

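					<description><![CDATA[<p>This is not the first time I have run into this problem; today I finally made up my mind to spend the time writing it up, as a reference for anyone who hits the same issue.<br />
After a Java M-R job finishes, its job counters are shown both on the command line and on the job's information web page. On the web page, you reach them by clicking the &#34;counter&#34; link on the job info page.<br />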
<span id="more-14101"></span><br />
<img decoding="async" alt="hadoop job info page" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_job_info_page.png" style="width: 339px; height: 361px;" /></p>
<p>Normally this page shows the counter data, but when the problem occurs you see this instead:<br />
<img decoding="async" alt="hadoop counter info error" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_counter_error.png" style="width: 561px; height: 144px;" /><br />
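At the same time, the job prints no counter information on the shell command line either.<br />
The error page gives no useful information about the failure, and neither does its &#34;Error Details&#34; section.<br />
Although the counters cannot be displayed, the M-R job itself runs to completion and writes its output normally.<br />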
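After testing, I found that in my case the cause was that the program defined a large number of Hadoop counters.<br />
How many is too many? I don't know. My program failed with a little over 240 counters; after I cut that in half, to a little over 120, the counter statistics displayed normally again.<br />
If you run into a similar problem, first check whether the job defines too many counters.<br />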
<span style="color: rgb(255, 0, 0);">➤➤</span>&#160;Copyright notice&#160;<span style="color: rgb(255, 0, 0);">➤➤</span>&#160;<br />
Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&#160;<br />
Thanks for following my WeChat official account (scan the code with WeChat):<br />
<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
and my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" />&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/" class="read-more">Read More </a></p>]]></description>
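										<content:encoded><![CDATA[<p>This is not the first time I have run into this problem; today I finally made up my mind to spend the time writing it up, as a reference for anyone who hits the same issue.<br />
After a Java M-R job finishes, its job counters are shown both on the command line and on the job's information web page. On the web page, you reach them by clicking the &quot;counter&quot; link on the job info page.<br />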
<span id="more-14101"></span><br />
<img decoding="async" alt="hadoop job info page" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_job_info_page.png" style="width: 339px; height: 361px;" /></p>
<p>Normally this page shows the counter data, but when the problem occurs you see this instead:<br />
<img decoding="async" alt="hadoop counter info error" src="https://www.codelast.com/wp-content/uploads/2024/04/hadoop_counter_error.png" style="width: 561px; height: 144px;" /><br />
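At the same time, the job prints no counter information on the shell command line either.<br />
The error page gives no useful information about the failure, and neither does its &quot;Error Details&quot; section.<br />
Although the counters cannot be displayed, the M-R job itself runs to completion and writes its output normally.<br />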
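After testing, I found that in my case the cause was that the program defined a large number of Hadoop counters.<br />
How many is too many? I don't know the exact threshold (Hadoop caps the number of counters per job through the mapreduce.job.counters.max property, 120 by default, which may be related). My program failed with a little over 240 counters; after I cut that in half, to a little over 120, the counter statistics displayed normally again.<br />
If you run into a similar problem, first check whether the job defines too many counters.<br />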
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan the code with WeChat):<br />
<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
and my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e7%9a%84counter%e9%a1%b5%e9%9d%a2%e6%97%a0%e6%b3%95%e6%98%be%e7%a4%ba%e7%9a%84%e9%97%ae%e9%a2%98error-500/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Fixing an IntelliJ IDEA startup crash: error occurred during error reporting (), id 0x6, SIGABRT (0x6) at pc=...</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Fri, 15 Mar 2024 09:48:13 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[IntelliJ IDEA]]></category>
		<category><![CDATA[SIGABRT]]></category>
		<category><![CDATA[启动]]></category>
		<category><![CDATA[崩溃]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14081</guid>

					<description><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="" src="https://www.codelast.com/wp-content/uploads/2024/03/intellij_idea_logo.jpeg" style="width: 225px; height: 225px;" /></div>
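<p>When something that has worked reliably for a long time suddenly breaks and then eats several days of troubleshooting, it can drive you crazy.<br />
That is what happened to me a few days ago: after I manually upgraded IntelliJ IDEA to the latest version on my Ubuntu development machine, it would no longer start.<br />
It crashed on every launch. Rebooting, deleting IntelliJ IDEA's local caches, and going back to the old version all failed to bring it back (as if some file had been &#34;contaminated&#34; and there was no way back), which was very annoying. After several days of trying different things I finally solved it. My fix is not universal, but if you hit a similar problem it may give you some ideas.<br />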
<span id="more-14081"></span><br />
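OS: <span style="color:#0000ff;">Ubuntu 20.04.6 LTS</span><br />
JDK: <span style="color:#0000ff;">1.8.0_382</span><br />
Previously installed IntelliJ IDEA version: <span style="color:#b22222;">idea-IC-232.8660.185</span><br />
New IntelliJ IDEA version downloaded from the JetBrains website: <span style="color:#b22222;">idea-IC-233.14808.21</span><br />
I did not upgrade through the IDE's built-in updater. Instead I downloaded the new version's archive, extracted it into an&#160;idea-IC-233.14808.21 directory, and ran idea.sh from its bin directory. Started this way, the new version automatically imports the old version's settings, so when all goes well you can switch over seamlessly without reconfiguring anything.<br />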
However, the new version died on startup, and the command line ended with this error:</p>
<blockquote>
<div>
		[error occurred during error reporting (), id 0x6, SIGABRT (0x6) at pc=0x00007fed3c5cf00b]</div>
<div>
		Aborted (core dumped)</div>
</blockquote>
<div>
	The IDE never reached its main window. A very long error report file, java_error_in_idea_xxx.log, was also generated under the /home directory</div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="" src="https://www.codelast.com/wp-content/uploads/2024/03/intellij_idea_logo.jpeg" style="width: 225px; height: 225px;" /></div>
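<p>When something that has worked reliably for a long time suddenly breaks and then eats several days of troubleshooting, it can drive you crazy.<br />
That is what happened to me a few days ago: after I manually upgraded IntelliJ IDEA to the latest version on my Ubuntu development machine, it would no longer start.<br />
It crashed on every launch. Rebooting, deleting IntelliJ IDEA's local caches, and going back to the old version all failed to bring it back (as if some file had been &quot;contaminated&quot; and there was no way back), which was very annoying. After several days of trying different things I finally solved it. My fix is not universal, but if you hit a similar problem it may give you some ideas.<br />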
<span id="more-14081"></span><br />
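OS: <span style="color:#0000ff;">Ubuntu 20.04.6 LTS</span><br />
JDK: <span style="color:#0000ff;">1.8.0_382</span><br />
Previously installed IntelliJ IDEA version: <span style="color:#b22222;">idea-IC-232.8660.185</span><br />
New IntelliJ IDEA version downloaded from the JetBrains website: <span style="color:#b22222;">idea-IC-233.14808.21</span><br />
I did not upgrade through the IDE's built-in updater. Instead I downloaded the new version's archive, extracted it into an&nbsp;idea-IC-233.14808.21 directory, and ran idea.sh from its bin directory. Started this way, the new version automatically imports the old version's settings, so when all goes well you can switch over seamlessly without reconfiguring anything.<br />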
However, the new version died on startup, and the command line ended with this error:</p>
<blockquote>
<div>
		[error occurred during error reporting (), id 0x6, SIGABRT (0x6) at pc=0x00007fed3c5cf00b]</div>
<div>
		Aborted (core dumped)</div>
</blockquote>
<div>
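	The IDE never reached its main window. A very long error report file, java_error_in_idea_xxx.log, was also generated under the /home directory.<br />
	At first I did not read this log file. Instead I tried the fixes I found online, one after another:<br />
	1. Rebooting the system<br />
	2. Deleting IntelliJ IDEA's caches<br />
	3. Going back to the old IntelliJ IDEA version<br />
	4. Uninstalling and reinstalling snap, following <a href="https://youtrack.jetbrains.com/issue/IDEA-315192/IntelliJ-would-not-open-after-being-closed-once-on-Ubuntu-22.04-LTS.-The-only-solution-is-rebooting." rel="noopener" target="_blank">this</a> similar issue<br />
	None of these worked.<br />
	Out of options, I forced myself to read the crash log&nbsp;java_error_in_idea_xxx.log, and to my surprise the clue was right there.<br />
	Near the top it says:</p>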
<div>
<blockquote>
<div>
				# Problematic frame:</div>
<div>
				# C&nbsp; [x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09]&nbsp; tree_sitter_cpp_external_scanner_deserialize+0x179</div>
</blockquote>
<div>
<div>
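				I don't know exactly what every detail means, but this is labelled the &quot;problematic frame&quot;, that is, the frame that was executing when the JVM crashed, so the crash is related to it.<br />
				<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></div>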
<div>
				Further down, the log shows:</p>
<blockquote>
<div>
						---------------&nbsp; T H R E A D&nbsp; ---------------</div>
<div>
						&nbsp;</div>
<div>
						Current thread (0x00007fec7c02b370):&nbsp; JavaThread &quot;AWT-EventQueue-0&quot; [_thread_in_native, id=352672, stack(0x00007feb95ae5000,0x00007feb95be6000)]</div>
<div>
						&nbsp;</div>
<div>
						Stack: [0x00007feb95ae5000,0x00007feb95be6000],&nbsp; sp=0x00007feb95be0310,&nbsp; free space=1004k</div>
<div>
						Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09]&nbsp; tree_sitter_cpp_external_scanner_deserialize+0x179</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter.so+0x30b3e]&nbsp; ts_parser_reset+0x30e</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter.so+0x2e329]&nbsp; ts_parser_set_language+0x399</div>
<div>
						C&nbsp; [x86_64-linux-gnu-tree-sitter.so+0xb4875]&nbsp; Java_org_treesitter_TSParser_ts_1parser_1set_1language+0x25</div>
<div>
						j&nbsp; org.treesitter.TSParser.ts_parser_set_language(JJ)Z+0</div>
<div>
						j&nbsp; org.treesitter.TSParser.setLanguage(Lorg/treesitter/TSLanguage;)Z+10</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.d.a()V+64</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.d.&lt;init&gt;()V+365</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.d.a()Lai/codegeex/plugin/lang/agent/d;+10</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.CodegeexAgentCompletionService.e()V+0</div>
<div>
						j&nbsp; ai.codegeex.plugin.lang.agent.CodegeexAgentCompletionService.&lt;init&gt;()V+266</div>
</blockquote>
<div>
<div>
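						Reading the stack downward from the crashing frame &quot;x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09&quot;, the first plugin code to appear is &quot;ai.codegeex.plugin&quot;, which corresponds to the CodeGeeX plugin I had installed.</div>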
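<div>
						A minimal Python sketch of this way of reading the log (an illustration, not part of the original troubleshooting): it scans frames in the format shown above and lists the leading package prefix of each interpreted Java frame in stack order. The function name java_prefixes, the two-component prefix depth, and the shortened FRAMES sample are assumptions made for the example; compiled frames (marked J) use a different layout and are not handled.</div>

```python
import re

# Frames copied (shortened) from the crash log above, with plain spaces.
FRAMES = """\
C  [x86_64-linux-gnu-tree-sitter-cpp.so+0x38ec09]  tree_sitter_cpp_external_scanner_deserialize+0x179
C  [x86_64-linux-gnu-tree-sitter.so+0x30b3e]  ts_parser_reset+0x30e
j  org.treesitter.TSParser.ts_parser_set_language(JJ)Z+0
j  org.treesitter.TSParser.setLanguage(Lorg/treesitter/TSLanguage;)Z+10
j  ai.codegeex.plugin.lang.agent.d.a()V+64
j  ai.codegeex.plugin.lang.agent.CodegeexAgentCompletionService.<init>()V+266
"""

# An interpreted Java frame looks like: j  fully.qualified.Class.method(args)ret+offset
JAVA_FRAME = re.compile(r"^j\s+([\w.$<>]+)\(")


def java_prefixes(frames, depth=2):
    """Return distinct package prefixes of interpreted Java frames, in stack order."""
    prefixes = []
    for line in frames.splitlines():
        match = JAVA_FRAME.match(line.strip())
        if match is None:
            continue  # native (C) frames carry no Java package
        prefix = ".".join(match.group(1).split(".")[:depth])
        if prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


print(java_prefixes(FRAMES))
```

<div>
						For the sample frames this lists org.treesitter (the tree-sitter wrapper classes seen in the log) first and ai.codegeex second, so the first prefix that belongs to a plugin rather than to a library is the one to suspect.</div>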
<div>
						So I suspected that removing this plugin would fix the IntelliJ IDEA startup crash.</div>
<div>
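						On Ubuntu, plugins are installed under this directory: <span style="color:#0000ff;">~/.local/share/JetBrains/IdeaIC2023.3</span></div>
<div>
						Here, IdeaIC2023.3 identifies the IntelliJ IDEA edition and version; each upgrade to a new version creates a new directory under ~/.local/share/JetBrains.</div>
<div>
						Inside this directory there is a directory named &quot;CodeGeeX&quot;, which is where the CodeGeeX plugin is installed. Delete it.</div>
<div>
						After that, I launched IntelliJ IDEA again and it started normally.<br />
						I still do not know why the CodeGeeX plugin causes this crash, but if, like me, you cannot find the reason your IDE is crashing, removing a suspect plugin may be one way to fix it.<br />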
						<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
						Reposts must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
						Thanks for following my WeChat official account (scan the QR code with WeChat):<br />
						<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
						and my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
							<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
</p></div>
</p></div>
</p></div>
</p></div>
</p></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3intellij-idea%e5%90%af%e5%8a%a8%e5%b4%a9%e6%ba%83%ef%bc%9aerror-occurred-during-error-reporting-id-0x6-sigabrt-0x6-at-pc/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Saying Goodbye to GitHub Copilot After More Than a Year of Paid Use</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/#comments</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Fri, 01 Mar 2024 19:16:53 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[AI辅助编程]]></category>
		<category><![CDATA[CodeGeeX]]></category>
		<category><![CDATA[GitHub Copilot]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=14064</guid>

					<description><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="GitHub CoPilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_1.png" style="width: 800px; height: 213px;" /></div>
<div>
	&#160;</div>
<div>
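	Yesterday my GitHub Copilot subscription expired. After paying for it for more than a year, I have decided not to renew, and I have some thoughts to share.<br />
	&#160;</div>
<div>
	From eager anticipation before subscribing, to gradually taking it for granted while using it, to a &#34;calm breakup&#34; when the subscription ended, I finally surrendered to reality and chose to live the poor man's way.<br />
	&#160;</div>
<div>
	After all, you can find good arguments both for and against a fee of 10 US dollars a month. For me, GitHub Copilot is simply no longer worth $10/month.<br />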
	<span id="more-14064"></span></div>
<div style="text-align: center;">
	<img decoding="async" alt="GitHub CoPilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_2.png" style="width: 800px; height: 309px;" /></div>
<div>
<!--more--></div>
<div>
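	GitHub Copilot, the world's first AI coding assistant with first-rate results, became generally available in June 2022. Before that, like many of my peers, I was regularly stuck in the loop of &#34;write code &#8594; forget how to write some snippet &#8594; search Google &#8594; copy, paste, and test code from the web &#8594; go back to writing code&#34;. Over time, that familiar, repetitive routine wears you down.<br />
	&#160;</div>
<div>
	Then GitHub Copilot arrived. Hyped and promoted by the tech media, and backed by hands-on reports from independent developers, it earned a resounding verdict: awesome!<br />
	&#160;</div>
<div>
	So I was tempted. After a one-month trial followed by one paid month, GitHub Copilot impressed me enough to convince me that it would save me a huge amount of development time. So in early 2023 I made up my mind to pay for a full year.<br />
	&#160;</div>
<div>
	For many developers, $10/month is a price that takes real resolve to pay. At the time my account had a discount offer, so I renewed for a year at a little over 90 US dollars, which is less than 700 RMB a year.<br />
	&#160;</div>
<div>
	The accuracy of GitHub Copilot's code completion is impressive. What I enjoyed most was how well it completed comments written in Chinese. Whether it was a longer comment at the top of a class or a one-line comment in the middle of coding, I felt it could &#34;think what I think and write what I want to write&#34;.<br />
	&#160;</div>
<div>
	Of course there was also a biggest annoyance: its connection to the server stalled from time to time. The servers are overseas, so that is understandable.<br />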
	&#160;</div>
<div style="text-align: center;">
	<img decoding="async" alt="alternatives" src="https://www.codelast.com/wp-content/uploads/2024/03/alternative.jpg" style="width: 750px; height: 320px;" /></div>
<div>
	Back in early 2023, if you were looking for a free substitute for GitHub Copilot, there were not many options. In China, aiXcoder and CodeGeeX were two of the better-known ones.</div>
<div style="text-align: center;">
	<img decoding="async" alt="aiXcoder" src="https://www.codelast.com/wp-content/uploads/2024/03/aixcoder.jpg" style="width: 360px; height: 147px;" /></div>
<div>
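	I always thought the design of the first few aiXcoder versions was truly &#34;sick&#34;: you had to install a backend program locally to run inference. Because that program depends on the operating system, it obviously would not work on many Linux distributions. For example, I once tried to install aiXcoder's local inference software on Ubuntu 16.04, but it failed because of library dependency problems. When I reported this in the official QQ group, the developers merely confirmed the problem and offered no solution. Presumably nobody is going to work on something that has no KPI attached and earns no money!<br />
	&#160;</div>
<div>
	So I dropped aiXcoder without hesitation.</div>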
<div>
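	GitHub Copilot's approach, &#34;one plugin handles everything&#34; and &#34;inference runs in the cloud&#34;, largely sidesteps differences between OS versions and avoids the library dependency problem.<br />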
	&#160;</div>
<div style="text-align: center;">
	<img decoding="async" alt="CodeGeeX" src="https://www.codelast.com/wp-content/uploads/2024/03/codegeex.png" style="width: 727px; height: 153px;" /></div>
<div>
	<br />
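	In early 2023, the Chinese-made CodeGeeX was another rising star among AI coding assistants. Like GitHub Copilot, it runs inference in the cloud, and installing a single plugin takes care of everything. That is what a proper coding assistant should look like.<br />
	&#160;</div>
<div>
	At the time I compared GitHub Copilot and CodeGeeX in detail on about 10 cases, and the conclusion was no surprise: GitHub Copilot beat CodeGeeX across the board. This was not a comparison on technical metrics (such as benchmark test sets for code generation quality), but purely my subjective impression of which tool produced better output.<br />
	&#160;</div>
<div>
	So, to be honest, in early 2023, based on my own testing, I would rather spend as much as 700 RMB on GitHub Copilot than use the free CodeGeeX every day, because its code completion was not good enough at the time and its support for some languages (Apache Pig, for example) was poor, which would have hurt my development work.</div>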
&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<div style="text-align: center;">
	<img decoding="async" alt="GitHub CoPilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_1.png" style="width: 800px; height: 213px;" /></div>
<div>
	&nbsp;</div>
<div>
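	Yesterday my GitHub Copilot subscription expired. After paying for it for more than a year, I have decided not to renew, and I have some thoughts to share.<br />
	&nbsp;</div>
<div>
	From eager anticipation before subscribing, to gradually taking it for granted while using it, to a &quot;calm breakup&quot; when the subscription ended, I finally surrendered to reality and chose to live the poor man's way.<br />
	&nbsp;</div>
<div>
	After all, you can find good arguments both for and against a fee of 10 US dollars a month. For me, GitHub Copilot is simply no longer worth $10/month.<br />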
	<span id="more-14064"></span></div>
<div style="text-align: center;">
	<img decoding="async" alt="GitHub CoPilot" src="https://www.codelast.com/wp-content/uploads/2024/03/github_copilot_2.png" style="width: 800px; height: 309px;" /></div>
<div>
<!--more--></div>
<div>
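	GitHub Copilot, the world's first AI coding assistant with first-rate results, became generally available in June 2022. Before that, like many of my peers, I was regularly stuck in the loop of &quot;write code &rarr; forget how to write some snippet &rarr; search Google &rarr; copy, paste, and test code from the web &rarr; go back to writing code&quot;. Over time, that familiar, repetitive routine wears you down.<br />
	&nbsp;</div>
<div>
	Then GitHub Copilot arrived. Hyped and promoted by the tech media, and backed by hands-on reports from independent developers, it earned a resounding verdict: awesome!<br />
	&nbsp;</div>
<div>
	So I was tempted. After a one-month trial followed by one paid month, GitHub Copilot impressed me enough to convince me that it would save me a huge amount of development time. So in early 2023 I made up my mind to pay for a full year.<br />
	&nbsp;</div>
<div>
	For many developers, $10/month is a price that takes real resolve to pay. At the time my account had a discount offer, so I renewed for a year at a little over 90 US dollars, which is less than 700 RMB a year.<br />
	&nbsp;</div>
<div>
	The accuracy of GitHub Copilot's code completion is impressive. What I enjoyed most was how well it completed comments written in Chinese. Whether it was a longer comment at the top of a class or a one-line comment in the middle of coding, I felt it could &quot;think what I think and write what I want to write&quot;.<br />
	&nbsp;</div>
<div>
	Of course there was also a biggest annoyance: its connection to the server stalled from time to time. The servers are overseas, so that is understandable.<br />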
	&nbsp;</div>
<div style="text-align: center;">
	<img decoding="async" alt="alternatives" src="https://www.codelast.com/wp-content/uploads/2024/03/alternative.jpg" style="width: 750px; height: 320px;" /></div>
<div>
	Back in early 2023, if you were looking for a free substitute for GitHub Copilot, there were not many options. In China, aiXcoder and CodeGeeX were two of the better-known ones.</div>
<div style="text-align: center;">
	<img decoding="async" alt="aiXcoder" src="https://www.codelast.com/wp-content/uploads/2024/03/aixcoder.jpg" style="width: 360px; height: 147px;" /></div>
<div>
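	I always thought the design of the first few aiXcoder versions was truly &quot;sick&quot;: you had to install a backend program locally to run inference. Because that program depends on the operating system, it obviously would not work on many Linux distributions. For example, I once tried to install aiXcoder's local inference software on Ubuntu 16.04, but it failed because of library dependency problems. When I reported this in the official QQ group, the developers merely confirmed the problem and offered no solution. Presumably nobody is going to work on something that has no KPI attached and earns no money!<br />
	&nbsp;</div>
<div>
	So I dropped aiXcoder without hesitation.</div>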
<div>
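	GitHub Copilot's approach, &quot;one plugin handles everything&quot; and &quot;inference runs in the cloud&quot;, largely sidesteps differences between OS versions and avoids the library dependency problem.<br />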
	&nbsp;</div>
<div style="text-align: center;">
	<img decoding="async" alt="CodeGeeX" src="https://www.codelast.com/wp-content/uploads/2024/03/codegeex.png" style="width: 727px; height: 153px;" /></div>
<div>
	<br />
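	In early 2023, the Chinese-made CodeGeeX was another rising star among AI coding assistants. Like GitHub Copilot, it runs inference in the cloud, and installing a single plugin takes care of everything. That is what a proper coding assistant should look like.<br />
	&nbsp;</div>
<div>
	At the time I compared GitHub Copilot and CodeGeeX in detail on about 10 cases, and the conclusion was no surprise: GitHub Copilot beat CodeGeeX across the board. This was not a comparison on technical metrics (such as benchmark test sets for code generation quality), but purely my subjective impression of which tool produced better output.<br />
	&nbsp;</div>
<div>
	So, to be honest, in early 2023, based on my own testing, I would rather spend as much as 700 RMB on GitHub Copilot than use the free CodeGeeX every day, because its code completion was not good enough at the time and its support for some languages (Apache Pig, for example) was poor, which would have hurt my development work.</div>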
<div>
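	Over the course of 2023, however, CodeGeeX made impressive, substantial improvements: code completion quality rose considerably, and it gained more new features than I can cover in a few words. So now, in 2024, with my GitHub Copilot subscription up for renewal, I will not renew unless the price drops to 10% of what it was (which I know will not happen). I will use the free Chinese-made substitute instead: CodeGeeX.<br />
	&nbsp;</div>
<div>
	Since 2023, besides CodeGeeX's great progress, many free competitors have appeared on the market, including Codeium (from abroad) and Fitten Code (Chinese-made). They may still lag behind GitHub Copilot, but unless you are especially picky, they are good enough for everyday use.</div>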
<div>
	&nbsp;</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%92%8c%e4%bb%98%e8%b4%b9%e4%bd%bf%e7%94%a8%e4%b8%80%e5%b9%b4%e5%a4%9a%e7%9a%84-github-copilot-%e8%af%b4%e5%86%8d%e8%a7%81/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Apache Pig: Group Data by a Given Field and Keep the Most Recent Record in Each Group</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 15 Nov 2023 08:15:25 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[apache pig]]></category>
		<category><![CDATA[GROUP]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13967</guid>

					<description><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p>When processing big data with Apache Pig, a common requirement is to group the input by a given field and output only the most recent record in each group.<br />
<span id="more-13967"></span><br />
Here is an example. Suppose we have a data file input.txt:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">10&#160;&#160;&#160;&#160;&#160;&#160;a&#160;&#160;&#160;&#160;&#160;&#160;&#160;1,2,3
9&#160;&#160;&#160;&#160;&#160;&#160;&#160;b&#160;&#160;&#160;&#160;&#160;&#160;&#160;1,2
8&#160;&#160;&#160;&#160;&#160;&#160;&#160;a&#160;&#160;&#160;&#160;&#160;&#160;&#160;2,3,4
13&#160;&#160;&#160;&#160;&#160;&#160;a&#160;&#160;&#160;&#160;&#160;&#160;&#160;1,2,3,4
6&#160;&#160;&#160;&#160;&#160;&#160;&#160;b&#160;&#160;&#160;&#160;&#160;&#160;&#160;1
</code></pre>
</section>
<p>The three fields are: <span style="background-color: rgb(255, 255, 255); color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre;">time (timestamp), userId (user ID), userInterest (user interest IDs)<br />
Now, how do we find the most recent </span><span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">userInterest </span><span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">for each user?</span><br />
<span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">That is, for user a the latest timestamp is 13, so the </span><span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">userInterest is 1,2,3,4; for user b the latest timestamp is 9, so the </span><span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">userInterest is 1,2.</span><br />
<span style="color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">Here is the code:</span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;input.txt&#39;</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&#160;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">time</span>:&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">long</span>,&#160;userId:&#160;chararray,&#160;userInterest:&#160;chararray);
A&#160;=&#160;FOREACH&#160;A&#160;GENERATE&#160;time,&#160;userId,&#160;userInterest;
B&#160;=&#160;GROUP&#160;A&#160;BY&#160;userId;
<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">--&#160;Keep&#160;the&#160;most&#160;recent&#160;record&#160;for&#160;each&#160;userId</span>
C&#160;=&#160;FOREACH&#160;B&#160;{
&#160;&#160;&#160;&#160;SORTED&#160;=&#160;ORDER&#160;A&#160;BY&#160;time&#160;DESC;
&#160;&#160;&#160;&#160;ONE_RECORD&#160;=&#160;LIMIT&#160;SORTED&#160;1;
&#160;&#160;&#160;&#160;GENERATE&#160;FLATTEN(ONE_RECORD);
};
DUMP&#160;C;
</code></pre>
</section>
<p>
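Inside the nested FOREACH block, ORDER BY first sorts the records of each group by time in descending order, and LIMIT then takes one record. Because the sort is descending on time, LIMIT 1 returns the record with the largest timestamp, that is, the most recent one.<br />
<span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">Output:</span></p>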
<blockquote>
<div>
		(13,a,1,2,3,4)</div>
<div>
		(9,b,1,2)</div>
</blockquote>
<p><span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &#34;Droid Sans Mono&#34;, &#34;monospace&#34;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">This matches the correct result we worked out by hand earlier.</span></p>
<p>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p>When processing big data with Apache Pig, a common requirement is to group the input by a given field and output only the most recent record in each group.<br />
<span id="more-13967"></span><br />
Here is an example. Suppose we have a data file input.txt:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,2,3
9&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,2
8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2,3,4
13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,2,3,4
6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1
</code></pre>
</section>
<p>The three fields are: <span style="background-color: rgb(255, 255, 255); color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre;">time (timestamp), userId (user ID), userInterest (user interest IDs)<br />
Now, how do we find the most recent </span><span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">userInterest </span><span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">for each user?</span><br />
<span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">That is, for user a the latest timestamp is 13, so the </span><span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">userInterest is 1,2,3,4; for user b the latest timestamp is 9, so the </span><span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">userInterest is 1,2.</span><br />
<span style="color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; font-size: 16px; white-space: pre; background-color: rgb(255, 255, 255);">Here is the code:</span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;input.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">time</span>:&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">long</span>,&nbsp;userId:&nbsp;chararray,&nbsp;userInterest:&nbsp;chararray);
A&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;time,&nbsp;userId,&nbsp;userInterest;
B&nbsp;=&nbsp;GROUP&nbsp;A&nbsp;BY&nbsp;userId;
<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">--&nbsp;Keep&nbsp;the&nbsp;most&nbsp;recent&nbsp;record&nbsp;for&nbsp;each&nbsp;userId</span>
C&nbsp;=&nbsp;FOREACH&nbsp;B&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;SORTED&nbsp;=&nbsp;ORDER&nbsp;A&nbsp;BY&nbsp;time&nbsp;DESC;
&nbsp;&nbsp;&nbsp;&nbsp;ONE_RECORD&nbsp;=&nbsp;LIMIT&nbsp;SORTED&nbsp;1;
&nbsp;&nbsp;&nbsp;&nbsp;GENERATE&nbsp;FLATTEN(ONE_RECORD);
};
DUMP&nbsp;C;
</code></pre>
</section>
<p>
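Inside the nested FOREACH block, ORDER BY first sorts the records of each group by time in descending order, and LIMIT then takes one record. Because the sort is descending on time, LIMIT 1 returns the record with the largest timestamp, that is, the most recent one.<br />
<span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">Output:</span></p>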
<blockquote>
<div>
		(13,a,1,2,3,4)</div>
<div>
		(9,b,1,2)</div>
</blockquote>
<p><span style="font-size: 16px; color: rgb(59, 59, 59); font-family: &quot;Droid Sans Mono&quot;, &quot;monospace&quot;, monospace; white-space: pre; background-color: rgb(255, 255, 255);">This matches the correct result we worked out by hand earlier.</span></p>
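<p>As a cross-check outside Pig, the same &quot;latest record per group&quot; logic can be sketched in plain Python. This is an illustration rather than part of the original script: the function name latest_per_user and the in-memory rows list are assumptions made for the example, and the rows simply mirror input.txt above.</p>

```python
# Rows mirror input.txt: (time, userId, userInterest)
rows = [
    (10, "a", "1,2,3"),
    (9, "b", "1,2"),
    (8, "a", "2,3,4"),
    (13, "a", "1,2,3,4"),
    (6, "b", "1"),
]


def latest_per_user(records):
    """Keep, for each userId, the record with the largest time value."""
    latest = {}
    for record in records:
        time, user_id, _interest = record
        # Replace the stored record only when this one is strictly newer.
        if user_id not in latest or time > latest[user_id][0]:
            latest[user_id] = record
    # Sort by userId so the result order is deterministic.
    return sorted(latest.values(), key=lambda record: record[1])


for record in latest_per_user(rows):
    print(record)
```

<p>Unlike the Pig script, this sketch keeps a running maximum per userId instead of sorting each group; when the timestamps within a group are distinct, both approaches select the same rows.</p>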
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%95%b0%e6%8d%ae%e6%8c%89%e6%8c%87%e5%ae%9a%e5%ad%97%e6%ae%b5%e5%88%86%e7%bb%84%ef%bc%8c%e6%af%8f%e7%bb%84%e5%8f%96%e6%97%b6%e9%97%b4/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Converting a Time String to a Unix Timestamp in Apache Pig</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Thu, 12 Oct 2023 09:37:25 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[apache pig]]></category>
		<category><![CDATA[时间字符串]]></category>
		<category><![CDATA[时间戳]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13959</guid>

					<description><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" target="_blank" rel="noopener"><span style="background-color:#ffa07a;">here</span></a>.</p>
<p>In Apache Pig, how do you convert a time string in a format like <span style="color:#ff0000;">2023-10-11_10:57:56</span> into an integer Unix timestamp?<br />
Without further ado, here is the code.<br />
Assume the input file 1.txt contains one time string per line.<br />
<span id="more-13959"></span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&#160;(dt:&#160;chararray);
A&#160;=&#160;FOREACH&#160;A&#160;GENERATE&#160;ToDate(dt,&#160;&#39;yyyy-MM-dd_HH:mm:ss&#39;)&#160;AS&#160;date;
B&#160;=&#160;FOREACH&#160;A&#160;GENERATE&#160;ToUnixTime(date)&#160;AS&#160;ts;
DUMP&#160;B;
</code></pre>
</section>
<p>
The output looks like this:</p>
<blockquote>
<p>
		1696993076</p>
</blockquote>
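<p>The sample value can be cross-checked outside Pig with a short Python sketch. It assumes the output above was produced from the sample string 2023-10-11_10:57:56 and that the string was interpreted in UTC+8; neither is stated in the original, but the arithmetic agrees with that reading. The helper name to_unix_seconds and its utc_offset_hours parameter are made up for the example.</p>

```python
from datetime import datetime, timedelta, timezone


def to_unix_seconds(text, utc_offset_hours=8):
    """Parse a yyyy-MM-dd_HH:mm:ss string at a fixed UTC offset and return epoch seconds."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    parsed = datetime.strptime(text, "%Y-%m-%d_%H:%M:%S").replace(tzinfo=tz)
    return int(parsed.timestamp())


print(to_unix_seconds("2023-10-11_10:57:56"))     # interpreted as UTC+8
print(to_unix_seconds("2023-10-11_10:57:56", 0))  # interpreted as UTC
```

<p>The first call prints 1696993076 and the second prints 1697021876, which is 28800 seconds (8 hours) later, so the same string yields different timestamps depending on the time zone used to interpret it.</p>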
<p>
Note that the timestamp obtained this way is in seconds.</p>
<p>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" target="_blank" rel="noopener"><span style="background-color:#ffa07a;">here</span></a>.</p>
<p>In Apache Pig, how do you convert a time string in the format <span style="color:#ff0000;">2023-10-11_10:57:56</span> into an integer timestamp?<br />
Without further ado, here is the code.<br />
Assume the input file is 1.txt, with one time string per line.<br />
<span id="more-13959"></span></p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(dt:&nbsp;chararray);
A&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;ToDate(dt,&nbsp;&#39;yyyy-MM-dd_HH:mm:ss&#39;)&nbsp;AS&nbsp;date;
B&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;ToUnixTime(date)&nbsp;AS&nbsp;ts;
DUMP&nbsp;B;
</code></pre>
</section>
<p>
The output looks like this:</p>
<blockquote>
<p>
		1696993076</p>
</blockquote>
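<p>As a cross-check outside Pig, here is a minimal plain-Java sketch of the same conversion. The class and method names are hypothetical, and the UTC+8 offset is an assumption inferred from the sample output (1696993076 is 2023-10-11 02:57:56 UTC, i.e. 10:57:56 at UTC+8). The original post does not say which time zone the cluster uses, so the number on your cluster may differ.</p>

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeStringToEpoch {
    // Same pattern string as the one passed to ToDate() in the Pig script.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH:mm:ss");

    // Interprets the wall-clock string in the given zone and returns epoch seconds.
    static long toEpochSeconds(String text, ZoneId zone) {
        return LocalDateTime.parse(text, FORMAT).atZone(zone).toEpochSecond();
    }

    public static void main(String[] args) {
        // Assumption: the sample output came from an environment running at UTC+8.
        ZoneId zone = ZoneOffset.ofHours(8);
        System.out.println(toEpochSeconds("2023-10-11_10:57:56", zone)); // prints 1696993076
    }
}
```

<p>If the same string were interpreted as UTC instead, the result would be 28800 seconds (8 hours) larger, so it is worth confirming the default time zone before comparing timestamps across systems.</p>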
<p>
As you can see, the resulting timestamp is in seconds.<br />
</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%9c%a8apache-pig%e4%b8%ad%e6%8a%8a%e6%97%b6%e9%97%b4%e5%ad%97%e7%ac%a6%e4%b8%b2%e8%bd%ac%e6%8d%a2%e6%88%90%e6%97%b6%e9%97%b4%e6%88%b3/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Problems caused by omitting the @Override annotation on reduce() in a Java map-reduce job</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Sun, 06 Aug 2023 12:12:10 +0000</pubDate>
				<category><![CDATA[Original]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[map-reduce job]]></category>
		<category><![CDATA[Type error]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13934</guid>

					<description><![CDATA[<p>Suppose you have a map-reduce&#160;job written in Java whose mapper emits keys of type Text and values of type NullWritable. The reducer should then be written like this:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">static</span>&#160;<span class="hljs-class" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">class</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">QuerySegmentResultFromKVReducer</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">extends</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Reducer</span>&#60;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Text</span>,&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); 
word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>&#62;&#160;</span>{

&#160;&#160;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&#160;&#160;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">setup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&#160;context)</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&#160;IOException,&#160;InterruptedException&#160;</span>{
&#160;&#160;}

&#160;&#160;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&#160;&#160;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&#160;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">cleanup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&#160;context)</span></span></code></pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/" class="read-more">Read More </a></section>]]></description>
										<content:encoded><![CDATA[<p>Suppose you have a map-reduce&nbsp;job written in Java whose mapper emits keys of type Text and values of type NullWritable. The reducer should then be written like this:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">static</span>&nbsp;<span class="hljs-class" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">class</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">QuerySegmentResultFromKVReducer</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">extends</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Reducer</span>&lt;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Text</span>,&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); 
word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>,&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">NullWritable</span>&gt;&nbsp;</span>{

&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">setup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&nbsp;context)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException,&nbsp;InterruptedException&nbsp;</span>{
&nbsp;&nbsp;}

&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">cleanup</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Reducer.Context&nbsp;context)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException,&nbsp;InterruptedException&nbsp;</span>{
&nbsp;&nbsp;}

&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); overflow-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">protected</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">void</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">reduce</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Text&nbsp;key,&nbsp;Iterable&lt;NullWritable&gt;&nbsp;values,&nbsp;Context&nbsp;context)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException,&nbsp;InterruptedException&nbsp;</span>{
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//<span class="hljs-doctag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">TODO:</span></span>
&nbsp;&nbsp;}
}
</code></pre>
</section>
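<p>Here the reducer's output key and value types are both NullWritable. They do not matter for this article.<br />
<span id="more-13934"></span><br />
If the reduce() method is missing the&nbsp;<span style="color:#ff0000;">@Override</span>&nbsp;annotation, and&nbsp;Reducer&lt;Text, NullWritable, NullWritable, NullWritable&gt;&nbsp;is mistakenly written as&nbsp;Reducer&lt;Text, Text, NullWritable, NullWritable&gt;, the code still compiles without errors.<br />
When you run the job, however, something odd happens. The reduce logic you wrote at &ldquo;TODO:&rdquo; never executes, and even though you never call context.write()&nbsp;to write anything to HDFS, the Hadoop counters show that the job output exactly as many records as the reducer received.<br />
It looks as if a default Reducer ran and passed its input straight through. That is in fact what happens: with the wrong type parameters, your reduce() no longer matches the signature of the inherited method, so it becomes a separate overload, and the framework calls the default Reducer.reduce(), which writes every input key/value pair unchanged.<br />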
So it bears emphasizing: never leave out the&nbsp;<span style="color: rgb(255, 0, 0);">@Override</span>&nbsp;annotation! With it in place, the IDE flags the error and compilation fails.</p>
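<div>
	The @Override annotation is optional. If you delete it, the compiler does not complain, because Java allows a method to be overridden without it.</div>
<div>
	&nbsp;</div>
<div>
	Even so, it is good practice to use @Override whenever you override a method from a superclass or an interface. Doing so has several benefits:</div>
<div>
	➤ Readability: other developers can see at a glance that the method overrides one from a superclass or interface, which makes the code easier to follow.</div>
<div>
	➤ Error prevention: if you misspell the name of the method you meant to override, or get its signature wrong, the compiler reports an error and helps you catch the problem.</div>
<div>
	➤ Robustness: if the method in the superclass or interface changes, the annotated method fails to compile, reminding you to update the override.</div>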
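<p>The pitfall can be reproduced without Hadoop. The sketch below is only an illustration: BaseReducer is a hypothetical, simplified stand-in for Hadoop's Reducer (not the real API), its default reduce() passes the input through, and run() plays the role of the framework. GoodReducer and BadReducer are likewise made-up names.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverrideDemo {

    // Hypothetical stand-in for Hadoop's Reducer: the default reduce() is a pass-through.
    static class BaseReducer<K, V> {
        final List<String> output = new ArrayList<>();

        protected void reduce(K key, Iterable<V> values) {
            for (V value : values) {
                output.add(key + "=" + value);
            }
        }

        // Plays the role of the framework: it always calls reduce(K, Iterable<V>).
        final void run(K key, Iterable<V> values) {
            reduce(key, values);
        }
    }

    // Type parameters match the method, so @Override compiles and this method is called.
    static class GoodReducer extends BaseReducer<String, Integer> {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            output.add("custom:" + key);
        }
    }

    // Wrong value type in "extends" and no @Override: this still compiles,
    // but reduce() is now just an overload that the framework never calls.
    // Adding @Override here would turn the mistake into a compile error.
    static class BadReducer extends BaseReducer<String, String> {
        protected void reduce(String key, Iterable<Integer> values) {
            output.add("custom:" + key);
        }
    }

    static List<String> runGood() {
        GoodReducer reducer = new GoodReducer();
        reducer.run("k", Arrays.asList(1, 2));
        return reducer.output;
    }

    static List<String> runBad() {
        BadReducer reducer = new BadReducer();
        reducer.run("k", Arrays.asList("a", "b"));
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(runGood()); // prints [custom:k]
        System.out.println(runBad());  // prints [k=a, k=b]
    }
}
```

<p>The second line of output is the pass-through behavior described above: the custom logic is silently skipped and the inherited default runs instead.</p>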
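<p>
	In this article's example, if reduce() has no @Override annotation, then when the reducer class is mistakenly declared as extends Reducer&lt;Text, Text, NullWritable, NullWritable&gt;, the IDE does not notice anything wrong with reduce(), and you are left believing everything is fine.<br />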
</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-java-map-reduce-job%e4%b8%ad%ef%bc%8creduce%e6%96%b9%e6%b3%95%e6%bc%8f%e5%86%99-override-%e6%b3%a8%e8%a7%a3%e5%bc%95%e8%b5%b7%e7%9a%84%e9%97%ae%e9%a2%98/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] One way to fix Map-Reduce job OOM (Java Heap Space) errors: tune the memory parameters</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 19 Jun 2023 05:21:18 +0000</pubDate>
				<category><![CDATA[Original]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Java Heap Space]]></category>
		<category><![CDATA[M-R job]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[Increase memory]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13923</guid>

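					<description><![CDATA[<p>When a Java M-R job or a Pig M-R job fails with a Java Heap Space error, the usual approach is to track down the anomaly in the input data and then work out a fix. For example, your program may GROUP on some key while the input contains a huge number of records for that key, which can make the job run out of memory.<br />
That kind of problem depends on the specifics of the data and of the program logic, so it is not covered here.<br />
This article is about a different case: sometimes the problem in the implementation or the input data is &#8220;not that severe&#8221;, and adjusting the M-R job's memory parameters is enough to fix it.<br />
<span id="more-13923"></span><br />
For a Java M-R job, set the following parameters with -D:</p>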
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="bash language-bash hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.map.memory.mb=8192&#34;</span>&#160;\
&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.reduce.memory.mb=8192&#34;</span>&#160;\
&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.map.java.opts=-Xmx6144m&#34;</span>&#160;\
&#160;&#160;-D&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;mapreduce.reduce.java.opts=-Xmx6144m&#34;</span>&#160;\
</code></pre>
</section>
<p>
For an Apache Pig M-R job, add the following statements to the Pig script:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&#160;mapreduce.map.memory.mb&#160;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&#160;mapreduce.reduce.memory.mb&#160;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&#160;mapreduce.map.java.opts&#160;-Xmx6144m;</code></pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/" class="read-more">Read More </a></section>]]></description>
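										<content:encoded><![CDATA[<p>When a Java M-R job or a Pig M-R job fails with a Java Heap Space error, the usual approach is to track down the anomaly in the input data and then work out a fix. For example, your program may GROUP on some key while the input contains a huge number of records for that key, which can make the job run out of memory.<br />
That kind of problem depends on the specifics of the data and of the program logic, so it is not covered here.<br />
This article is about a different case: sometimes the problem in the implementation or the input data is &ldquo;not that severe&rdquo;, and adjusting the M-R job's memory parameters is enough to fix it.<br />
<span id="more-13923"></span><br />
For a Java M-R job, set the following parameters with -D:</p>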
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="bash language-bash hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.map.memory.mb=8192&quot;</span>&nbsp;\
&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.reduce.memory.mb=8192&quot;</span>&nbsp;\
&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.map.java.opts=-Xmx6144m&quot;</span>&nbsp;\
&nbsp;&nbsp;-D&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;mapreduce.reduce.java.opts=-Xmx6144m&quot;</span>&nbsp;\
</code></pre>
</section>
<p>
For an Apache Pig M-R job, add the following statements to the Pig script:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.map.memory.mb&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.reduce.memory.mb&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">8192</span>;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.map.java.opts&nbsp;-Xmx6144m;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">SET</span>&nbsp;mapreduce.reduce.java.opts&nbsp;-Xmx6144m;
</code></pre>
</section>
<p>
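The first two parameters set the container memory and should be tuned to your Hadoop cluster; the last two set the JVM heap and should be about 70% to 80% of the first two (here, 6144 MB is 75% of 8192 MB).</p>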
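<p>As a rough illustration of the 70% to 80% rule, the following Python sketch derives a JVM heap option from a container size and prints Pig SET statements in the same form as the snippet in this post. The function names and the default ratio of 0.75 are assumptions made for this example; only the property names come from the post.</p>

```python
def heap_opts(container_mb, ratio=0.75):
    """Return a JVM -Xmx option sized as a fraction of the container memory (MB)."""
    heap_mb = int(container_mb * ratio)
    return f"-Xmx{heap_mb}m"


def pig_memory_settings(map_mb, reduce_mb, ratio=0.75):
    """Build the four Pig SET statements for the given container sizes."""
    return [
        f"SET mapreduce.map.memory.mb {map_mb};",
        f"SET mapreduce.reduce.memory.mb {reduce_mb};",
        f"SET mapreduce.map.java.opts {heap_opts(map_mb, ratio)};",
        f"SET mapreduce.reduce.java.opts {heap_opts(reduce_mb, ratio)};",
    ]


if __name__ == "__main__":
    # Hypothetical values: 8192 MB containers, heap at 75% of the container.
    for line in pig_memory_settings(8192, 8192):
        print(line)
```

<p>The gap between the container size and the heap size leaves room for the JVM's non-heap memory, which is why the heap is kept below the container limit rather than equal to it.</p>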
<p>
	<span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	When reposting, please credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	Thanks for following my WeChat official account (scan with WeChat):<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%a7%a3%e5%86%b3map-reduce-job-oomjava-heap-space%e9%94%99%e8%af%af%e7%9a%84%e4%b8%80%e4%b8%aa%e6%96%b9%e6%b3%95%ef%bc%9a%e8%b0%83%e6%95%b4%e5%86%85%e5%ad%98%e5%8f%82%e6%95%b0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] How to Confirm Which .pb Model a Running TensorFlow model-serving Service Has Loaded</title>
		<link>https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/</link>
					<comments>https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 24 May 2023 09:33:49 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[model-serving]]></category>
		<category><![CDATA[pb模型]]></category>
		<category><![CDATA[TensorFlow]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13912</guid>

										<content:encoded><![CDATA[<p>After a TensorFlow model-serving service has been running for a while, it is easy to forget which .pb model it loaded. The following method lets you confirm it.<br />
<span id="more-13912"></span><br />
Visit this URL:<br />
http://&lt;your_model_serving_host&gt;:18501/v1/models/&lt;your_model_name&gt;<br />
where:<br />
&lt;your_model_serving_host&gt; is the domain name or IP address of your model-serving server.<br />
&lt;your_model_name&gt; is the name of your model.<br />
The page will show output similar to the following:</p>
<blockquote>
<div>
		{</div>
<div>
		&nbsp;&quot;model_version_status&quot;: [</div>
<div>
		&nbsp; {</div>
<div>
		&nbsp; &nbsp;&quot;version&quot;: &quot;1684833957&quot;,</div>
<div>
		&nbsp; &nbsp;&quot;state&quot;: &quot;AVAILABLE&quot;,</div>
<div>
		&nbsp; &nbsp;&quot;status&quot;: {</div>
<div>
		&nbsp; &nbsp; &quot;error_code&quot;: &quot;OK&quot;,</div>
<div>
		&nbsp; &nbsp; &quot;error_message&quot;: &quot;&quot;</div>
<div>
		&nbsp; &nbsp;}</div>
<div>
		&nbsp; }</div>
<div>
		&nbsp;]</div>
<div>
		}</div>
</blockquote>
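<p>Here, version is what we are looking for.<br />
Go to the parent directory where your .pb models are saved (it may be on HDFS or on local disk) and simply search for the version value, 1684833957. The directory whose name matches is the one that holds the .pb model we are looking for.<br />
That directory usually contains a&nbsp;saved_model.pb file and a&nbsp;variables subdirectory.<br />
Why does this work? The version is the timestamp at which the .pb model was exported, with one-second precision. Two models are very unlikely to be exported in the same second, so the timestamp is effectively unique, and the .pb model in the directory with that name is almost certainly the one that is loaded.</p>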
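<p>The lookup described in this post can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are made up for this example, the port default of 18501 simply mirrors the URL in this post, and the directory search only covers a local disk.</p>

```python
import json
import os
import urllib.request


def loaded_versions(status_json):
    """Return the versions reported as AVAILABLE in a model status response."""
    status = json.loads(status_json)
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def fetch_status(host, model_name, port=18501):
    """Fetch the model status JSON from the URL pattern shown in this post."""
    url = f"http://{host}:{port}/v1/models/{model_name}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def find_version_dirs(root, version):
    """Walk a local directory tree and return directories named after the version."""
    matches = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if version in dirnames:
            matches.append(os.path.join(dirpath, version))
    return matches
```

<p>A typical use would be to pass the result of fetch_status to loaded_versions and then call find_version_dirs with the parent directory of your exported models. For models stored on HDFS, the equivalent search would be a recursive listing such as hadoop fs -ls -R filtered for the version string.</p>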
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e6%80%8e%e6%a0%b7%e7%a1%ae%e8%ae%a4%e5%bd%93%e5%89%8d%e6%ad%a3%e5%9c%a8%e8%bf%90%e8%a1%8c%e7%9a%84tensorflow-model-serving%e6%9c%8d%e5%8a%a1%e5%8a%a0%e8%bd%bd%e7%9a%84%e6%98%af%e5%93%aa%e4%b8%aa-pb/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] How to Download an HLS Streaming Video</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Wed, 03 May 2023 10:12:39 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Mac]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[下载HLS]]></category>
		<category><![CDATA[下载m3u8]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13899</guid>

										<content:encoded><![CDATA[<p>On the internet, some videos are delivered as HLS streams. When you capture the playback address with a tool, you will find that it is a URL ending in .m3u8.<br />
So what are HLS&nbsp;and&nbsp;m3u8?</p>
<blockquote>
<p>
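		HLS (HTTP Live Streaming) is an HTTP-based streaming media protocol, and M3U8 is a text-based playlist file format. In HLS, the media data is split into many small files for transfer, and an M3U8 file serves as the index that points to those media files. The M3U8 file contains the URLs of the media files along with related information such as bitrate, resolution, and codec. When a client asks to play an HLS stream, it downloads the corresponding M3U8 index file and then downloads the media files at the addresses listed in it. In short, HLS and M3U8 are two distinct but closely related concepts: M3U8 is the part of the HLS protocol that indexes and locates resources.</p>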
</blockquote>
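<p>To make the index role of an M3U8 file concrete, here is a small Python sketch that extracts the media URLs from playlist text. The sample playlist, the example.com URLs, and the function name are all made up for illustration; the only rule relied on is that playlist lines starting with # are tags or comments and every other non-blank line is a URI.</p>

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for the URI lines of an M3U8 playlist.

    Lines starting with '#' are tags or comments; every other non-blank
    line is a URI, resolved relative to the playlist's own URL.
    """
    urls = []
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical two-segment playlist used only for this example.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:9.6,
seg0.ts
#EXTINF:9.6,
seg1.ts
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    for url in segment_urls(SAMPLE_PLAYLIST, "https://example.com/video/index.m3u8"):
        print(url)
```

<p>Note that in a multi-bitrate stream the top-level playlist lists further playlists rather than media segments, so a real downloader has to follow one of them first. Dedicated tools also handle encryption, retries, and merging, which this sketch does not attempt.</p>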
<p>So the question is: how do you download an HLS streaming video?<br />
<span id="more-13899"></span><br />
There are several ways; here are two.</p>
<p><span style="background-color: rgb(0, 255, 0);">➤</span>&nbsp;Use a Chrome extension: <span style="color:#0000ff;">Video DownloadHelper</span><br />
This extension can capture video addresses and can also download videos directly. However, direct downloads of HLS streams were limited to a certain number per day (that was the case a long time ago; I do not know whether it still is), so downloading directly with this extension is not a good option.<br />
Instead, we can use it to get the video address and then download the video at that address with a desktop application such as <a href="https://github.com/HeiSir2014/M3U8-Downloader" rel="noopener" target="_blank">M3U8-Downloader</a>.<br />
<span style="background-color: rgb(0, 255, 0);">➤</span>&nbsp;Use the cross-platform HLS download tool&nbsp;<span style="color:#0000ff;">N_m3u8DL-RE</span><br />
<a href="https://github.com/nilaoda/N_m3u8DL-RE" rel="noopener" target="_blank">N_m3u8DL-RE</a>&nbsp;is a powerful cross-platform DASH/HLS/MSS download tool.<br />
On Ubuntu Linux, for example, just download its release package and extract it to get an executable named&nbsp;N_m3u8DL-RE, then download an HLS stream like this:</p>
<blockquote>
<p>
		./N_m3u8DL-RE &lt;m3u8_url&gt;</p>
</blockquote>
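<p>N_m3u8DL-RE&nbsp;supports a large number of options; see its documentation.<br />
If the first run reports that ffmpeg is not installed, install it with apt install ffmpeg&nbsp;and run the tool again.</p>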
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e4%b8%8b%e8%bd%bdhls%e6%b5%81%e8%a7%86%e9%a2%91%e6%96%87%e4%bb%b6/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Reading a Local TFRecord File with Java</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 24 Apr 2023 18:09:06 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[TensorFlow]]></category>
		<category><![CDATA[TFRecord]]></category>
		<category><![CDATA[本地]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13895</guid>

										<content:encoded><![CDATA[<div>
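	TFRecord is a binary data format used with TensorFlow that stores and reads large datasets efficiently. A TFRecord file contains a sequence of records; each record is a byte string, typically a serialized tf.train.Example or tf.train.SequenceExample protocol buffer.</div>
<div>
	Unlike text files, TFRecord files use a binary encoding, which makes them compact to store and transfer. A large dataset can also be split across multiple TFRecord files that are read and processed in parallel.</div>
<div>
	In TensorFlow, TFRecord files are commonly used to store and load training, validation, and test data. Creating a TFRecord file requires serializing the data first, but this is easy to do because TensorFlow provides APIs for it.</div>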
<p><span id="more-13895"></span><br />
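In big-data pipelines, TFRecord files are usually produced by a MapReduce&nbsp;job, and the data volume is usually large. To verify that the contents are correct, we sometimes need to inspect a small sample: for example, take one of the N TFRecord files produced by the MapReduce job, parse it locally, and print its contents to check them.<br />
Below is an example Java program that reads a TFRecord file and prints one Example from it:</p>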
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="javascript language-javascript hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">String</span>&nbsp;localTfRecordFile&nbsp;=&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;/path/to/your/tfrecord/file&quot;</span>;
&nbsp;&nbsp;&nbsp;&nbsp;InputStream&nbsp;inputStream&nbsp;=&nbsp;Files.newInputStream(Paths.get(localTfRecordFile));
&nbsp;&nbsp;&nbsp;&nbsp;DataInput&nbsp;dataInput&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;DataInputStream(inputStream);
&nbsp;&nbsp;&nbsp;&nbsp;TFRecordReader&nbsp;reader&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;TFRecordReader(dataInput,&nbsp;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">true</span>);

&nbsp;&nbsp;&nbsp;&nbsp;byte[]&nbsp;recordBytes&nbsp;=&nbsp;reader.read();
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&nbsp;(recordBytes&nbsp;!=&nbsp;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">null</span>)&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Example&nbsp;example&nbsp;=&nbsp;Example.parseFrom(recordBytes);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(example.toString());
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;只打印一个Example</span>
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;inputStream.close();
</code></pre>
</section>
<p>The one import to watch out for is: import java.nio.file.Paths; (the snippet also uses java.nio.file.Files, java.io.InputStream, java.io.DataInput, and java.io.DataInputStream, plus the TFRecordReader and Example classes from your TensorFlow Java dependencies).<br />
Some more detail:</p>
<div>
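	TFRecord files and Example are two TensorFlow concepts for data serialization and storage, and they are closely related.</div>
<div>
	TFRecord is a binary file format that TensorFlow uses to store large amounts of data efficiently. A TFRecord file usually consists of many serialized Examples. An Example is TensorFlow's standard protocol buffer message for serialized data: it holds a Features map from feature names to Feature values, and each Feature holds a list of bytes (for example, strings), floats, or 64-bit integers. When writing data to a TFRecord file, you typically wrap it in Examples; when reading a TFRecord file, you parse each Example back out.</div>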
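<p>To show what "a file made of many records" means at the byte level, here is a minimal Python sketch of the TFRecord framing: each record is an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload. This is a sketch under stated assumptions, not a replacement for a real reader: the function names are made up, the checksum fields are written as zeros and skipped on read, and the payloads here are arbitrary bytes rather than serialized Examples.</p>

```python
import io

LENGTH_BYTES = 8  # uint64 record length, little-endian
CRC_BYTES = 4     # uint32 masked CRC32C (not computed in this sketch)


def write_record(stream, payload):
    """Append one framed record. The CRC fields are zero placeholders here."""
    stream.write(len(payload).to_bytes(LENGTH_BYTES, "little"))
    stream.write(bytes(CRC_BYTES))  # placeholder for the CRC of the length
    stream.write(payload)
    stream.write(bytes(CRC_BYTES))  # placeholder for the CRC of the payload


def read_records(stream):
    """Yield each record payload in order, skipping CRC verification."""
    while True:
        header = stream.read(LENGTH_BYTES)
        if not header:
            return  # clean end of file
        if len(header) != LENGTH_BYTES:
            raise ValueError("truncated record header")
        length = int.from_bytes(header, "little")
        stream.read(CRC_BYTES)  # CRC of the length, ignored
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record payload")
        stream.read(CRC_BYTES)  # CRC of the payload, ignored
        yield payload


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_record(buffer, b"first")
    write_record(buffer, b"second")
    buffer.seek(0)
    for record in read_records(buffer):
        print(record)
```

<p>Because the checksums are placeholders, records written by this sketch would be rejected by a reader that verifies them. In a real file each payload would then be parsed as an Example, which is the step Example.parseFrom performs in the Java code in this post.</p>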
<div>
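	In short, a TFRecord file is like a container, and Example is the format of each element inside it. When using TFRecord, we usually first decide which data to store and how to split it into Features, wrap it in one or more Examples, and then write those Examples to the TFRecord file.</p>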
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[Original] Setting the &quot;File Is Too Large&quot; Warning Threshold When Emacs Opens a File</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%ae%be%e7%bd%aeemacs%e6%89%93%e5%bc%80%e4%b8%80%e4%b8%aa%e6%96%87%e4%bb%b6%e6%97%b6%e7%9a%84%e6%96%87%e4%bb%b6%e5%a4%aa%e5%a4%a7%e8%ad%a6%e5%91%8a%e9%98%88%e5%80%bc/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%ae%be%e7%bd%aeemacs%e6%89%93%e5%bc%80%e4%b8%80%e4%b8%aa%e6%96%87%e4%bb%b6%e6%97%b6%e7%9a%84%e6%96%87%e4%bb%b6%e5%a4%aa%e5%a4%a7%e8%ad%a6%e5%91%8a%e9%98%88%e5%80%bc/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 24 Apr 2023 03:56:36 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[emacs]]></category>
		<category><![CDATA[large file warning]]></category>
		<category><![CDATA[文件太大警告]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13891</guid>

										<content:encoded><![CDATA[<div>
	Every time I open a fairly large file in Emacs, it prompts:</div>
<blockquote>
<div>
		File xxx is large (XXXMB), really open? (y or n)</div>
</blockquote>
<div>
	You have to press y before it will go on to open the file.</div>
<div>
	Is there a way to adjust this file-size threshold so that the prompt is less annoying?<br />
	<span id="more-13891"></span><br />
	The fix is to edit ~/.emacs and add the following configuration:</p>
<blockquote>
<div>
			;; set large file warning&nbsp;threshold when opening it</div>
<div>
			(setq large-file-warning-threshold (* 1024 1024 1024))</div>
</blockquote>
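<p>	With this setting, the &ldquo;file is too large&rdquo; warning appears only for files larger than 1 GB (1024 * 1024 * 1024 = 1,073,741,824 bytes); files up to that size open directly, without asking the user to press y.</p>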
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e8%ae%be%e7%bd%aeemacs%e6%89%93%e5%bc%80%e4%b8%80%e4%b8%aa%e6%96%87%e4%bb%b6%e6%97%b6%e7%9a%84%e6%96%87%e4%bb%b6%e5%a4%aa%e5%a4%a7%e8%ad%a6%e5%91%8a%e9%98%88%e5%80%bc/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
