<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>本地 &#8211; 编码无悔 /  Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/tag/%E6%9C%AC%E5%9C%B0/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>最优化之路</description>
	<lastBuildDate>Mon, 24 Apr 2023 18:09:06 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>[原创] 用JAVA读取本地的TFRecord文件</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Mon, 24 Apr 2023 18:09:06 +0000</pubDate>
				<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[TensorFlow]]></category>
		<category><![CDATA[TFRecord]]></category>
		<category><![CDATA[本地]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13895</guid>

					<description><![CDATA[<div>
	TFRecord是一种用于TensorFlow的二进制数据格式，它可以更高效地存储和读取大规模数据集。TFRecord文件包含了一系列记录（record），每个记录可以是一个张量（tensor）或者一个序列（sequence）。</div>
<div>
	与文本文件不同，TFRecord文件被编码成二进制格式，这使得它们更易于在网络上传输和存储。同时，TFRecord也允许我们将大型数据集分割成多个部分，并且可以有效地并行读取和处理这些部分。</div>
<div>
	在TensorFlow中，我们通常使用TFRecord文件来存储和加载模型的训练数据、验证数据、测试数据等。创建TFRecord文件需要经过一定的序列化操作，但这些操作很容易实现，因为TensorFlow提供了相应的API支持。</div>
<p><span id="more-13895"></span><br />
在大数据处理流程中，TFRecord文件通常是由map-reduce&#160;job生成的，数据量通常很大。有时为了验证文件内容正确，我们需要取少量数据来检查，例如，我们可以拿map-reduce job生成的N个TFRecord文件中的一个，在本地解析出来，打印出其中的内容看是否正确。<br />
下面就是一个用JAVA程序读取TFRecord文件并打印出其中一个Example的例子：</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="javascript language-javascript hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&#160;&#160;&#160;&#160;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">String</span>&#160;localTfRecordFile&#160;=&#160;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;/path/to/your/tfrecord/file&#34;</span>;
&#160;&#160;&#160;&#160;InputStream&#160;inputStream&#160;=&#160;Files.newInputStream(Paths.get(localTfRecordFile));
&#160;&#160;&#160;&#160;DataInput&#160;dataInput&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;DataInputStream(inputStream);
&#160;&#160;&#160;&#160;TFRecordReader&#160;reader&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;TFRecordReader(dataInput,&#160;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">true</span>);

&#160;&#160;&#160;&#160;byte[]&#160;recordBytes&#160;=&#160;reader.read();
&#160;&#160;&#160;&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&#160;(recordBytes&#160;!=&#160;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">null</span>)&#160;{
&#160;&#160;&#160;&#160;&#160;&#160;Example&#160;example&#160;=&#160;Example.parseFrom(recordBytes);
&#160;&#160;&#160;&#160;&#160;&#160;System.out.println(example.toString());
&#160;&#160;&#160;&#160;&#160;&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;&#160;&#160;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&#160;只打印一个Example</span>
&#160;&#160;&#160;&#160;}
&#160;&#160;&#160;&#160;inputStream.close();
</code></pre>
</section>
<p>唯一需要注意的就是一个引入：import java.nio.file.Paths;<br />
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
再详细说明一下：</p>
<div>
	TFRecord文件和Example是TensorFlow中用于数据序列化和存储的两个概念，它们之间有着紧密的关系。</div>
<div>
	TFRecord是一种二进制格式的文件，在TensorFlow中被用来高效地存储大量的数据。它通常是由多个Example组成的序列化数据。而Example则是TensorFlow中序列化数据的标准格式，可以包含多个Features，每个Feature又包含一个Tensor（可以是张量、字符串等）。在将数据写入TFRecord文件时，需要将其封装为Example格式；在读取TFRecord文件时，也需要将其中的每个Example解析出来。</div>
<div>
	简而言之，TFRecord文件就像是一个容器，而Example则是这个容器里面每个元素的具体格式。在使用TFRecord时，我们通常会先定义好我们要存储哪些数据以及这些数据应该怎么被划分为不同的Features，并封装成一个或多个Example，在把这些Example写入到TFRecord文件中。
<p>
		<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a></p></div>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/" class="read-more">Read More </a>]]></description>
										<content:encoded><![CDATA[<div>
	TFRecord是一种用于TensorFlow的二进制数据格式，它可以更高效地存储和读取大规模数据集。TFRecord文件包含了一系列记录（record），每个记录可以是一个张量（tensor）或者一个序列（sequence）。</div>
<div>
	与文本文件不同，TFRecord文件被编码成二进制格式，这使得它们更易于在网络上传输和存储。同时，TFRecord也允许我们将大型数据集分割成多个部分，并且可以有效地并行读取和处理这些部分。</div>
<div>
	在TensorFlow中，我们通常使用TFRecord文件来存储和加载模型的训练数据、验证数据、测试数据等。创建TFRecord文件需要经过一定的序列化操作，但这些操作很容易实现，因为TensorFlow提供了相应的API支持。</div>
<p><span id="more-13895"></span><br />
在大数据处理流程中，TFRecord文件通常是由map-reduce&nbsp;job生成的，数据量通常很大。有时为了验证文件内容正确，我们需要取少量数据来检查，例如，我们可以拿map-reduce job生成的N个TFRecord文件中的一个，在本地解析出来，打印出其中的内容看是否正确。<br />
下面就是一个用JAVA程序读取TFRecord文件并打印出其中一个Example的例子：</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="javascript language-javascript hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">String</span>&nbsp;localTfRecordFile&nbsp;=&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;/path/to/your/tfrecord/file&quot;</span>;
&nbsp;&nbsp;&nbsp;&nbsp;InputStream&nbsp;inputStream&nbsp;=&nbsp;Files.newInputStream(Paths.get(localTfRecordFile));
&nbsp;&nbsp;&nbsp;&nbsp;DataInput&nbsp;dataInput&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;DataInputStream(inputStream);
&nbsp;&nbsp;&nbsp;&nbsp;TFRecordReader&nbsp;reader&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;TFRecordReader(dataInput,&nbsp;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">true</span>);

&nbsp;&nbsp;&nbsp;&nbsp;byte[]&nbsp;recordBytes&nbsp;=&nbsp;reader.read();
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&nbsp;(recordBytes&nbsp;!=&nbsp;<span class="hljs-literal" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">null</span>)&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Example&nbsp;example&nbsp;=&nbsp;Example.parseFrom(recordBytes);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(example.toString());
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;只打印一个Example</span>
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;inputStream.close();
</code></pre>
</section>
<p>唯一需要注意的就是一个引入：import java.nio.file.Paths;<br />
<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
再详细说明一下：</p>
<div>
	TFRecord文件和Example是TensorFlow中用于数据序列化和存储的两个概念，它们之间有着紧密的关系。</div>
<div>
	TFRecord是一种二进制格式的文件，在TensorFlow中被用来高效地存储大量的数据。它通常是由多个Example组成的序列化数据。而Example则是TensorFlow中序列化数据的标准格式，可以包含多个Features，每个Feature又包含一个Tensor（可以是张量、字符串等）。在将数据写入TFRecord文件时，需要将其封装为Example格式；在读取TFRecord文件时，也需要将其中的每个Example解析出来。</div>
<div>
	简而言之，TFRecord文件就像是一个容器，而Example则是这个容器里面每个元素的具体格式。在使用TFRecord时，我们通常会先定义好我们要存储哪些数据以及这些数据应该怎么被划分为不同的Features，并封装成一个或多个Example，在把这些Example写入到TFRecord文件中。</p>
<p>
		<span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
		<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;版权声明&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
		转载需注明出处：<u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
		感谢关注我的微信公众号（微信扫一扫）：<br />
		<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
		以及我的微信视频号：</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
		<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84tfrecord%e6%96%87%e4%bb%b6/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[原创] 用JAVA程序读取本地的Hadoop sequence file</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e7%a8%8b%e5%ba%8f%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84hadoop-sequence-file/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e7%a8%8b%e5%ba%8f%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84hadoop-sequence-file/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Thu, 20 Apr 2023 12:23:18 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[综合]]></category>
		<category><![CDATA[Hadoop sequence file]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[本地]]></category>
		<category><![CDATA[读取]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13885</guid>

					<description><![CDATA[<div>
	Hadoop SequenceFile是Hadoop用于存储二进制键值对的文件格式。它支持存储不同的键值对类型,如:IntWritable/Text, NullWritable/BytesWritable等。</div>
<div>
	假设我的sequence file的key是BooleanWritable类型，value是Text类型，怎么读取它呢？<br />
	<span id="more-13885"></span></div>
<p>话不多说，直接上代码：</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&#160;&#160;&#160;&#160;Configuration&#160;conf&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;Configuration();
&#160;&#160;&#160;&#160;FileSystem&#160;fs&#160;=&#160;FileSystem.getLocal(conf);
&#160;&#160;&#160;&#160;Path&#160;seqFilePath&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;Path(<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;/path/to/your/sequence-file&#34;</span>);

&#160;&#160;&#160;&#160;{
&#160;&#160;&#160;&#160;&#160;&#160;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&#160;打印出sequence&#160;file的key和value的类型</span>
&#160;&#160;&#160;&#160;&#160;&#160;SequenceFile.Reader&#160;reader&#160;=&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&#160;SequenceFile.Reader(fs,&#160;seqFilePath,&#160;conf);
&#160;&#160;&#160;&#160;&#160;&#160;Writable&#160;key&#160;=&#160;(Writable)&#160;reader.getKeyClass().newInstance();
&#160;&#160;&#160;&#160;&#160;&#160;Writable&#160;value&#160;=&#160;(Writable)&#160;reader.getValueClass().newInstance();
&#160;&#160;&#160;&#160;&#160;&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&#160;(reader.next(key,&#160;value))&#160;{
&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;System.out.println(<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;Key&#160;type:&#160;&#34;</span>&#160;+&#160;key.getClass());
&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;System.out.println(<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#34;Value&#160;type:&#160;&#34;</span>&#160;+&#160;value.getClass());
&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;
&#160;&#160;&#160;&#160;&#160;&#160;}
&#160;&#160;&#160;&#160;&#160;&#160;reader.close();
&#160;&#160;&#160;&#160;}

&#160;&#160;&#160;&#160;{
&#160;&#160;&#160;&#160;&#160;&#160;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&#160;打印出sequence&#160;file的内容</span>
&#160;&#160;&#160;&#160;&#160;&#160;SequenceFile.Reader&#160;reader&#160;=&#160;</code></pre>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e7%a8%8b%e5%ba%8f%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84hadoop-sequence-file/" class="read-more">Read More </a></section>]]></description>
										<content:encoded><![CDATA[<div>
	Hadoop SequenceFile是Hadoop用于存储二进制键值对的文件格式。它支持存储不同的键值对类型,如:IntWritable/Text, NullWritable/BytesWritable等。</div>
<div>
	假设我的sequence file的key是BooleanWritable类型，value是Text类型，怎么读取它呢？<br />
	<span id="more-13885"></span></div>
<p>话不多说，直接上代码：</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">&nbsp;&nbsp;&nbsp;&nbsp;Configuration&nbsp;conf&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;Configuration();
&nbsp;&nbsp;&nbsp;&nbsp;FileSystem&nbsp;fs&nbsp;=&nbsp;FileSystem.getLocal(conf);
&nbsp;&nbsp;&nbsp;&nbsp;Path&nbsp;seqFilePath&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;Path(<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;/path/to/your/sequence-file&quot;</span>);

&nbsp;&nbsp;&nbsp;&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;打印出sequence&nbsp;file的key和value的类型</span>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SequenceFile.Reader&nbsp;reader&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;SequenceFile.Reader(fs,&nbsp;seqFilePath,&nbsp;conf);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Writable&nbsp;key&nbsp;=&nbsp;(Writable)&nbsp;reader.getKeyClass().newInstance();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Writable&nbsp;value&nbsp;=&nbsp;(Writable)&nbsp;reader.getValueClass().newInstance();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&nbsp;(reader.next(key,&nbsp;value))&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;Key&nbsp;type:&nbsp;&quot;</span>&nbsp;+&nbsp;key.getClass());
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;Value&nbsp;type:&nbsp;&quot;</span>&nbsp;+&nbsp;value.getClass());
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">break</span>;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reader.close();
&nbsp;&nbsp;&nbsp;&nbsp;}

&nbsp;&nbsp;&nbsp;&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;打印出sequence&nbsp;file的内容</span>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SequenceFile.Reader&nbsp;reader&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;SequenceFile.Reader(fs,&nbsp;seqFilePath),&nbsp;conf);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BooleanWritable&nbsp;keyObj&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;BooleanWritable();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Text&nbsp;valueObj&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">new</span>&nbsp;Text();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">while</span>&nbsp;(reader.next(keyObj,&nbsp;valueObj))&nbsp;{
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(keyObj.get()&nbsp;+&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&quot;:&nbsp;&quot;</span>&nbsp;+&nbsp;valueObj);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reader.close();
&nbsp;&nbsp;&nbsp;&nbsp;}
</code></pre>
</section>
<p>
输出类似于：</p>
<blockquote>
<div>
		Key type: class org.apache.hadoop.io.BooleanWritable</div>
<div>
		Value type: class org.apache.hadoop.io.Text<br />
		（文件内容此处省略）</div>
</blockquote>
<p>Maven依赖取决于你使用的Hadoop版本，下面是一个例子：</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="xml language-xml hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependencies</span>&gt;</span>
&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependency</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">groupId</span>&gt;</span>org.apache.hadoop<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">groupId</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">artifactId</span>&gt;</span>hadoop-common<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">artifactId</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">version</span>&gt;</span>3.2.1<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">version</span>&gt;</span>
&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependency</span>&gt;</span>
&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependency</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">groupId</span>&gt;</span>org.apache.hadoop<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">groupId</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">artifactId</span>&gt;</span>hadoop-client<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">artifactId</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">version</span>&gt;</span>3.2.1<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">version</span>&gt;</span>
&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependency</span>&gt;</span>
&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependency</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">groupId</span>&gt;</span>org.apache.hadoop<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">groupId</span>&gt;</span>&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">artifactId</span>&gt;</span>hadoop-hdfs<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">artifactId</span>&gt;</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">version</span>&gt;</span>3.2.1<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">version</span>&gt;</span>
&nbsp;&nbsp;<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependency</span>&gt;</span>
<span class="hljs-tag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">&lt;/<span class="hljs-name" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">dependencies</span>&gt;</span></code></pre>
</section>
<p><span style="color: rgb(255, 255, 255);">文章来源：</span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;关注不迷路&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;</p>
<p>
	感谢关注我的微信公众号（微信扫一扫）：<br />
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="color: rgb(77, 77, 77); font-size: 13px; width: 200px; height: 200px;" /><br />
	以及我的微信视频号：</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e7%94%a8java%e7%a8%8b%e5%ba%8f%e8%af%bb%e5%8f%96%e6%9c%ac%e5%9c%b0%e7%9a%84hadoop-sequence-file/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
