<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MultiStorage &#8211; 编码无悔 /  Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/tag/multistorage/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>The Road to Optimization</description>
	<lastBuildDate>Wed, 15 Nov 2023 08:01:16 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>[Original] How to Store Data into Different Subdirectories by Group in Apache Pig (MultiStorage)</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-apache-pig%e5%a6%82%e4%bd%95%e6%8c%89%e6%95%b0%e6%8d%ae%e5%88%86%e7%bb%84%e4%bf%9d%e5%ad%98%e5%88%b0%e4%b8%8d%e5%90%8c%e7%9a%84%e5%ad%90%e7%9b%ae%e5%bd%95%e4%b8%admultistorage/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-apache-pig%e5%a6%82%e4%bd%95%e6%8c%89%e6%95%b0%e6%8d%ae%e5%88%86%e7%bb%84%e4%bf%9d%e5%ad%98%e5%88%b0%e4%b8%8d%e5%90%8c%e7%9a%84%e5%ad%90%e7%9b%ae%e5%bd%95%e4%b8%admultistorage/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Sun, 06 Nov 2022 06:30:06 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Original]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[apache pig]]></category>
		<category><![CDATA[MultiStorage]]></category>
		<category><![CDATA[multiple directories]]></category>
		<guid isPermaLink="false">https://www.codelast.com/?p=13628</guid>

					<description><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p>When processing data with Apache Pig, we usually finish by saving the result to a single HDFS directory:</p>
<blockquote>
<p>
		STORE result INTO &#39;/my_output_dir&#39;;</p>
</blockquote>
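<p>This is the most common case.<br />
But what if we want to split the data into groups by some field and store each group in its own directory? As a loose analogy, it is a bit like first grouping the data by a field:</p>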
<blockquote>
<p>
		GROUP data BY field;</p>
</blockquote>
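<p>and then storing each group's data in a separate directory.<br />
<span id="more-13628"></span><br />
Now let's look at a concrete example.<br />
Suppose we have the following data (four tab-separated columns: person-type id, person-type description, name, hobby):</p>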
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<table style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; width: 100%;">
<thead style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px;">
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				type</th>
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				desc</th>
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				name</th>
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				hobby</th>
</tr>
</thead>
<tbody style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border: 0px;">
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				2</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				学生</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				陈玉</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				篮球</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				3</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				老师</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				王强</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				足球</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				1</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				保安</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				许勤</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				下棋</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				2</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				学生</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				范雨</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				跑步</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				2</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				学生</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				李林</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				游泳</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				1</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				保安</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				涂欣</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				看书</td>
</tr>
</tbody>
</table>
</section>
<p><span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-apache-pig%e5%a6%82%e4%bd%95%e6%8c%89%e6%95%b0%e6%8d%ae%e5%88%86%e7%bb%84%e4%bf%9d%e5%ad%98%e5%88%b0%e4%b8%8d%e5%90%8c%e7%9a%84%e5%ad%90%e7%9b%ae%e5%bd%95%e4%b8%admultistorage/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p>When processing data with Apache Pig, we usually finish by saving the result to a single HDFS directory:</p>
<blockquote>
<p>
		STORE result INTO &#39;/my_output_dir&#39;;</p>
</blockquote>
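<p>This is the most common case.<br />
But what if we want to split the data into groups by some field and store each group in its own directory? As a loose analogy, it is a bit like first grouping the data by a field:</p>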
<blockquote>
<p>
		GROUP data BY field;</p>
</blockquote>
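<p>and then storing each group's data in a separate directory.<br />
<span id="more-13628"></span><br />
Now let's look at a concrete example.<br />
Suppose we have the following data (four tab-separated columns: person-type id, person-type description, name, hobby):</p>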
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<table style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; width: 100%;">
<thead style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px;">
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				type</th>
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				desc</th>
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				name</th>
<th style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em; text-align: left; background-color: rgb(240, 240, 240);">
				hobby</th>
</tr>
</thead>
<tbody style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border: 0px;">
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				2</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				学生</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				陈玉</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				篮球</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				3</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				老师</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				王强</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				足球</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				1</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				保安</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				许勤</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				下棋</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				2</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				学生</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				范雨</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				跑步</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				2</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				学生</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				李林</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				游泳</td>
</tr>
<tr style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; border-width: 1px 0px 0px; border-right-style: initial; border-bottom-style: initial; border-left-style: initial; border-right-color: initial; border-bottom-color: initial; border-left-color: initial; border-image: initial; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: white;">
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				1</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				保安</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				涂欣</td>
<td style="color: inherit; line-height: inherit; margin: 0px; font-size: 1em; border-style: solid; border-color: rgb(204, 204, 204); padding: 0.5em 1em;">
				看书</td>
</tr>
</tbody>
</table>
</section>
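<p><span style="color: rgb(255, 255, 255);">Source: </span><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">https://www.codelast.com/</span></a><br />
Now we want to split this data by the first column, type, and store the processed output in separate directories: rows with type 2 (学生, students) go into directory &quot;2&quot;, rows with type 3 (老师, teachers) go into directory &quot;3&quot;, and so on.<br />
The simplest, and clumsiest, approach is to SPLIT the data into several relations in Pig and then STORE each one into its own directory:</p>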
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">type</span>:&nbsp;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">int</span>,&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">desc</span>:&nbsp;chararray,&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">name</span>:&nbsp;chararray,&nbsp;hobby:&nbsp;chararray);
SPLIT&nbsp;A&nbsp;INTO&nbsp;A1&nbsp;IF&nbsp;(type&nbsp;==&nbsp;1),&nbsp;A2&nbsp;IF&nbsp;(type&nbsp;==&nbsp;2),&nbsp;A3&nbsp;IF&nbsp;(type&nbsp;==&nbsp;3);

STORE&nbsp;A1&nbsp;INTO&nbsp;&#39;/my_output_dir/1&#39;;
STORE&nbsp;A2&nbsp;INTO&nbsp;&#39;/my_output_dir/2&#39;;
STORE&nbsp;A3&nbsp;INTO&nbsp;&#39;/my_output_dir/3&#39;;
</code></pre>
</section>
<p>
The code is straightforward, but what if type takes many distinct values, say 100? Would you really write 100 STORE statements? That way lies madness.<br />
A better approach is to use <a href="https://pig.apache.org/docs/latest/api/org/apache/pig/piggybank/storage/MultiStorage.html" rel="noopener" target="_blank">MultiStorage</a>, a UDF from piggybank, to achieve the same result:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">type</span>:&nbsp;<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">int</span>,&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">desc</span>:&nbsp;chararray,&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">name</span>:&nbsp;chararray,&nbsp;hobby:&nbsp;chararray);
STORE&nbsp;A&nbsp;INTO&nbsp;&#39;/my_output_dir&#39;&nbsp;USING&nbsp;org.apache.pig.piggybank.storage.MultiStorage(&#39;/my_output_dir&#39;,&nbsp;&#39;0&#39;,&nbsp;&#39;none&#39;,&nbsp;&#39;\t&#39;,&nbsp;&#39;true&#39;);
</code></pre>
</section>
<p>See, a single line of code replaces any number of SPLIT INTO and STORE INTO statements. How elegant.<br />
The parameters of MultiStorage are:</p>
<blockquote>
<p>
		public MultiStorage(String <span style="color:#0000ff;">parentPathStr</span>, String <span style="color:#0000ff;">splitFieldIndex</span>, String <span style="color:#0000ff;">compression</span>, String <span style="color:#0000ff;">fieldDel</span>, String <span style="color:#0000ff;">isRemoveKeys</span>)</p>
</blockquote>
<p>parentPathStr: the parent directory of the output files. In practice this parameter has no effect; the output path that actually applies is the one given after the &quot;INTO&quot; keyword. The developers kept the parameter for backward compatibility, so simply set it to the same path that follows &quot;INTO&quot;.<br />
splitFieldIndex: the index (starting from 0) of the field used to split the data into subdirectories. I set it to 0 here, which means the data is split on the type field.<br />
compression: the compression applied to the output. Allowed values are &#39;bz2&#39;, &#39;bz&#39;, &#39;gz&#39;, and &#39;none&#39;, where none means no compression.<br />
fieldDel: the delimiter placed between fields in the output.<br />
isRemoveKeys: whether to drop the split field from the output. I set it to true here so that type does not appear in the output, because I only want type to serve as the subdirectory name.<br />
Finally, let's look at the result. Three directories were created, one per type (alongside a _SUCCESS marker):</p>
<blockquote>
<p>
		/my_output_dir/1<br />
		/my_output_dir/2<br />
		/my_output_dir/3<br />
		/my_output_dir/_SUCCESS</p>
</blockquote>
<p>Now look at the file inside subdirectory &quot;1&quot;:</p>
<blockquote>
<p>
		/my_output_dir/1/1-0,000</p>
</blockquote>
<p>Its content is:</p>
<blockquote>
<p>
		保安&nbsp; &nbsp; 许勤&nbsp; &nbsp; 下棋<br />
		保安&nbsp; &nbsp; 涂欣&nbsp; &nbsp; 看书</p>
</blockquote>
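<p>These are all type 1 rows (保安, security guards), with the type column itself removed. Clearly, MultiStorage did split the data by type and store it in separate subdirectories.</p>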
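<p>As a rough sketch of how the parameters described above can be varied (this variant was not part of the run shown in this post): the script below registers piggybank explicitly, requests gzip-compressed output, and keeps the type field in each stored row by passing &#39;false&#39; as isRemoveKeys. The jar path /path/to/piggybank.jar and the output directory /my_output_dir_gz are hypothetical placeholders, so adjust them to your environment. The closing fs command only lists whatever was written.</p>
<pre>
<code>-- Hypothetical jar location: point this at the piggybank.jar of your own Pig installation.
REGISTER /path/to/piggybank.jar;

A = LOAD &#39;1.txt&#39; AS (type: int, desc: chararray, name: chararray, hobby: chararray);

-- Arguments, in order: parentPathStr, splitFieldIndex, compression, fieldDel, isRemoveKeys.
-- &#39;gz&#39; asks for gzip output; &#39;false&#39; keeps the split field (type) in the stored rows.
STORE A INTO &#39;/my_output_dir_gz&#39; USING org.apache.pig.piggybank.storage.MultiStorage(&#39;/my_output_dir_gz&#39;, &#39;0&#39;, &#39;gz&#39;, &#39;\t&#39;, &#39;false&#39;);

-- List the output directory from inside the Pig script.
fs -ls /my_output_dir_gz
</code></pre>
<p>Keeping the key is handy when downstream jobs read several subdirectories together and still need to know which type each row belongs to.</p>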
<p>
	<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
	Please credit the source when reposting: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
	</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-apache-pig%e5%a6%82%e4%bd%95%e6%8c%89%e6%95%b0%e6%8d%ae%e5%88%86%e7%bb%84%e4%bf%9d%e5%ad%98%e5%88%b0%e4%b8%8d%e5%90%8c%e7%9a%84%e5%ad%90%e7%9b%ae%e5%bd%95%e4%b8%admultistorage/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
