<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>check if an element is present in a bag &#8211; 编码无悔 /  Intent &amp; Focused</title>
	<atom:link href="https://www.codelast.com/tag/check-if-an-element-is-present-in-a-bag/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.codelast.com</link>
	<description>The road to optimization</description>
	<lastBuildDate>Sat, 18 Nov 2023 15:11:42 +0000</lastBuildDate>
	<language>zh-Hans</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>[Original] How to check whether a bag contains a specific element in Apache Pig</title>
		<link>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%9c%a8apache-pig%e4%b8%ad%e5%88%a4%e6%96%ad%e4%b8%80%e4%b8%aabag%e4%b8%ad%e6%98%af%e5%90%a6%e5%8c%85%e5%90%ab%e7%89%b9%e5%ae%9a%e7%9a%84%e5%85%83%e7%b4%a0/</link>
					<comments>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%9c%a8apache-pig%e4%b8%ad%e5%88%a4%e6%96%ad%e4%b8%80%e4%b8%aabag%e4%b8%ad%e6%98%af%e5%90%a6%e5%8c%85%e5%90%ab%e7%89%b9%e5%ae%9a%e7%9a%84%e5%85%83%e7%b4%a0/#respond</comments>
		
		<dc:creator><![CDATA[learnhard]]></dc:creator>
		<pubDate>Fri, 05 Aug 2016 08:47:00 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[原创]]></category>
		<category><![CDATA[apache pig]]></category>
		<category><![CDATA[bag包含指定元素]]></category>
		<category><![CDATA[check if an element is present in a bag]]></category>
		<guid isPermaLink="false">http://www.codelast.com/?p=8875</guid>

					<description><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><span style="color:#b22222;">In Pig Latin, how do you check whether an element is present in a bag?</span></p>
<p>Suppose a bag consists of int elements (think of it as a list). How do you check whether the bag contains a given element, say 5?<br />
If you have read the Pig documentation, you know that Pig ships no built-in function that takes a bag and a value as arguments and returns 1 or 0 to indicate whether the bag contains that value.<br />
So how can we implement this?<br />
<span id="more-8875"></span><br />
Let me walk through a concrete example.<br />
Suppose we have a data file 1.txt:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &#34;Helvetica Neue&#34;, Helvetica, &#34;Hiragino Sans GB&#34;, &#34;Microsoft YaHei&#34;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">[codelast@&#160;~]$&#160;cat&#160;1.txt&#160;
a&#160;&#160;&#160;&#160;{(1),(2),(3),(5),(6)}
b&#160;&#160;&#160;&#160;{(1)}
c&#160;&#160;&#160;&#160;{(1),(2)}
d&#160;&#160;&#160;&#160;{(1),(3),(5)}
e&#160;&#160;&#160;&#160;{(1),(2),(5),(6)}
</code></pre>
</section>
<p>There are two columns, separated by \t. The first column is a string. The second looks odd, but it is written that way so that Pig can load it directly as a bag. You can think of the second column as a list: in the first row, for example, the list contains the elements 1, 2, 3, 5 and 6.<br />
How do we write code that checks, for each row, whether the second column contains the element 5?<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="http://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">http://www.codelast.com/</span></a>&#8230; <a href="https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%9c%a8apache-pig%e4%b8%ad%e5%88%a4%e6%96%ad%e4%b8%80%e4%b8%aabag%e4%b8%ad%e6%98%af%e5%90%a6%e5%8c%85%e5%90%ab%e7%89%b9%e5%ae%9a%e7%9a%84%e5%85%83%e7%b4%a0/" class="read-more">Read More </a></p>]]></description>
										<content:encoded><![CDATA[<p>For more Apache Pig tutorials, click <a href="https://www.codelast.com/?p=4550" rel="noopener" target="_blank"><span style="background-color: rgb(255, 160, 122);">here</span></a>.</p>
<p><span style="color:#b22222;">In Pig Latin, how do you check whether an element is present in a bag?</span></p>
<p>Suppose a bag consists of int elements (think of it as a list). How do you check whether the bag contains a given element, say 5?<br />
If you have read the Pig documentation, you know that Pig ships no built-in function that takes a bag and a value as arguments and returns 1 or 0 to indicate whether the bag contains that value.<br />
So how can we implement this?<br />
<span id="more-8875"></span><br />
Let me walk through a concrete example.<br />
Suppose we have a data file 1.txt:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">[codelast@&nbsp;~]$&nbsp;cat&nbsp;1.txt&nbsp;
a&nbsp;&nbsp;&nbsp;&nbsp;{(1),(2),(3),(5),(6)}
b&nbsp;&nbsp;&nbsp;&nbsp;{(1)}
c&nbsp;&nbsp;&nbsp;&nbsp;{(1),(2)}
d&nbsp;&nbsp;&nbsp;&nbsp;{(1),(3),(5)}
e&nbsp;&nbsp;&nbsp;&nbsp;{(1),(2),(5),(6)}
</code></pre>
</section>
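<p>There are two columns, separated by \t. The first column is a string. The second looks odd, but it is written that way so that Pig can load it directly as a bag. You can think of the second column as a list: in the first row, for example, the list contains the elements 1, 2, 3, 5 and 6.<br />
How do we write code that checks, for each row, whether the second column contains the element 5?<br />
<span style="color: rgb(255, 255, 255);">Source: </span><a href="http://www.codelast.com/" rel="noopener noreferrer" target="_blank"><span style="color: rgb(255, 255, 255);">http://www.codelast.com/</span></a></p>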
<ul>
<li>
		<span style="background-color:#00ff00;">Method 1</span></li>
</ul>
<p>Let's go straight to a working implementation:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">name</span>:&nbsp;chararray,&nbsp;aList:bag{(item:<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">int</span>)});
B&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;{
&nbsp;&nbsp;FILTERED_LIST&nbsp;=&nbsp;FILTER&nbsp;aList&nbsp;BY&nbsp;($0&nbsp;==&nbsp;5);
&nbsp;&nbsp;GENERATE
&nbsp;&nbsp;(IsEmpty(FILTERED_LIST)&nbsp;?&nbsp;&#39;not-contain&#39;&nbsp;:&nbsp;&#39;contain&#39;)&nbsp;AS&nbsp;flag,
&nbsp;&nbsp;name;
}
DUMP&nbsp;B;
</code></pre>
</section>
<p>This Pig script prints:</p>
<blockquote>
<div>
		(contain,a)</div>
<div>
		(not-contain,b)</div>
<div>
		(not-contain,c)</div>
<div>
		(contain,d)</div>
<div>
		(contain,e)</div>
</blockquote>
<p>As you can see, the first row (a) is contain (it contains 5), the second row (b) is not-contain (it does not contain 5), the third row (c) is not-contain as well, and so on.<br />
Checking against the input data, it is easy to confirm that this output is correct. So how does the Pig code above do it?<br />
Let's go through it line by line.<br />
Line 1 loads the data. The form&nbsp;<span style="color:#0000ff;">aList:bag{(item:int)} </span>looks a little odd, but it is dictated by the format of our input file 1.txt: it lets us load the second column as a bag holding N tuples, each of which contains a single int element.<br />
Lines 2 through 7 use a <a href="http://pig.apache.org/docs/r0.12.0/basic.html#foreach" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">nested FOREACH</span></a> to check whether a bag contains the given element. This statement:</p>
<blockquote>
<p>
		FILTERED_LIST = FILTER aList BY ($0 == 5);</p>
</blockquote>
<p>first filters the bag down to a bag holding only the tuples whose element equals 5. Then this expression:</p>
<blockquote>
<p>
		(IsEmpty(FILTERED_LIST) ? &#39;not-contain&#39; : &#39;contain&#39;) AS flag,</p>
</blockquote>
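<p>outputs a string, depending on whether the filtered bag is empty, that says whether the current row &ldquo;does not contain&rdquo; or &ldquo;contains&rdquo; the element 5.<br />
Many readers will surely wonder why the FILTER statement yields the tuples equal to 5, and what on earth <span style="color:#0000ff;">$0 == 5</span> means. Inside a nested FOREACH, FILTER evaluates its condition once for every tuple in the bag, and <span style="color:#0000ff;">$0</span>&nbsp;is the first field of whichever tuple is being tested (the item field here). Do not read <span style="color:#0000ff;">$0</span> as meaning only the first element of the bag! You can consult <a href="http://pig.apache.org/docs/r0.12.0/basic.html#foreach" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">this</span></a> Pig documentation, although it does not spell out what $0 means in this context.<br />
So someone will then ask: can I skip the nested FOREACH and simply write the following:</p>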
<blockquote>
<p>
		B = FILTER A BY (aList.$0 == 5);</p>
</blockquote>
<p>to get the records that contain 5? The answer is no: aList.$0 projects a bag rather than a scalar, so it cannot be compared with 5 and Pig rejects the statement. Try it and you will see.<br />
For more on Method 1, see also <a href="http://stackoverflow.com/questions/26390220/check-if-an-element-is-present-in-a-bag" rel="noopener noreferrer" target="_blank"><span style="background-color:#ffa07a;">this</span></a> Stack Overflow discussion (though it does contain some mistakes).</p>
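<p>To make the per-tuple meaning of $0 == 5 concrete, here is a minimal plain-Java sketch of what the nested FILTER and the IsEmpty test compute for one row. It is not Pig code and not how Pig is implemented; the names NestedFilterSketch, countMatches and flag are made up for illustration, and each single-field tuple is modeled as a one-element int array.</p>

```java
public class NestedFilterSketch {
    // Models "FILTER aList BY ($0 == 5)": the condition is tested once per
    // tuple of the bag, and $0 is the first field of the tuple under test.
    // (Hypothetical helper: it counts the tuples that would survive the filter.)
    static int countMatches(int[][] bag, int wanted) {
        int matches = 0;
        for (int[] tuple : bag) {
            if (tuple[0] == wanted) {
                matches++;
            }
        }
        return matches;
    }

    // Models "IsEmpty(FILTERED_LIST) ? 'not-contain' : 'contain'".
    static String flag(int[][] bag, int wanted) {
        return countMatches(bag, wanted) == 0 ? "not-contain" : "contain";
    }

    public static void main(String[] args) {
        int[][] rowA = {{1}, {2}, {3}, {5}, {6}}; // bag of row a in 1.txt
        int[][] rowB = {{1}};                     // bag of row b in 1.txt
        System.out.println("(" + flag(rowA, 5) + ",a)"); // prints (contain,a)
        System.out.println("(" + flag(rowB, 5) + ",b)"); // prints (not-contain,b)
    }
}
```

<p>Row a is reported as contain even though 5 is the fourth tuple of its bag, which is exactly why $0 must not be read as the first element of the bag.</p>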
<ul>
<li>
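		<span style="background-color:#00ff00;">Method 2</span></li>
</ul>
<p>Method 1 can be puzzling the first time you see it. Method 2 is much more direct. The idea: FLATTEN the bag so that each input row expands into N rows and the second column becomes a scalar (int). Then, without any nested FOREACH, you can FILTER for the records that contain 5:</p>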
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">name</span>:&nbsp;chararray,&nbsp;aList:bag{(item:<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">int</span>)});
B&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;name,&nbsp;FLATTEN(aList)&nbsp;AS&nbsp;item;
C&nbsp;=&nbsp;FILTER&nbsp;B&nbsp;BY&nbsp;(item&nbsp;==&nbsp;5);
DUMP&nbsp;C;
</code></pre>
</section>
<p>Output:</p>
<blockquote>
<div>
		(a,5)</div>
<div>
		(d,5)</div>
<div>
		(e,5)</div>
</blockquote>
<p>The result is correct. Note that this outputs only the records whose bag contains 5. To find the records that do not contain 5, you could OUTER JOIN this output with the original data, but that is clearly more cumbersome than Method 1, and in fact less efficient too. You could call that the price of &ldquo;looking easy to understand&rdquo;.</p>
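<p>The behavior of FLATTEN followed by FILTER can be sketched in plain Java as below. This is only an illustration under stated assumptions, not Pig code: the class and method names are made up, each bag is modeled as an int array, and the row named x (with the value 5 stored twice) is a hypothetical row that does not appear in 1.txt.</p>

```java
public class FlattenFilterSketch {
    // Models FLATTEN followed by FILTER: every (name, item) pair whose item
    // equals the wanted value becomes one output row.
    static String flattenAndFilter(String[] names, int[][] bags, int wanted) {
        StringBuilder out = new StringBuilder();
        int row = 0;
        for (int[] bag : bags) {
            for (int item : bag) {          // FLATTEN: one row per bag element
                if (item == wanted) {       // FILTER ... BY (item == wanted)
                    out.append("(").append(names[row]).append(",").append(item).append(")");
                }
            }
            row++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "x"};                // "x" is a made-up row
        int[][] bags = {{1, 2, 3, 5, 6}, {1}, {5, 5}};
        System.out.println(flattenAndFilter(names, bags, 5)); // prints (a,5)(x,5)(x,5)
    }
}
```

<p>Two things stand out in this sketch: row b disappears entirely, which is why a join is needed to recover the rows that do not contain 5, and a bag that holds the value twice yields two output rows, so the result may need deduplication before it is used as a per-row flag.</p>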
<ul>
<li>
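		<span style="background-color:#00ff00;">Method 3</span></li>
</ul>
<p>So is there an approach that is both simple to write and easy to read? Yes: use a <span style="color:#0000ff;">UDF</span>. You still have to write the UDF yourself, though, and that effort is not counted here, which is the drawback of Method 3.</p>
<p>Suppose the UDF we write is named <span style="color:#0000ff;">BagContains</span>. It takes two arguments: the first is the bag, and the second is the value to look for in the bag (for example 5). It returns 1 when the bag contains the value and 0 otherwise.<br />
Given that definition, here is how the UDF is used:</p>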
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="sql language-sql hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;">REGISTER&nbsp;&#39;/home/codelast/my-pig-lib.jar&#39;;

DEFINE&nbsp;BagContains&nbsp;com.codelast.BagContains();

A&nbsp;=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">LOAD</span>&nbsp;<span class="hljs-string" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(238, 220, 112); word-wrap: inherit !important; word-break: inherit !important;">&#39;1.txt&#39;</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">AS</span>&nbsp;(<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">name</span>:&nbsp;chararray,&nbsp;aList:bag{(item:<span class="hljs-built_in" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">int</span>)});
B&nbsp;=&nbsp;FOREACH&nbsp;A&nbsp;GENERATE&nbsp;name,&nbsp;((BagContains(aList,&nbsp;5)&nbsp;==&nbsp;1)&nbsp;?&nbsp;&#39;contain&#39;&nbsp;:&nbsp;&#39;not-contain&#39;)&nbsp;AS&nbsp;flag;
DUMP&nbsp;B;
</code></pre>
</section>
<p>The path in the REGISTER line points to the jar compiled from the UDF I wrote.<br />
This script outputs:</p>
<blockquote>
<div>
		(a,contain)</div>
<div>
		(b,not-contain)</div>
<div>
		(c,not-contain)</div>
<div>
		(d,contain)</div>
<div>
		(e,contain)</div>
</blockquote>
<div>
	This is effectively the same result as Method 1; only the column order differs.<br />
	The code that uses the UDF is clear and readable. But then, how do we write the UDF itself?</div>
<p>Without further ado, here is the UDF code:</p>
<section class="output_wrapper" id="output_wrapper_id" style="font-size: 16px; color: rgb(62, 62, 62); line-height: 1.6; letter-spacing: 0px; font-family: &quot;Helvetica Neue&quot;, Helvetica, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;">
<pre style="font-size: inherit; color: inherit; line-height: inherit; margin-top: 0px; margin-bottom: 0px; padding: 0px;">
<code class="java language-java hljs" style="margin: 0px 2px; line-height: 18px; font-size: 14px; letter-spacing: 0px; font-family: Consolas, Inconsolata, Courier, monospace; border-radius: 0px; color: rgb(169, 183, 198); background: rgb(40, 43, 46); padding: 0.5em; overflow-wrap: normal !important; word-break: normal !important; overflow: auto !important; display: -webkit-box !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">package</span>&nbsp;com.codelast;

<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">import</span>&nbsp;org.apache.pig.EvalFunc;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">import</span>&nbsp;org.apache.pig.data.DataBag;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">import</span>&nbsp;org.apache.pig.data.DataType;
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">import</span>&nbsp;org.apache.pig.data.Tuple;

<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">import</span>&nbsp;java.io.IOException;

<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">/**
&nbsp;*&nbsp;Check&nbsp;whether&nbsp;a&nbsp;list&nbsp;contains&nbsp;a&nbsp;specified&nbsp;item.
&nbsp;*&nbsp;Usage:&nbsp;suppose&nbsp;aList&nbsp;is&nbsp;a&nbsp;bag&nbsp;containing&nbsp;int&nbsp;items,&nbsp;then
&nbsp;*&nbsp;BagContains(aList,&nbsp;5)
&nbsp;*&nbsp;will&nbsp;return&nbsp;1&nbsp;(the&nbsp;list&nbsp;contains&nbsp;5)&nbsp;or&nbsp;0&nbsp;(the&nbsp;list&nbsp;doesn&#39;t&nbsp;contain&nbsp;5)
&nbsp;*
&nbsp;*&nbsp;<span class="hljs-doctag" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;">@author</span>&nbsp;Darran&nbsp;Zhang&nbsp;@&nbsp;codelast.com
&nbsp;*/</span>
<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">public</span>&nbsp;<span class="hljs-class" style="font-size: inherit; color: inherit; line-height: inherit; margin: 0px; padding: 0px; word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">class</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">BagContains</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">extends</span>&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">EvalFunc</span>&lt;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">Integer</span>&gt;&nbsp;</span>{
&nbsp;&nbsp;<span class="hljs-meta" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(91, 218, 237); word-wrap: inherit !important; word-break: inherit !important;">@Override</span>
&nbsp;&nbsp;<span class="hljs-function" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;"><span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">public</span>&nbsp;Integer&nbsp;<span class="hljs-title" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(165, 218, 45); word-wrap: inherit !important; word-break: inherit !important;">exec</span><span class="hljs-params" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(255, 152, 35); word-wrap: inherit !important; word-break: inherit !important;">(Tuple&nbsp;input)</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; overflow-wrap: inherit !important; word-break: inherit !important;">throws</span>&nbsp;IOException&nbsp;</span>{
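&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;Return&nbsp;null&nbsp;when&nbsp;either&nbsp;argument&nbsp;is&nbsp;missing&nbsp;or&nbsp;null.</span>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">if</span>&nbsp;(input&nbsp;==&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">null</span>&nbsp;||&nbsp;input.size()&nbsp;&lt;&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">2</span>&nbsp;||&nbsp;input.get(<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">0</span>)&nbsp;==&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">null</span>&nbsp;||&nbsp;input.get(<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">1</span>)&nbsp;==&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">null</span>)&nbsp;{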
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">return</span>&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">null</span>;
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;DataBag&nbsp;inputBag&nbsp;=&nbsp;DataType.toBag(input.get(<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">0</span>));
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">int</span>&nbsp;item2Find&nbsp;=&nbsp;DataType.toInteger(input.get(<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">1</span>));
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">for</span>&nbsp;(Tuple&nbsp;entry&nbsp;:&nbsp;inputBag)&nbsp;{
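&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(128, 128, 128); word-wrap: inherit !important; word-break: inherit !important;">//&nbsp;A&nbsp;null&nbsp;element&nbsp;never&nbsp;matches;&nbsp;keeping&nbsp;it&nbsp;boxed&nbsp;avoids&nbsp;a&nbsp;NullPointerException.</span>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Integer&nbsp;item&nbsp;=&nbsp;DataType.toInteger(entry.get(<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">0</span>));
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">if</span>&nbsp;(item&nbsp;!=&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">null</span>&nbsp;&amp;&amp;&nbsp;item&nbsp;==&nbsp;item2Find)&nbsp;{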
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">return</span>&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">1</span>;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(248, 35, 117); word-wrap: inherit !important; word-break: inherit !important;">return</span>&nbsp;<span class="hljs-number" style="font-size: inherit; line-height: inherit; margin: 0px; padding: 0px; color: rgb(174, 135, 250); word-wrap: inherit !important; word-break: inherit !important;">0</span>;
&nbsp;&nbsp;}
}
</code></pre>
</section>
<p>I won't explain every line here. Java UDFs for Pig follow a fairly standard pattern, and many simple UDFs can be obtained by lightly adapting the code above.</p>
<p><span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;Copyright notice&nbsp;<span style="color: rgb(255, 0, 0);">➤➤</span>&nbsp;<br />
Reposts must credit the source: <u><a href="https://www.codelast.com/" rel="noopener noreferrer" target="_blank"><em><span style="color: rgb(0, 0, 255);"><strong style="font-size: 16px;"><span style="font-family: arial, helvetica, sans-serif;">codelast.com</span></strong></span></em></a></u>&nbsp;<br />
Thanks for following my WeChat official account (scan the QR code with WeChat):</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="wechat qrcode of codelast" src="https://www.codelast.com/codelast_wechat_qr_code.jpg" style="width: 200px; height: 200px;" /><br />
	And my WeChat Channels account:</p>
<p style="border: 0px; font-size: 13px; margin: 0px 0px 9px; outline: 0px; padding: 0px; color: rgb(77, 77, 77);">
	<img decoding="async" alt="" src="https://www.codelast.com/wechat_shipinhao_qr_code.jpg" style="text-align: center; width: 200px; height: 199px;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.codelast.com/%e5%8e%9f%e5%88%9b-%e5%a6%82%e4%bd%95%e5%9c%a8apache-pig%e4%b8%ad%e5%88%a4%e6%96%ad%e4%b8%80%e4%b8%aabag%e4%b8%ad%e6%98%af%e5%90%a6%e5%8c%85%e5%90%ab%e7%89%b9%e5%ae%9a%e7%9a%84%e5%85%83%e7%b4%a0/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
