Pig Execution - 菜鸟教程

Execution

In the previous chapter, we explained how to install Apache Pig. In this chapter, we discuss how to execute it.

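Apache Pig Execution Modes

You can run Apache Pig in two modes: local mode and MapReduce mode.

Local Mode

In this mode, all files are installed and run from your local host and local file system. Neither Hadoop nor HDFS is required. This mode is generally used for testing.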
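As a minimal sketch of local mode, the shell snippet below creates a small comma-separated file on the local file system together with a Pig script that reads it, then runs the script with `-x local`. The working directory, the file name `local_demo.pig`, and the sample rows are all hypothetical and chosen only for illustration; the `pig` command is invoked only if it is found on the `PATH`.

```shell
#!/bin/sh
# Hypothetical working directory for this demo (not part of the tutorial).
workdir="${TMPDIR:-/tmp}/pig_local_demo"
mkdir -p "$workdir"

# Sample rows invented for illustration: id,name,city
cat > "$workdir/student.txt" <<'EOF'
1,Rajiv,Hyderabad
2,Siddarth,Kolkata
EOF

# In local mode the relative path 'student.txt' refers to the local file system.
cat > "$workdir/local_demo.pig" <<'EOF'
student = LOAD 'student.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
DUMP student;
EOF

if command -v pig >/dev/null 2>&1; then
  # Run from the working directory so the relative path resolves.
  (cd "$workdir" && pig -x local local_demo.pig)
else
  echo "pig not found on PATH; created $workdir/local_demo.pig only"
fi
```

Because nothing here touches HDFS, this kind of setup is a convenient way to try out a script on a few rows before running it against real data.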

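MapReduce Mode

In MapReduce mode, Apache Pig loads and processes data that resides in the Hadoop Distributed File System (HDFS). Whenever we execute Pig Latin statements to process data in this mode, a MapReduce job is invoked in the back end to perform the corresponding operation on the data in HDFS.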

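Apache Pig Execution Mechanisms

Apache Pig scripts can be executed in three ways: interactive mode, batch mode, and embedded mode.

Interactive mode (Grunt shell) - You can run Apache Pig interactively using the Grunt shell. In this shell, you can enter Pig Latin statements and get the output (using the Dump operator).
Batch mode (script) - You can run Apache Pig in batch mode by writing the Pig Latin script in a single file with the .pig extension.
Embedded mode (UDF) - Apache Pig lets us define our own functions (User Defined Functions) in programming languages such as Java and use them in our scripts.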

Invoking the Grunt Shell

You can invoke the Grunt shell in the desired mode (local or MapReduce) using the -x option, as shown below.

Local mode	MapReduce mode
Command: $ ./pig -x local	Command: $ ./pig -x mapreduce

      
jc2182@debian:~/pig$ pig -x local
2021-01-11 15:09:08,420 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-01-11 15:09:08,690 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2021-01-11 15:09:08,690 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2021-01-11 15:09:08,825 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2021-01-11 15:09:08,825 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/jc2182/pig/pig_1610348948819.log
2021-01-11 15:09:08,939 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/jc2182/.pigbootup not found
2021-01-11 15:09:09,131 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2021-01-11 15:09:09,133 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2021-01-11 15:09:09,537 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-01-11 15:09:09,618 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-47ff99a2-5aab-497f-9966-0ffebd44f115
2021-01-11 15:09:09,618 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt>

 
jc2182@debian:~/pig$ pig -x mapreduce
2021-01-11 15:11:00,724 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2021-01-11 15:11:00,726 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
2021-01-11 15:11:00,726 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2021-01-11 15:11:00,816 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2021-01-11 15:11:00,816 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/jc2182/pig/pig_1610349060803.log
2021-01-11 15:11:00,856 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/jc2182/.pigbootup not found
2021-01-11 15:11:01,150 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-01-11 15:11:01,179 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2021-01-11 15:11:01,179 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2021-01-11 15:11:02,161 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-424e89f8-dc16-49eb-89dc-5584cb0c47f7
2021-01-11 15:11:02,161 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt>

Note: Hadoop must be running before you start Pig in MapReduce mode.
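One way to act on this note is to check that HDFS responds before starting Pig in MapReduce mode. The sketch below is only an illustration: the helper name `hdfs_is_up` is hypothetical, and it assumes that a successful `hdfs dfs -ls /` is a good enough sign that Hadoop is running.

```shell
#!/bin/sh
# hdfs_is_up: hypothetical helper.
# Returns 0 if the hdfs client exists and can list the HDFS root, 1 otherwise.
hdfs_is_up() {
  command -v hdfs >/dev/null 2>&1 || return 1
  hdfs dfs -ls / >/dev/null 2>&1 || return 1
  return 0
}

if hdfs_is_up; then
  echo "HDFS is reachable; pig -x mapreduce should be able to connect"
else
  echo "HDFS is not reachable; start Hadoop first, or use pig -x local"
fi
```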

Either command gives you the Grunt shell prompt, as shown below.

 
grunt>

You can exit the Grunt shell by pressing Ctrl + D.

After invoking the Grunt shell, you can run Pig Latin by typing statements directly at the prompt.

 
grunt> customers = LOAD 'customers.txt' USING PigStorage(',');

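Executing Apache Pig in Batch Mode

You can write an entire Pig Latin script in a file and execute it by passing the file to the pig command together with the -x option. Suppose we have a Pig script in a file named Sample_script.pig, as shown below.

Script file Sample_script.pig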

 
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);
  
Dump student;

Now you can execute the script in the above file as shown below.

Local mode: $ pig -x local Sample_script.pig
MapReduce mode: $ pig -x mapreduce Sample_script.pig
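The two commands above differ only in the value passed to -x, so they can be wrapped in a small helper. The following is a sketch rather than an established tool: the function name `run_pig_batch` and its return codes are hypothetical, and it deliberately accepts only the two modes covered in this chapter.

```shell
#!/bin/sh
# run_pig_batch MODE SCRIPT
# Hypothetical helper: validates its arguments, then runs SCRIPT in batch mode.
run_pig_batch() {
  mode="$1"
  script="$2"

  # Only the two modes discussed in this chapter are accepted here.
  case "$mode" in
    local|mapreduce) ;;
    *) echo "unsupported mode: $mode" >&2; return 2 ;;
  esac

  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 3
  fi

  if ! command -v pig >/dev/null 2>&1; then
    echo "pig not found on PATH" >&2
    return 127
  fi

  pig -x "$mode" "$script"
}
```

With this in place, `run_pig_batch local Sample_script.pig` would be equivalent to the first command above, while a mistyped mode or a missing file is reported before Pig is started.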

Note: we will discuss in detail how to run Pig scripts in batch mode and embedded mode in subsequent chapters.