A simple walkthrough to get Pig up and running on CentOS 7 using an existing Hadoop install. If you have not installed Hadoop, please see my post on installing Apache Hadoop on CentOS 7 before continuing here.

As always, this is as much documentation for me as it is a tutorial, but corrections, additions and notes on omissions are welcome.

0. Motivation

Since the advent of the Hadoop cluster, Hive has been the “path of least resistance” for getting SQL-savvy ETL engineers and data analysts to use languages that leverage Hadoop’s MapReduce framework. Consequently, Pig is often overlooked as the proper tool for constructing data flows, because it requires people willing to learn yet another programming language.

Alan Gates, Pig architect at Yahoo!, makes a solid case for why a procedural language like Pig is preferable to a declarative one like Hive for data flows.

Furthermore, while I have used /opt here as my installation directory, sudo is only needed because /opt is owned by root. You can install to any directory you have write permissions to, dropping sudo from the commands, in order to use Pig yourself.

1. Download Pig

Go to the Apache Pig download page and click “Download a release now!”. It will then suggest the correct mirror for you to use and take you to the index – select the version of Pig compatible with your version of Hadoop.

For instance, I’ve used:

cd /tmp
wget http://apache.mirrors.hoobly.com/pig/pig-0.16.0/pig-0.16.0.tar.gz

For 3rd party packages I like to use the naming convention `/opt/<name>/<name>-<version>` to allow for quick switching between version builds using the environment variables. Therefore I take this step as well:

sudo mkdir /opt/pig
sudo cp pig-0.16.0.tar.gz /opt/pig
cd /opt/pig

Un-gzip and un-tar:

sudo tar zxvf pig-0.16.0.tar.gz
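
If you are installing into a directory you own rather than /opt, the copy, cd and extract steps can be collapsed into one call. The following is only a sketch: `install_tarball` is a hypothetical helper name, and `$HOME/apps/pig` is an assumed location, not something from the steps above.

```shell
# install_tarball is a hypothetical helper: unpack a .tar.gz straight into a
# target directory, creating the directory first if it does not exist.
install_tarball() {
    tarball="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # -C makes tar change into $dest before extracting, so no cp or cd is needed.
    tar -xzf "$tarball" -C "$dest"
}

# Example call, left commented out; $HOME/apps/pig is an assumed location.
# install_tarball /tmp/pig-0.16.0.tar.gz "$HOME/apps/pig"
```

With that layout, PIG_HOME in the next step would be the extracted pig-0.16.0 directory under whatever destination you chose.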

2. Configure

Add the following to the end of your ~/.bashrc:

export PIG_HOME=/opt/pig/pig-0.16.0
export PATH=$PIG_HOME/bin:$PATH
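# On Hadoop 2.x the configuration lives in $HADOOP_HOME/etc/hadoop; point PIG_CLASSPATH there instead if conf/ does not exist.
export PIG_CLASSPATH=$HADOOP_HOME/conf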

Source it:

. ~/.bashrc
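
Because each build lives in its own versioned directory under /opt/pig, switching versions can come down to changing a single value. Here is a minimal sketch of how the ~/.bashrc lines could be written that way. `pkg_home` and `PIG_VERSION` are names I made up for illustration; Pig itself does not read `PIG_VERSION`.

```shell
# pkg_home is a hypothetical helper that builds /opt/<name>/<name>-<version>.
pkg_home() {
    printf '/opt/%s/%s-%s\n' "$1" "$1" "$2"
}

# PIG_VERSION is an assumed variable name; it defaults to the build used here.
PIG_VERSION="${PIG_VERSION:-0.16.0}"
PIG_HOME="$(pkg_home pig "$PIG_VERSION")"
export PIG_HOME
export PATH="$PIG_HOME/bin:$PATH"
```

Setting PIG_VERSION to another release before sourcing the file would then point the shell at that build, provided it has been unpacked alongside the first one.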

And you’re all set!

pig -version
# Apache Pig version 0.16.0 (r1746530)
# compiled Jun 01 2016, 23:10:49
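
If `pig -version` reports "command not found" instead, the usual culprit is a PIG_HOME that does not point at the extracted directory. A small sanity check along these lines can narrow it down. This is a sketch, and `check_pig_home` is a hypothetical helper, not something shipped with Pig.

```shell
# check_pig_home is a hypothetical helper, not something shipped with Pig.
check_pig_home() {
    if [ -z "$1" ]; then
        echo "PIG_HOME is not set"
        return 1
    fi
    if [ ! -x "$1/bin/pig" ]; then
        echo "no executable found at $1/bin/pig"
        return 1
    fi
    echo "ok: $1/bin/pig"
}

# Report on the current shell; '|| true' keeps a failed check from
# aborting a script that runs with 'set -e'.
check_pig_home "${PIG_HOME:-}" || true
```

Once the check passes, `pig -x local` starts the Grunt shell in local mode, which is a quick way to try Pig without touching the cluster.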