Apache Pig


Why the name Pig??

Pigs have a reputation that they eat anything that they come across, are able to live anywhere, are easily domesticated and can “fly”

How do these characteristics apply to PIG?

  • PIG can handle any data – Structured, Semi-structured, Unstructured (It can eat anything)
  • PIG is not committed to any particular framework including Hadoop (It can live anywhere)
  • PIG is very good on large blocks of data (PIG flies with data)
  • PIG is very easy to code and maintain (Can be domesticated easily)

Why Apache PIG?

This lead the programmers at Yahoo to come up with a language that had two essential characteristics:
A declarative query based language inspired by SQL
A low level programming language that could generate MapReduce code
Does all the heavy weight Big Data tasks
Hence , that is how PIG was born



What is Apache PIG?

It is an open source data flow language. Pig Latin is used to express the queries and data manipulation operations in simple scripts. It converts the scripts into a sequence of underlying Map Reduce jobs  that run on Hadoop cluster. PIG is a platform for analysing large data sets in a rapid fashion. It runs as client’s side application ,Can perform a sample run on a subset of input data – used for testing

Where is PIG in Hadoop Ecosystem?


Need For Apache PIG

We Need PIG Because Java is not a preferred language for many data analysts,200 Java LOC ~ 10 PIG LOC (Line of code),Many built-in operations are available for common data operations like join, grouping, filtering etc .Less development time as compared to MapReduce (Higher Level of Abstraction), Can perform ad hoc MapReduce jobs on very large data sets, Java knowledge is optional, Fewer Lines Of Code = Easier ,Development & Maintenance, Easily extensible , Easy to Learn and user friendly.
PIG at Yahoo








Leave a Reply

Your email address will not be published. Required fields are marked *