One of the good (or bad, depending on your point of view) habits when working with Hadoop is that you can push your files into the Hadoop cluster and worry about making sense of the data at a later time. One of the many issues with this approach is that you may rapidly run out of disk space on your cluster or your cloud storage. A good way to alleviate this issue (short of deleting the data) is to compress the data within HDFS. More information on how the script works is embedded within the comments.

/* ** Pig Script:…
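As a rough illustration of the idea (a minimal sketch, not the author's full script), Pig can rewrite existing HDFS data with output compression turned on. The input and output paths below are hypothetical placeholders:

```pig
-- Enable compressed output and pick a codec (gzip here; bzip2 or
-- Snappy are common alternatives depending on your cluster setup).
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

-- Read the uncompressed files and write them back out compressed.
-- '/user/data/raw' and '/user/data/compressed' are example paths.
raw = LOAD '/user/data/raw' USING PigStorage();
STORE raw INTO '/user/data/compressed' USING PigStorage();
```

Once the compressed copy has been verified, the original uncompressed directory can be removed (e.g. with `hadoop fs -rm -r`) to reclaim the disk space.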