Jump Start into Python and Apache Spark with Learning PySpark

For the last few years, I have had the opportunity to work with some of the coolest Apache Spark committers, contributors, and projects. As luck would have it, I got the opportunity to meet my co-author Tomasz Drabas (author of the awesome Practical Data Analysis Cookbook) while we were working on some other cool Apache Spark projects. In the process, we joined forces to share the lessons we learned, which will hopefully help you jump start your Python and Apache Spark projects, in our book: Learning PySpark. And just to make sure, this book was reviewed by the incomparable Holden Karau, author of the…

NASA image acquired April 18 - October 23, 2012

This image of the United States of America at night is a composite assembled from data acquired by the Suomi NPP satellite in April and October 2012. The image was made possible by the new satellite’s “day-night band” of the Visible Infrared Imaging Radiometer Suite (VIIRS), which detects light in a range of wavelengths from green to near-infrared and uses filtering techniques to observe dim signals such as city lights, gas flares, auroras, wildfires, and reflected moonlight.

“Nighttime light is the most interesting data that I’ve had a chance to work with,” says Chris Elvidge, who leads the Earth Observation Group at NOAA’s National Geophysical Data Center. “I’m always amazed at what city light images show us about human activity.” His research group has been approached by scientists seeking to model the distribution of carbon dioxide emissions from fossil fuels and to monitor the activity of commercial fishing fleets. Biologists have examined how urban growth has fragmented animal habitat. Elvidge even learned once of a study of dictatorships in various parts of the world and how nighttime lights had a tendency to expand in the dictator’s hometown or province.

Named for satellite meteorology pioneer Verner Suomi, NPP flies over any given point on Earth's surface twice each day at roughly 1:30 a.m. and p.m. The polar-orbiting satellite flies 824 kilometers (512 miles) above the surface, sending its data once per orbit to a ground station in Svalbard, Norway, and continuously to local direct broadcast users distributed around the world. Suomi NPP is managed by NASA with operational support from NOAA and its Joint Polar Satellite System, which manages the satellite's ground system.

NASA Earth Observatory image by Robert Simmon, using Suomi NPP VIIRS data provided courtesy of Chris Elvidge (NOAA National Geophysical Data Center). Suomi NPP is the result of a partnership between NASA, NOAA, and t

On-Time Flight Performance with GraphFrames for Apache Spark

Feature Image: NASA Goddard Space Flight Center: City Lights of the United States 2012. This is an abridged version of the full blog post On-Time Flight Performance with GraphFrames. You can also reference the webinar GraphFrames: DataFrame-based graphs for Apache Spark and the On-Time Flight Performance with GraphFrames for Apache Spark notebook. An intuitive approach to understanding flight departure delays is to use graph structures. Why Graph? Graph structures are a more intuitive way to model many classes of data problems: social networks, restaurant recommendations, or flight paths. It is easier to understand these data problems…

Notebook Gallery

Here are some of the notebooks created to showcase various Apache Spark use cases. These all use Databricks Community Edition, which you can get at Try Databricks. You can also access the source from: https://github.com/dennyglee/databricks. The notebooks: JSON Support, GLM in SparkR, Window Functions, Random Forests, DataFrame API, ML Operations, Decision Trees, Statistical Functions, Data Import, Data Exploration, Quick Start Python, Quick Start Scala, Ad-Tech Example, Flight Delays, Genomics, Mobile Sample, Pop vs. Price LR, Pop vs. Price DF, Salesforce Leads, Spark 1.6 (Multiple), Spark 1.6.

In the context of quantum entanglement and time travel – Stargate may be more correct than Star Trek

Feature Image: Michael Bolognesi’s Diamonds in the Sky. As a follow-up to In the context of quantum entanglement and teleportation – Stargate may be more correct than Star Trek, I’m diving into one of SciFi’s persistent quandaries – time travel. And before anyone gets started, I am a proud Trekkie, so this is not meant as a knock on Star Trek. In fact, I’ve already purchased my tickets for Star Trek Into Darkness and, as a fan of BBC’s Sherlock, I have to admit I’m sort of rooting for the villain this time around! Image source: Benedict Cumberbatch – Star…

Using Avro with HDInsight on Azure at 343 Industries

By Michael Wetzel, Tamir Melamed, Mark Vayman, and Denny Lee. Reviewed by Pedro Urbina Escos, Brad Sarsfield, and Rui Martins. Thanks to Krishnan Kaniappan, Che Chou, Jennifer Yi, and Rob Semsey. As noted in the Windows Azure Customer Solution Case Study, Halo 4 developer 343 Industries Gets New User Insights from Big Data in the Cloud, a critical component in achieving faster Hadoop query and processing performance while keeping file sizes small (and thus gaining Azure storage savings, faster query performance, and reduced network overhead) was the use of Avro sequence files. Avro was designed for Hadoop to help make Hadoop more interoperable with other…

An all too brief stop over in Tainan (台南)

For anyone who regularly visits Taiwan, the city of Tainan is easily overlooked next to the other three major cities of Taipei, Taichung, and Kaohsiung. Yet skipping it would be a grave mistake if you consider yourself a foodie. A word of warning: driving in Tainan is atrocious – there are only two major roads going into the city off of Highway 1. Yet if you brave the driving conditions and/or decide to take the train (HSR or TRA), make your way to the old historic district and you will be pleasantly satiated with all sorts of Taiwanese small eats (台灣小吃). This…

Travel Tuesday: Sagrada Familia

For this Travel Tuesday post – allow me to share some personal photos from the beautiful Basílica i Temple Expiatori de la Sagrada Família in Barcelona, Catalonia, Spain. This is one of Antoni Gaudi’s most amazing and famous works, merging Gothic, Catholicism, and Art Nouveau. For more information, check out Wikipedia: http://en.wikipedia.org/wiki/Sagrada_Fam%C3%ADlia Though it is an incomplete work of art that will take decades to complete, this UNESCO site is well worth visiting and touring. Important biblical events are depicted throughout the exterior walls of the basilica. It is a testament to Gaudi’s vision to have drawn…
