These are the slides from the Jump Start into Apache Spark and Databricks webinar on February 10th, 2016. Apache Spark is a fast, easy-to-use, unified engine that lets you solve many data science and big data (and many not-so-big data) scenarios. Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. We will use Databricks to quickly demonstrate, visualize, and debug our code samples; the notebooks will be available for you to download. You can view the on-demand webinar Jump Start into Apache® Spark™…
Data Exploration with Databricks
Today, it was also featured on InsideBigData: Data Exploration with Databricks. Awesome! This Data Exploration on Databricks jump start video shows you how to go from data source to visualization in a few easy steps. Specifically, we take semi-structured logs, extract and transform them, and then analyze and visualize the data using Spark SQL so we can quickly understand it. For more information, and to see other Spark notebooks, go to Selected Notebooks > Databricks Jump Start.
Simplify Machine Learning on Spark with Databricks
As many data scientists and engineers can attest, most of their time is spent not on the models themselves but on the supporting infrastructure. Key issues include the ability to easily visualize, share, deploy, and schedule jobs. More disconcerting is the need for data engineers to re-implement the models developed by data scientists before they can go into production. With Databricks, data scientists and engineers can simplify these logistical issues and spend more of their time focusing on their data problems.
Simplify Visualization
An important consideration for data scientists and engineers is the ability to quickly visualize the data and the model…
Here are some of the notebooks created to showcase various Apache Spark use cases. All of them use Databricks Community Edition, which you can get at Try Databricks. You can also access the source at https://github.com/dennyglee/databricks. The notebooks are: JSON Support, GLM in SparkR, Window Functions, Random Forests, DataFrame API, ML Operations, Decision Trees, Statistical Functions, Data Import, Data Exploration, Quick Start Python, Quick Start Scala, Ad-Tech Example, Flight Delays, Genomics, Mobile Sample, Pop vs. Price LR, Pop vs. Price DF, Salesforce Leads, Spark 1.6 (Multiple), and Spark 1.6.