Advanced Analytics with Spark: Patterns for Learning from Data at Scale

Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

"O'Reilly Media, Inc.", June 12, 2017 - 280 pages

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.

You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—including classification, clustering, collaborative filtering, and anomaly detection—to fields such as genomics, security, and finance.

If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications.

With this book, you will:

Familiarize yourself with the Spark programming model
Become comfortable within the Spark ecosystem
Learn general approaches in data science
Examine complete implementations that analyze large public data sets
Discover which machine learning tools make sense for particular problems
Acquire code that can be adapted to many uses

Common terms and phrases

analysis Analyzing Apache Spark Average Path Length Avro big data cache categorical features chapter client clustering coefficient Co-Occurrence Networks column compute concept contains count create data frame data points data science data scientists data set data type DataFrame DataFrame API decision tree defined disambiguation distribution document document-term matrix encoding entropy Esri evaluate example factors filter format function genomics GeoJSON geospatial graph hadoop hadoop fs HDFS hyperparameters implementation import input Invalid Records iteration Java JSON K-means label Lemmatization libraries machine learning Map[VertexId MapReduce MEDLINE method metric normal NumPy object pairs parse perform pipeline prediction Pregel PySpark Python query random recommendations regression result sample Scala schema score simulation small-world networks Spark MLlib Spark shell Spark SQL Storage String subset taxi Thunder topic transform trip variables vertex VertexId vertices

About the author (2017)

Sandy Ryza develops algorithms for public transit at Remix. Previously, he was a senior data scientist at Cloudera and Clover Health. He is an Apache Spark committer, Apache Hadoop PMC member, and founder of the Time Series for Spark project. He holds the Brown University computer science department's 2012 Twining award for "Most Chill".

Uri Laserson is an Assistant Professor of Genetics at the Icahn School of Medicine at Mount Sinai, where he develops scalable technology for genomics and immunology using the Hadoop ecosystem.

Sean Owen is Director of Data Science at Cloudera. He is an Apache Spark committer and PMC member, and was an Apache Mahout committer.

Josh Wills is the Head of Data Engineering at Slack and the founder of the Apache Crunch project, and he once wrote a tweet about data scientists.

Bibliographic information

Title	Advanced Analytics with Spark: Patterns for Learning from Data at Scale
Authors	Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills
Edition	2
Publisher	"O'Reilly Media, Inc.", 2017
ISBN	1491972904, 9781491972908
Length	280 pages
