Source Code for PySpark Algorithms Book

Unlock the power of big data with the PySpark Algorithms book


PySpark Algorithms Book:

Author: Mahmoud Parsian ([email protected])

Publication date: August 2019


About PySpark Algorithms Book

  • This book is about PySpark (the Python API for Spark)
  • An introductory book on how to solve data problems using PySpark
  • Learn how to use mappers, filters, and reducers
  • Learn how to partition data for fast queries
  • Learn how to use the mapPartitions() transformation
  • Learn how to use reduceByKey(), groupByKey(), and combineByKey() transformations
  • Learn how to use Spark's transformations and actions for solving real problems
  • Learn how to use RDDs and DataFrames
  • Learn how to read/write data from many data sources
  • Learn how to use logistic regression
  • Learn how to use Spark's reduction transformations
  • Learn how to use GraphFrames
  • Learn how to use motifs in GraphFrames
  • Learn how to use monoids in MapReduce algorithms
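
To give a flavor of the monoid idea mentioned in the list above, here is a minimal sketch of computing a per-key mean with an associative combine function. It is written in plain Python so that it runs without a Spark installation and is not taken from the book's source code; the sample records and the helper names (`to_pair`, `combine`, `mean_by_key`) are assumptions made for illustration. In PySpark, a function shaped like `combine` is the kind of function one would typically pass to `reduceByKey()`.

```python
from functools import reduce

# Hypothetical sample data: (key, value) records, as a mapper might emit them.
records = [("a", 2), ("b", 10), ("a", 4), ("b", 20), ("a", 6)]


def to_pair(value):
    """Lift a raw value into the monoid element (sum, count)."""
    return (value, 1)


def combine(x, y):
    """Associative combine for (sum, count) pairs; the identity is (0, 0)."""
    return (x[0] + y[0], x[1] + y[1])


def mean_by_key(pairs):
    """Group by key, reduce each group with combine, then finalize to a mean."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(to_pair(value))

    means = {}
    for key, elements in grouped.items():
        total, count = reduce(combine, elements, (0, 0))
        means[key] = total / count
    return means


means = mean_by_key(records)
print(means)  # {'a': 4.0, 'b': 15.0}
```

The design point is that partial means cannot be safely merged (a mean of means is generally not the overall mean), whereas `(sum, count)` pairs can be combined in any grouping and still give the same result, which is what makes them suitable for distributed reduction.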



Software


Table of Contents

chap01: Introduction to PySpark
chap02: Hello World
chap03: Data Abstractions
chap04: Getting Started -- Sample Chapter
chap05: Transformations in Spark
chap06: Reductions in Spark
chap07: DataFrames and SQL
chap08: Spark DataSources
chap09: Logistic Regression
chap10: Movie Recommendations
chap11: Graph Algorithms
chap12: Design Patterns and Monoids

Appendix A: How to Install Spark
Appendix B: How to Use Lambda Expressions
Appendix C: Questions and Answers (50+ Q&A)


Future chapters:

chap13: FP-Growth
chap14: LDA
chap15: Linear Regression

