Frank Kane's Taming Big Data with Apache Spark and Python

Learn Big Data processing with hands-on Spark tutorials

Author: Frank Kane

Format: Paperback

Publisher: Packt Publishing Limited

Published: 30th Jun '17

Currently unavailable; no restock date is known.

This book offers a practical guide to mastering Apache Spark and big data processing using Python, featuring hands-on tutorials and real-world examples.

In Frank Kane's Taming Big Data with Apache Spark and Python, readers are guided through the intricacies of Apache Spark, a powerful tool for big data processing. This book is designed to help data scientists and analysts understand how to leverage Spark for analyzing large datasets, whether on a single machine or across a computing cluster. With a focus on hands-on learning, Frank Kane presents over 15 real-world examples that illustrate the practical applications of Spark in various scenarios.

The book begins by introducing the fundamentals of Spark, including installation and setup on both local systems and clusters. Readers will learn how to frame big data challenges as Spark problems and how to use Spark's Resilient Distributed Datasets (RDDs) for efficient data processing. The book also covers advanced topics such as machine learning with Spark's MLlib library, real-time data processing using Spark Streaming, and complex network analysis through the GraphX library.

Frank Kane's Taming Big Data with Apache Spark and Python is perfect for those with some programming experience in Python who wish to delve deeper into big data processing. With a step-by-step approach, readers can progress at their own pace, making it an accessible resource for anyone looking to master the Spark ecosystem and implement real-time Spark projects effectively.

ISBN: 9781787287945

Dimensions: unknown

Weight: unknown

Pages: 296