Big Data Reading List

The Google File System
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: A major step backwards
From Google File System to Omega: A Decade of Advancement in Big Data Management at Google
Bigtable: A Distributed Storage System for Structured Data
Spark: Cluster Computing with Working Sets
The Hadoop Distributed File System
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Big Data: A Survey
Cassandra: A Decentralized Structured Storage System
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data
Apache Hadoop YARN: Yet Another Resource Negotiator
Large-scale cluster management at Google with Borg
Apache Hive: A SQL Engine for the Apache Hadoop Framework, and More
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
The impact of columnar file formats on SQL-on-Hadoop engine performance: A study on ORC and Parquet
TensorFlow: A System for Large-Scale Machine Learning
Photon: A Fast Query Engine for Lakehouse Systems
Apache Spark as a Compiler: Joining a Billion Rows Per Second on a Laptop
