- The Google File System
- MapReduce: Simplified Data Processing on Large Clusters
- MapReduce: A major step backwards
- From Google File System to Omega: A Decade of Advancement in Big Data Management at Google
- Bigtable: A Distributed Storage System for Structured Data
- Spark: Cluster Computing with Working Sets
- The Hadoop Distributed File System
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Big Data: A Survey
- Cassandra: A Decentralized Structured Storage System
- Data-intensive applications, challenges, techniques and technologies: A survey on Big Data
- Apache Hadoop YARN: Yet Another Resource Negotiator
- Large-scale cluster management at Google with Borg
- Apache Hive: A SQL Engine for the Apache Hadoop Framework, and More
- Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
- The impact of columnar file formats on SQL-on-Hadoop engine performance: A study on ORC and Parquet
- TensorFlow: A System for Large-Scale Machine Learning
- Photon: A Fast Query Engine for Lakehouse Systems
- Apache Spark as a Compiler: Joining a Billion Rows Per Second on a Laptop