Project Layout
This page describes the overall structure of the mongo-hadoop project.
The mongo-hadoop project provides integrations for a number of different Hadoop frameworks. Each of these integrations is contained in its own module at the root level of the project. These are:
- core - MapReduce integration and abstractions that are reusable by the other modules (a short usage sketch follows this list).
- hive - Apache Hive integration
- pig - Apache Pig integration
- spark - Apache Spark integration (currently necessary only for PySpark support)
- flume - Apache Flume integration
- streaming - Hadoop Streaming integration
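
As a quick illustration of the role of the core module, here is a minimal sketch of a plain MapReduce job wired up with the connector's MongoInputFormat and MongoOutputFormat. The connection URIs and the demo database and collection names are hypothetical, and the mapper and reducer classes are omitted as they would be in any skeleton job configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read input documents from one MongoDB collection and write
        // results back to another (hypothetical URIs).
        conf.set("mongo.input.uri", "mongodb://localhost:27017/demo.in");
        conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.out");

        Job job = Job.getInstance(conf, "mongo-hadoop sketch");
        job.setJarByClass(MongoJobSketch.class);

        // These InputFormat/OutputFormat implementations come from the core module.
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);

        // Mapper, reducer, and output key/value classes would be set here,
        // just as in any other MapReduce job.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```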
There are also a few directories that don't contain any of the MongoDB Hadoop connector source code itself. These are:
- examples - Example code demonstrating how to use the MongoDB Hadoop connector.
- docs - Old documentation. This will probably be deleted soon. Please consult the GitHub wiki instead (the one you're reading right now!).
- config - The checkstyle and findbugs configuration XML files. Please use these when developing the Hadoop Connector.
- tools - Contains the `bson_splitter` script (a Python script for splitting large BSON files into smaller pieces). This may go away soon.
- clusterConfigs - Contains some of the Hadoop configuration files that are used during the tests. Note that several other configuration files are contained in the `build/resources` directories under certain modules so that they'll be added to the CLASSPATH.
- gradle - Gradle scripts used to run tests.