This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

Usage with Authentication

Luke Lovett edited this page May 27, 2015 · 6 revisions

This page will outline how to configure the MongoDB Hadoop Connector to authenticate to a MongoDB cluster as well as provide a few examples.

Configure MongoDB

The Hadoop connector may need to do the following, depending on your configuration:

  • Run the splitVector command (only when the input is not sharded).
  • Read the config.shards collection (only when the input is sharded).
  • Run the collStats command (only when the input is not sharded but still behind a mongos).
  • Read from input collections (whenever the input is from MongoDB), including reading directly from shards in the case of MongoShardSplitter.
  • Write to output collections (whenever the output is to MongoDB).

The first two items above require special privileges on the "admin" database. The remaining items need privileges only on the specific databases they apply to. In other words, you can split these privileges between two users: one defined in the admin database, and another defined in the input/output database. Consult the MongoDB Manual for details on which roles to grant each user to fit your needs.
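The conditions in the list above, and the split between an admin-database user and an input/output-database user, can be summarized in a short Java sketch. The class `RequiredPrivileges`, its `Operation` enum and its methods are hypothetical names used for illustration only; they are not part of the connector. The sketch assumes the job calculates input splits (a single-split job would not call splitVector). Following the paragraph above, it tags only the first two operations as needing the "admin" database.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of the connector) that restates the list of
 * operations above and which of them need a user in the "admin" database.
 */
public class RequiredPrivileges {

    enum Operation {
        SPLIT_VECTOR(true),        // splitVector: input is not sharded
        READ_CONFIG_SHARDS(true),  // read config.shards: input is sharded
        COLL_STATS(false),         // collStats: unsharded input behind a mongos
        READ_INPUT(false),         // read input collections (possibly from shards)
        WRITE_OUTPUT(false);       // write output collections

        /** True if the privilege must be granted through the "admin" database. */
        final boolean needsAdminDatabase;

        Operation(boolean needsAdminDatabase) {
            this.needsAdminDatabase = needsAdminDatabase;
        }
    }

    /** Operations the connector may run, given the job's topology. */
    static Set<Operation> required(boolean inputFromMongo, boolean inputSharded,
                                   boolean behindMongos, boolean outputToMongo) {
        Set<Operation> ops = EnumSet.noneOf(Operation.class);
        if (inputFromMongo) {
            ops.add(Operation.READ_INPUT);
            if (inputSharded) {
                ops.add(Operation.READ_CONFIG_SHARDS);
            } else {
                ops.add(Operation.SPLIT_VECTOR);
                if (behindMongos) {
                    ops.add(Operation.COLL_STATS);
                }
            }
        }
        if (outputToMongo) {
            ops.add(Operation.WRITE_OUTPUT);
        }
        return ops;
    }

    /** True if at least one operation needs a user defined in the "admin" database. */
    static boolean needsAdminUser(Set<Operation> ops) {
        return ops.stream().anyMatch(op -> op.needsAdminDatabase);
    }

    public static void main(String[] args) {
        // Unsharded input read through a mongos, output written back to MongoDB.
        Set<Operation> ops = required(true, false, true, true);
        System.out.println(ops);
        System.out.println("admin user needed: " + needsAdminUser(ops));
    }
}
```

Running `main` prints `[SPLIT_VECTOR, COLL_STATS, READ_INPUT, WRITE_OUTPUT]` followed by `admin user needed: true`. A job that only writes to MongoDB gets `WRITE_OUTPUT` alone, which a user on the output database can cover.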

Basic Configuration

Credentials are passed to the Hadoop connector through MongoDB connection strings. If the only privileges your job requires are reading from and writing to collections, it is sufficient to define a user on the input/output databases and supply that user's credentials in the connection string passed to the mongo.input.uri option. This minimal configuration works only if you read the collection as a single split, so that the connector never needs to call splitVector or read from config.chunks.
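A minimal sketch of this basic configuration in Java follows. It assumes a single user defined on the input database. It uses a plain `java.util.Properties` object in place of Hadoop's `Configuration` so it runs without Hadoop on the classpath. It also percent-encodes the user name and password, since characters such as `@` or `:` would otherwise break the connection string. The host, database, collection, credentials and the `BasicAuthConfig` class are hypothetical placeholders. The `mongo.input.split.create_input_splits` setting, which we believe makes the connector read the collection as one split, should be checked against the connector version you use.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/**
 * Hypothetical sketch: one database-level user whose credentials travel in
 * mongo.input.uri. Properties stands in for Hadoop's Configuration.
 */
public class BasicAuthConfig {

    /** Percent-encodes a user name or password for the userinfo part of a URI. */
    static String encode(String s) {
        // URLEncoder does form encoding (space -> '+', literal '+' -> %2B),
        // so any remaining '+' came from a space and becomes %20.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds mongodb://user:password@host/database.collection. */
    static String inputUri(String user, String password, String host,
                           String database, String collection) {
        return "mongodb://" + encode(user) + ":" + encode(password) + "@"
                + host + "/" + database + "." + collection;
    }

    static Properties basicConfig(String user, String password, String host,
                                  String database, String collection) {
        Properties conf = new Properties();
        conf.setProperty("mongo.input.uri",
                inputUri(user, password, host, database, collection));
        // Assumption: disable split calculation so the collection is read as a
        // single split and neither splitVector nor config.chunks is needed.
        conf.setProperty("mongo.input.split.create_input_splits", "false");
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = basicConfig("etl", "p@ss word", "db1.example.com:27017",
                "sales", "orders");
        // prints mongodb://etl:p%40ss%20word@db1.example.com:27017/sales.orders
        System.out.println(conf.getProperty("mongo.input.uri"));
    }
}
```

In a real job, the same keys would be set with `conf.set(...)` on the job's Hadoop `Configuration`.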

When your job needs extra privileges (for example, to call splitVector), the Hadoop connector must be able to authenticate against the "admin" database. Pass these credentials in the mongo.auth.uri option. Because the value is a connection string URI, it may name a different host than mongo.input.uri, which is handy when the admin database is located on a different host in your MongoDB cluster. N.B. When writing output back to MongoDB, the Hadoop connector uses the credentials in mongo.auth.uri, rather than those in mongo.input.uri, to authenticate before writing.
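A sketch of the two-user setup follows, again using `java.util.Properties` as a stand-in for Hadoop's `Configuration`. A database-level user goes in `mongo.input.uri`, and a user defined in the "admin" database goes in `mongo.auth.uri`, which here names a different host. All host names, user names, passwords and the `AuthUriConfig` class are hypothetical. The `databaseOf` helper only checks which database each URI targets.

```java
import java.net.URI;
import java.util.Properties;

/**
 * Hypothetical sketch: an admin-database user in mongo.auth.uri and a
 * database-level user in mongo.input.uri, on different hosts.
 */
public class AuthUriConfig {

    static Properties twoUserConfig() {
        Properties conf = new Properties();
        // Database-level user: reads the input collection.
        conf.setProperty("mongo.input.uri",
                "mongodb://reader:readerPass@mongos.example.com:27017/sales.orders");
        // User defined in the "admin" database; the host may differ from mongo.input.uri.
        conf.setProperty("mongo.auth.uri",
                "mongodb://hadoopAdmin:adminPass@admin-host.example.com:27017/admin");
        return conf;
    }

    /** Database named in a mongodb:// URI's path (the text before the first '.'). */
    static String databaseOf(String uri) {
        String path = URI.create(uri).getPath().substring(1); // drop leading '/'
        int dot = path.indexOf('.');
        return dot < 0 ? path : path.substring(0, dot);
    }

    public static void main(String[] args) {
        Properties conf = twoUserConfig();
        conf.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + " -> database "
                        + databaseOf(conf.getProperty(k))));
    }
}
```

Keeping the admin credentials in a separate URI means the database-level user never needs privileges on the "admin" database.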

Examples

Coming soon.
