
Hadoop 2 Specific Inputs & Actions

impetus-opensource edited this page Jan 7, 2015 · 4 revisions

Use the vendor, version, and source inputs to select the vendor, version, and bundle source of the Hadoop 2 release to be deployed.

Note: The default value of many path-related parameters contains “$user”. This placeholder is automatically replaced with the user name entered in the user name input under node authentication.
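
The substitution described in the note can be pictured with a minimal sketch. The function name, the template path, and the user name below are hypothetical and chosen only for illustration; this is not Ankush's actual implementation.

```python
from string import Template


def resolve_default_path(template: str, user_name: str) -> str:
    """Replace the $user placeholder with the node-authentication user name."""
    # safe_substitute leaves any other, unknown placeholders untouched.
    return Template(template).safe_substitute(user=user_name)


# Hypothetical template and user name, for illustration only.
print(resolve_default_path("/home/$user/example/dir", "hadoop"))
```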

In addition, there are a few more Hadoop 2 specific inputs, as follows.

![](https://raw.githubusercontent.com/wiki/impetus-opensource/ankush/images/hadoop/Image3.png)

Figure 3: Configuring Hadoop 2 Cluster

Note: Each default value is based either on what the technology-specific component suggests or on what Ankush suggests.

  1. S3 Support – Access Key: ID of the account to access S3.
  2. S3 Support – Secret Key: Authentication key of the account to access S3.
  3. S3n Support – Access Key: ID of the account to access S3n.
  4. S3n Support – Secret Key: Authentication key of the account to access S3n.
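
As a rough sketch of where these four inputs could end up, the snippet below maps them to the standard Hadoop properties for the `s3` and `s3n` file systems and renders a `<configuration>` fragment. The property names are the ones Hadoop defines. Whether Ankush writes them in exactly this way is an assumption, and the function names and key values are hypothetical.

```python
import xml.etree.ElementTree as ET


def s3_properties(s3_access, s3_secret, s3n_access, s3n_secret):
    """Map the four S3/S3n inputs to Hadoop property names."""
    return {
        "fs.s3.awsAccessKeyId": s3_access,
        "fs.s3.awsSecretAccessKey": s3_secret,
        "fs.s3n.awsAccessKeyId": s3n_access,
        "fs.s3n.awsSecretAccessKey": s3n_secret,
    }


def to_configuration_xml(props):
    """Render a dict of properties as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder credentials, for illustration only.
print(to_configuration_xml(s3_properties("ID_A", "KEY_A", "ID_B", "KEY_B")))
```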

The following parameters can be adjusted to suit the deployment environment:

| Parameter | Default path/value |
| --- | --- |
| NameNode Path | /home//hes/hadoopDirs/name |
| DataNode Path | /home//hes/hadoopDirs/data |
| Mapred Temp Path | /home//hes/hadoopDirs/mrtmp |
| Hadoop Temp Path | /home//hes/hadoopDirs/hadooptmp |
| Web Application Proxy | Enabled |
| Web App Proxy Node | List of retrieved nodes |
| Resource Manager Node | List of retrieved nodes |
| Job History Server | Disabled, List of retrieved nodes |
| High Availability | Enabled |
| Nameservice ID | Value of Cluster Name |
| StandBy NameNode | List of retrieved nodes |
| NameNode ID 1 | nn1 |
| NameNode ID 2 | nn2 |
| Journal Nodes | List of retrieved nodes |
| Journal Nodes Dir | /home//hes/hadoopDirs/jndata |
| Automatic Failover | Enabled |

High Availability Configuration: Hadoop 2 can be deployed with high availability support by enabling the “High Availability” option in the Hadoop configuration. It is enabled by default. While it is enabled, the SecondaryNameNode column/input in the node list is disabled. To deploy a Hadoop 2 cluster with a SecondaryNameNode, the user needs to disable high availability.

The Zookeeper component must be selected for a high availability configuration.
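
To show how the high availability inputs relate to one another, here is a hedged sketch that assembles the standard HDFS HA properties from a nameservice ID, two NameNode IDs, journal nodes, and Zookeeper nodes. The property names are standard Hadoop 2 ones. The function name, the host names, and the ports (8020 for NameNode RPC, 8485 for journal nodes, 2182 taken from the Zookeeper "Client Port" default on this page) are assumptions for illustration, and the exact files Ankush generates may differ.

```python
def ha_properties(nameservice, nn_hosts, journal_hosts, zk_hosts,
                  nn_ids=("nn1", "nn2"), automatic_failover=True):
    """Build HDFS HA properties from the high availability inputs."""
    if len(nn_hosts) != len(nn_ids):
        raise ValueError("one host is required per NameNode ID")
    journal_uri = "qjournal://{}/{}".format(
        ";".join(f"{host}:8485" for host in journal_hosts), nameservice)
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(nn_ids),
        "dfs.namenode.shared.edits.dir": journal_uri,
        "dfs.ha.automatic-failover.enabled": str(automatic_failover).lower(),
    }
    for nn_id, host in zip(nn_ids, nn_hosts):
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    if automatic_failover:
        # Automatic failover relies on a Zookeeper quorum.
        props["ha.zookeeper.quorum"] = ",".join(f"{host}:2182" for host in zk_hosts)
    return props


# Hypothetical cluster name and host names.
for key, value in ha_properties("mycluster", ["node1", "node2"],
                                ["node1", "node2", "node3"],
                                ["node1", "node2", "node3"]).items():
    print(f"{key} = {value}")
```

The dependency on `ha.zookeeper.quorum` is one way to see why Zookeeper has to be selected when high availability with automatic failover is enabled.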

During Hadoop 2 cluster setup, the user can currently also set up nine ecosystem components along with Hadoop:

  1. Flume
  2. Hbase
  3. Hive
  4. Mahout
  5. Oozie
  6. Pig
  7. Solr
  8. Sqoop
  9. Zookeeper

![](https://raw.githubusercontent.com/wiki/impetus-opensource/ankush/images/hadoop/Image4.png)

Figure 4: Hadoop 2 Ecosystem Components

Of these components, Hbase, Hive, and Zookeeper take considerably more inputs than the others. These inputs mainly concern nodes and advanced configuration settings. Clicking “>” next to a selected component opens its detailed configuration.

HBase configuration details are as follows:

![](https://raw.githubusercontent.com/wiki/impetus-opensource/ankush/images/hadoop/Image5.png)

Figure 5: HBase Configuration

The following parameters on the Hadoop 2 “Hbase” page can be adjusted to suit the deployment environment:

| Parameter | Default path/value |
| --- | --- |
| Region Servers | List of retrieved nodes |
| File Size | 10737418240 (bytes) |
| Compaction Threshold | 3 |
| Cache Size | 0.25 (%) |
| Caching | 1 |
| Timeout | 180000 (milliseconds) |
| Multiplier | 2 |
| Major Compaction | 86400000 (milliseconds) |
| Max Size | 10485760 (bytes) |
| Flush Size | 134217728 (bytes) |
| Handler Count | 10 |

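The byte and millisecond defaults above are easier to judge in larger units. The following sketch only converts the values listed on this page; the helper names are hypothetical.

```python
# Defaults copied from the Hbase page, grouped by unit.
DEFAULTS_BYTES = {
    "File Size": 10737418240,
    "Max Size": 10485760,
    "Flush Size": 134217728,
}
DEFAULTS_MS = {
    "Timeout": 180000,
    "Major Compaction": 86400000,
}


def bytes_to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


def ms_to_minutes(n_ms: int) -> float:
    """Convert milliseconds to minutes."""
    return n_ms / 60_000


for name, value in DEFAULTS_BYTES.items():
    print(f"{name}: {bytes_to_mib(value):g} MiB")
for name, value in DEFAULTS_MS.items():
    print(f"{name}: {ms_to_minutes(value):g} minutes")
```

In these units, File Size is 10240 MiB (10 GiB), Max Size is 10 MiB, Flush Size is 128 MiB, Timeout is 3 minutes, and Major Compaction is 1440 minutes (24 hours).
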
Hive configuration details are as follows:

![](https://raw.githubusercontent.com/wiki/impetus-opensource/ankush/images/hadoop/Image6.png)

Figure 6: Hadoop 2: Hive Configuration

The following parameters on the Hadoop 2 “Hive” page can be adjusted to suit the deployment environment:

| Parameter | Default path/value |
| --- | --- |
| Hive Server | List of retrieved nodes |
| Connection Driver Name | org.apache.derby.jdbc.EmbeddedDriver |
| Connection URL | jdbc:derby:;databaseName=metastore_db;create=true |
| Connection User Name | APP |
| Connection Password | Mine |

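The four connection inputs resemble Hive's standard metastore JDBC settings. The sketch below pairs each UI label with the corresponding `javax.jdo.option.*` property name from Hive. That Ankush uses exactly this label-to-property mapping is an assumption, and the function name is hypothetical.

```python
# Defaults as listed on the Hive page, keyed by UI label.
HIVE_UI_DEFAULTS = {
    "Connection Driver Name": "org.apache.derby.jdbc.EmbeddedDriver",
    "Connection URL": "jdbc:derby:;databaseName=metastore_db;create=true",
    "Connection User Name": "APP",
    "Connection Password": "Mine",
}

# Standard Hive metastore property names; the pairing with the UI labels
# is assumed, not confirmed by the page.
UI_TO_PROPERTY = {
    "Connection Driver Name": "javax.jdo.option.ConnectionDriverName",
    "Connection URL": "javax.jdo.option.ConnectionURL",
    "Connection User Name": "javax.jdo.option.ConnectionUserName",
    "Connection Password": "javax.jdo.option.ConnectionPassword",
}


def hive_site_properties(ui_values):
    """Translate UI labels into Hive property names, skipping unknown labels."""
    return {UI_TO_PROPERTY[label]: value
            for label, value in ui_values.items()
            if label in UI_TO_PROPERTY}


for key, value in hive_site_properties(HIVE_UI_DEFAULTS).items():
    print(f"{key} = {value}")
```
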
Zookeeper configuration details are as follows:

![](https://raw.githubusercontent.com/wiki/impetus-opensource/ankush/images/hadoop/Image7.png)

Figure 7: Hadoop 2: Hive - Configurable Parameters

  1. Zookeeper Nodes: Nodes for Zookeeper.

The following parameters on the Hadoop 2 “Zookeeper” page can be adjusted to suit the deployment environment:

| Parameter | Default path/value |
| --- | --- |
| Tick Time | 2000 (milliseconds) |
| Client Port | 2182 |
| Data Dir | /home//hes/zookeeper/zk_data_dir/ |
| Sync Limit | 2 (ticks) |
| Init Limit | 5 (ticks) |

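ZooKeeper interprets `initLimit` and `syncLimit` as multiples of `tickTime`, so the effective timeouts follow from simple multiplication. The sketch below works this out for the defaults on this page and renders the matching `zoo.cfg` keys. The function names are hypothetical, and the data directory is left out because its default path depends on the user name.

```python
def zookeeper_timeouts_ms(tick_time_ms=2000, init_limit=5, sync_limit=2):
    """Return the effective init and sync timeouts in milliseconds."""
    return {
        "init_timeout_ms": init_limit * tick_time_ms,
        "sync_timeout_ms": sync_limit * tick_time_ms,
    }


def zoo_cfg_lines(tick_time_ms=2000, client_port=2182, init_limit=5, sync_limit=2):
    """Render the numeric settings as zoo.cfg key=value lines."""
    return [
        f"tickTime={tick_time_ms}",
        f"clientPort={client_port}",
        f"initLimit={init_limit}",
        f"syncLimit={sync_limit}",
    ]


print(zookeeper_timeouts_ms())
print("\n".join(zoo_cfg_lines()))
```

With the defaults, followers get 10000 ms to connect and sync with the leader initially, and 4000 ms to stay in sync afterwards.
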
Configuring Roles: From the retrieved node list, select the nodes to be used as NameNode, SecondaryNameNode, and DataNodes. Selecting a SecondaryNameNode is optional.

Note:

  1. For each ecosystem component to be installed, configure its vendor, version, bundle path, and installation path if custom values are required. Otherwise, the default values and paths are used.
  2. Flume, Mahout, Oozie, Pig, Solr, and Sqoop (if selected) are deployed only on the node configured as NameNode.
  3. The HBase master is always deployed on the node configured as NameNode.
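
The placement rules in notes 2 and 3 can be summarized in a small sketch. The function and the node name are hypothetical. Components whose nodes are chosen on their own pages (Hive server, Zookeeper nodes, HBase region servers) are deliberately left out of the result.

```python
# Components that are deployed only on the NameNode, per note 2.
NAMENODE_ONLY = ("Flume", "Mahout", "Oozie", "Pig", "Solr", "Sqoop")


def plan_placement(selected, namenode):
    """Return the node for every component whose placement is fixed."""
    plan = {name: namenode for name in selected if name in NAMENODE_ONLY}
    if "Hbase" in selected:
        # Per note 3, the HBase master always runs on the NameNode.
        plan["HBase Master"] = namenode
    return plan


# Hypothetical selection and node name.
print(plan_placement(["Pig", "Hbase", "Hive"], "node1"))
```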

To deploy the cluster in the environment, click the “Deploy” button in the top-right corner of the screen.
