
confusing /etc/hadoop being recreated post orchestration #26

Open
hortovanyi opened this issue Jul 16, 2015 · 5 comments

@hortovanyi

Am trying to install Apache Hadoop 2.7.1.

Not sure why I've got conf-2.2.0 & conf.dist directories under /etc/hadoop, with conf linked to /etc/alternatives/hadoop-conf-link?

nick@fig:/etc/hadoop$ ls -lat
total 28
drwxr-xr-x 147 root root 12288 Jul 16 16:59 ..
drwxr-xr-x   2 root root  4096 Jul 16 15:59 conf-2.7.1
drwxr-xr-x   2 root root  4096 Jul 16 13:46 conf-2.2.0
drwxr-xr-x   5 root root  4096 Jul 16 13:46 .
lrwxrwxrwx   1 root root    34 Jul 16 13:05 conf -> /etc/alternatives/hadoop-conf-link
drwxr-xr-x   2 root root  4096 Jun 29 16:15 conf.dist

I have tried removing the offending directories and linking conf -> ./conf-2.7.1. However, after a reboot it returns to the above configuration without a salt run.
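Since the conf symlink points into /etc/alternatives, it is presumably managed by the alternatives system rather than being a plain symlink, so a hand-made link will get clobbered. A dry-run sketch of repointing it through update-alternatives instead (the alternative name hadoop-conf-link is inferred from the /etc/alternatives path in the listing above; drop the echo and run as root to actually apply):

```shell
# Dry-run sketch: print the update-alternatives calls that would
# register and select the 2.7.1 config dir. The name "hadoop-conf-link"
# is inferred from the /etc/alternatives path shown above.
link=/etc/hadoop/conf
name=hadoop-conf-link
target=/etc/hadoop/conf-2.7.1

echo update-alternatives --install "$link" "$name" "$target" 50
echo update-alternatives --set "$name" "$target"
```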

@sroegner
Member

Can you share your pillar settings please?

@hortovanyi
Author

This is what I'm using presently:

java_home:
  /usr/lib/jvm/java-7-openjdk-amd64

hadoop:
  version: apache-2.7.1 # ['apache-1.2.1', 'apache-2.2.0', 'hdp-1.3.0', 'hdp-2.2.0', 'cdh-4.5.0', 'cdh-4.5.0-mr1']
  versions:
    apache-2.7.1:
      version: 2.7.1
      version_name: hadoop-2.7.1
      source_url: http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
      major_version: '2'
  targeting_method: grain # [compound, glob] also supported
  users:
    hadoop: 6000
    hdfs: 6001
    mapred: 6002
    yarn: 6003
  config:
    directory: /etc/hadoop/conf
    core-site:
      io.native.lib.available:
        value: true
      io.file.buffer.size:
        value: 65536
      fs.trash.interval:
        value: 60

hdfs:
  namenode_target: "roles:hadoop_master" # Specify compound matching string to match all your namenodes
  datanode_target: "roles:hadoop_slave" # Specify compound matching string to match all your datanodes e.g. if you were to use pillar I@datanode:true
  config:
    namenode_port: 8020
    namenode_http_port: 50070
    secondarynamenode_http_port: 50090
    # the number of hdfs replicas is normally auto-configured for you in hdfs.settings
    # according to the number of available datanodes
    # replication: 1
    hdfs-site:
      dfs.permission:
        value: false
      dfs.durable.sync:
        value: true
      dfs.datanode.synconclose:
        value: true

mapred:
  jobtracker_target: "roles:hadoop_master"
  tasktracker_target: "roles:hadoop_slave"
  config:
    jobtracker_port: 9001
    jobtracker_http_port: 50030
    jobhistory_port: 10020
    jobhistory_webapp_port: 19888
    history_dir: /mr-history
    mapred-site:
      mapred.map.memory.mb:
        value: 1536
      mapred.map.java.opts:
        value: -Xmx1024M
      mapred.reduce.memory.mb:
        value: 3072
      mapred.reduce.java.opts:
        value: -Xmx1024m
      mapred.task.io.sort.mb:
        value: 512
      mapred.task.io.sort.factor:
        value: 100
      mapred.reduce.shuffle.parallelcopies:
        value: 50

# you only have to configure the capacity-scheduler section to the extent you need it - if omitted the 
# resulting file (HADOOP_CONF/capacity-scheduler.xml) will just remain empty (no defaults)
yarn:
  resourcemanager_target: "roles:hadoop_master"
  nodemanager_target: "roles:hadoop_slave"
  config:
    yarn-site:
      yarn.nodemanager.aux-services:
        value: mapreduce_shuffle
      yarn.nodemanager.aux-services.mapreduce.shuffle.class:
        value: org.apache.hadoop.mapred.ShuffleHandler

@sroegner sroegner added the bug label Jul 23, 2015
@sroegner sroegner self-assigned this Jul 23, 2015
@sroegner
Member

@hortovanyi Thanks for your response - I just tried and was able to reproduce this on an older version of the hadoop-formula I still had on disk, but everything seems fine on the latest. Can you verify that you are on the latest formula code?

@hortovanyi
Author

Yes, it was the latest git clone when I ran it. I did run it a number of times as I was getting used to the formula.

@sroegner
Member

Sorry I had to drop this for so long - here is another attempt at an explanation:

  • /etc/hadoop/conf is where hadoop looks - we use the alternatives system to switch this around if necessary (not saying this is a good idea, just the way it is right now)
  • /etc/hadoop/conf.dist is a copy of the conf directory as it is inside the tarball, not terribly useful
  • any conf-{version} directories would at some point have been the product of a salt run with {version} coming out of your hadoop.version pillar - the alternatives hadoop-conf-link will only ever point to the one resulting from the latest call
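The last point explains the stray directory: each salt run derives a conf-{version} name from the hadoop:version pillar, so conf-2.2.0 would be the leftover of an earlier run with that pillar value. A minimal sketch of that naming, assuming the version suffix is simply stripped from the pillar value (the real logic lives in the formula's Jinja templates):

```shell
# Sketch of how a conf-{version} dir name falls out of the pillar value.
# This only mirrors the naming; the actual logic is in the formula.
pillar_version="apache-2.7.1"            # hadoop:version from the pillar
numeric_version="${pillar_version#*-}"   # strip the distro prefix -> 2.7.1
conf_dir="/etc/hadoop/conf-${numeric_version}"
echo "$conf_dir"                         # /etc/hadoop/conf-2.7.1
```

Setting the pillar to apache-2.2.0 and re-running would produce conf-2.2.0 the same way, with the alternatives link repointed to whichever directory the latest run created.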

hth
