
confusing /etc/hadoop being recreated post orchestration #26

Open
hortovanyi opened this issue Jul 16, 2015 · 5 comments

@hortovanyi

Am trying to install Apache Hadoop 2.7.1.

Not sure why I've got conf-2.2.0 & conf.dist directories under /etc/hadoop, with conf linked to /etc/alternatives/hadoop-conf-link?

nick@fig:/etc/hadoop$ ls -lat
total 28
drwxr-xr-x 147 root root 12288 Jul 16 16:59 ..
drwxr-xr-x   2 root root  4096 Jul 16 15:59 conf-2.7.1
drwxr-xr-x   2 root root  4096 Jul 16 13:46 conf-2.2.0
drwxr-xr-x   5 root root  4096 Jul 16 13:46 .
lrwxrwxrwx   1 root root    34 Jul 16 13:05 conf -> /etc/alternatives/hadoop-conf-link
drwxr-xr-x   2 root root  4096 Jun 29 16:15 conf.dist

I have tried removing the offending directories and linking conf -> ./conf-2.7.1. However, after a reboot it returns to the above configuration without a salt run.
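Since the conf symlink points into /etc/alternatives, it is presumably managed by the alternatives system rather than being a plain symlink, so a hand-made link will get clobbered. A dry-run sketch of repointing it through update-alternatives instead (the alternative name hadoop-conf-link is inferred from the /etc/alternatives path in the listing above; drop the echo and run as root to actually apply):

```shell
# Dry-run sketch: print the update-alternatives calls that would
# register and select the 2.7.1 config dir. The name "hadoop-conf-link"
# is inferred from the /etc/alternatives path shown above.
link=/etc/hadoop/conf
name=hadoop-conf-link
target=/etc/hadoop/conf-2.7.1

echo update-alternatives --install "$link" "$name" "$target" 50
echo update-alternatives --set "$name" "$target"
```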

@sroegner
Member

Can you share your pillar settings please?

@hortovanyi
Author

This is what I'm using presently:

java_home:
  /usr/lib/jvm/java-7-openjdk-amd64

hadoop:
  version: apache-2.7.1 # ['apache-1.2.1', 'apache-2.2.0', 'hdp-1.3.0', 'hdp-2.2.0', 'cdh-4.5.0', 'cdh-4.5.0-mr1']
  versions:
    apache-2.7.1:
      version: 2.7.1
      version_name: hadoop-2.7.1
      source_url: http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
      major_version: '2'
  targeting_method: grain # [compound, glob] also supported
  users:
    hadoop: 6000
    hdfs: 6001
    mapred: 6002
    yarn: 6003
  config:
    directory: /etc/hadoop/conf
    core-site:
      io.native.lib.available:
        value: true
      io.file.buffer.size:
        value: 65536
      fs.trash.interval:
        value: 60

hdfs:
  namenode_target: "roles:hadoop_master" # Specify compound matching string to match all your namenodes
  datanode_target: "roles:hadoop_slave" # Specify compound matching string to match all your datanodes e.g. if you were to use pillar I@datanode:true
  config:
    namenode_port: 8020
    namenode_http_port: 50070
    secondarynamenode_http_port: 50090
    # the number of hdfs replicas is normally auto-configured for you in hdfs.settings
    # according to the number of available datanodes
    # replication: 1
    hdfs-site:
      dfs.permission:
        value: false
      dfs.durable.sync:
        value: true
      dfs.datanode.synconclose:
        value: true

mapred:
  jobtracker_target: "roles:hadoop_master"
  tasktracker_target: "roles:hadoop_slave"
  config:
    jobtracker_port: 9001
    jobtracker_http_port: 50030
    jobhistory_port: 10020
    jobhistory_webapp_port: 19888
    history_dir: /mr-history
    mapred-site:
      mapred.map.memory.mb:
        value: 1536
      mapred.map.java.opts:
        value: -Xmx1024M
      mapred.reduce.memory.mb:
        value: 3072
      mapred.reduce.java.opts:
        value: -Xmx1024m
      mapred.task.io.sort.mb:
        value: 512
      mapred.task.io.sort.factor:
        value: 100
      mapred.reduce.shuffle.parallelcopies:
        value: 50

# you only have to configure the capacity-scheduler section to the extent you need it - if omitted the 
# resulting file (HADOOP_CONF/capacity-scheduler.xml) will just remain empty (no defaults)
yarn:
  resourcemanager_target: "roles:hadoop_master"
  nodemanager_target: "roles:hadoop_slave"
  config:
    yarn-site:
      yarn.nodemanager.aux-services:
        value: mapreduce_shuffle
      yarn.nodemanager.aux-services.mapreduce.shuffle.class:
        value: org.apache.hadoop.mapred.ShuffleHandler

@sroegner sroegner added the bug label Jul 23, 2015
@sroegner sroegner self-assigned this Jul 23, 2015
@sroegner
Member

@hortovanyi Thanks for your response - I just tried and was able to reproduce this on an older version of the hadoop-formula I still had on disk, but everything seems fine on the latest. Can you verify that you are on the latest formula code?

@hortovanyi
Author

Yes, it was the latest git clone when I ran it. I did run it a number of times as I was getting used to the formula.

@sroegner
Member

Sorry I had to drop this for so long - here is another attempt at an explanation:

  • /etc/hadoop/conf is where hadoop looks - we use the alternatives system to switch this around if necessary (not saying this is a good idea, just the way it is right now)
  • /etc/hadoop/conf.dist is a copy of the conf directory as it is inside the tarball, not terribly useful
  • any conf-{version} directories would at some point have been the product of a salt run with {version} coming out of your hadoop.version pillar - the alternatives hadoop-conf-link will only ever point to the one resulting from the latest call
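The last point explains the stray directory: each salt run derives a conf-{version} name from the hadoop:version pillar, so conf-2.2.0 would be the leftover of an earlier run with that pillar value. A minimal sketch of that naming, assuming the version suffix is simply stripped from the pillar value (the real logic lives in the formula's Jinja templates):

```shell
# Sketch of how a conf-{version} dir name falls out of the pillar value.
# This only mirrors the naming; the actual logic is in the formula.
pillar_version="apache-2.7.1"            # hadoop:version from the pillar
numeric_version="${pillar_version#*-}"   # strip the distro prefix -> 2.7.1
conf_dir="/etc/hadoop/conf-${numeric_version}"
echo "$conf_dir"                         # /etc/hadoop/conf-2.7.1
```

Setting the pillar to apache-2.2.0 and re-running would produce conf-2.2.0 the same way, with the alternatives link repointed to whichever directory the latest run created.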

hth
