From 777e00aa8fe93d086434cf6345b310f286bdc991 Mon Sep 17 00:00:00 2001 From: Ken Sipe Date: Mon, 16 Nov 2015 10:06:02 -0600 Subject: [PATCH 1/2] adding details on how to config the HDFS FW --- README.md | 5 +---- config.md | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+), 4 deletions(-) create mode 100644 config.md diff --git a/README.md b/README.md index 9ea4eae2..53799559 100644 --- a/README.md +++ b/README.md @@ -28,11 +28,8 @@ Installing HDFS-Mesos on your Cluster 3. Optional: Customize any additional configurations that weren't updated at compile time in `hdfs-mesos-*/etc/hadoop/*-site.xml` Note that if you update hdfs-site.xml, it will be used by the scheduler and bundled with the executors. However, core-site.xml and mesos-site.xml will be used by the scheduler only. 4. Check that `hostname` on that node resolves to a non-localhost IP; update /etc/hosts if necessary. -### If you have Hadoop pre-installed in your cluster -If you have Hadoop installed across your cluster, you don't need the Mesos scheduler application to distribute the binaries. You can set the `mesos.hdfs.native-hadoop-binaries` configuration parameter in `mesos-site.xml` if don't want the binaries distributed. -### Mesos-DNS custom configuration -You can see the example configuration in the `example-conf/dcos` directory. Since Mesos-DNS provides native bindings for master detection, we can simply use those names in our mesos and hdfs configurations. The example configuration assumes your Mesos masters and your zookeeper nodes are colocated. If they aren't you'll need to specify your zookeeper nodes separately. Also, note that you are using the example in `example-conf/dcos`, the `mesos.hdfs.native-hadoop-binaries` property needs to be set to `false` if your HDFS binaries are not predistributed. +**NOTE:** Read [Configurations](config.md) for details on how to configure and customize HDFS. 
Starting HDFS-Mesos -------------------------- diff --git a/config.md b/config.md new file mode 100644 index 00000000..1da5a4fc --- /dev/null +++ b/config.md @@ -0,0 +1,57 @@ +## Configuration of HDFS framework and HDFS + +The configuration of HDFS and this framework are managed via: + +* hdfs-site.xml +* mesos-site.xml +* system env vars and proeprties + +The hdfs-site.xml file is used to configure the hdfs cluster. The values must match the configuration fo the scheduler. For this +reason the hdfs-site.xml is generally "fetched" or refreshed from the scheduler when a node is started. The normal configuration of +the hdfs-site.xml has variables which are replaced by the scheduler when the xml file is fetched by the node. An example of these +variables is `${frameworkName}`. The scheduler code that does the variable replacement is handled by ConfigServer.java. An +example of this variable replacement is `model.put("frameworkName", hdfsFrameworkConfig.getFrameworkName());` + +For environments which are "provisioned" with hdfs and managed by hdfs-mesos it is expected that the values of this xml file are +established for the deployment. Environments which are designated as `mesos.hdfs.native-hadoop-binaries` == true in the `mesos-site.xml`, +there is no refresh of the `hdfs-site.xml` file. + +The mesos-site.xml file is used to configure the hdfs-mesos framework. We are working to deprecated this file. This general establishes +values for the scheduler and in many cases these are passed to the executors. Although the configuration of the scheduler can be handled +via XML configuration, we encourage the use of system environment variables for this purpose. + +## Configuration Options + +* mesos.hdfs.framework.name - Used to define the framework name. This allows for 1) multi-deployments of hdfs and 2) has an impact on the dns name of the service. The default is "hdfs". +* mesos.hdfs.user - Used to define the user to use for the scheduler and executor processes. 
The default is root. +* mesos.hdfs.role - Used to determine the mesos role this framework will use. The default is "*". +* mesos.hdfs.mesosdns - true if mesos-dns is used. The default is false. +* mesos.hdfs.mesosdns.domain - When using mesos-dns, this value is the suffix used by mesos-dns. The default is "mesos". +* mesos.native.library - The location of the libmesos library. The default is "/usr/local/lib/libmesos.so" +* mesos.hdfs.journalnode.count - The number of journal nodes the scheduler will maintain. The default is 3. +* mesos.hdfs.data.dir - The location to store data on the slaves. The default is "/var/lib/hdfs/data". +* mesos.hdfs.domain.socket.dir - The location used for a local socket used by the data nodes. The default is "/var/run/hadoop-hdfs". +* mesos.hdfs.backup.dir - The location to replicate data to as a backup. The default is blank. +* mesos.hdfs.native-hadoop-binaries - This is true if hdfs is pre-installed on the slaves. This will result in no distribution of binaries to the slaves. It will also mean that no xml configuration refresh will be provided to the slaves. The default is false. +* mesos.hdfs.framework.mnt.path - If native-hadoop-binaries == false, this is the location a symlink will be provided to execute hdfs commands on the slave. The default is "/opt/mesosphere" +* mesos.hdfs.state.zk - The zookeeper that the scheduler will use to store state. The default is "localhost:2181" +* mesos.master.uri - The zookeeper or mesos-master url that will be used to discover the mesos-master for scheduler registration. The default is "localhost:2181" +* mesos.hdfs.zkfc.ha.zookeeper.quorum - The zookeeper that HDFS (not the framework) will use for HA mode. The default is "localhost:2181" + +There are additional configurations for executor jvm and resource management of the nodes. + +## System Environment Variables + +All of configuration flags previous defined can be override with system environment variables. 
The format to use to over a variable is to +upper case the string and replace dots (".") with underscores ("_"). so to override the `mesos.hdfs.framework.name`, the value is `MESOS_HDFS_FRAMEWORK_NAME=unicorn". +To use this value, export the value, then start the scheduler. If a value is overridden by the system environment variable it will be propagated to +the executors. + +## Custom Configurations + +### Mesos-DNS custom configuration +You can see the example configuration in the `example-conf/dcos` directory. Since Mesos-DNS provides native bindings for master detection, we can simply use those names in our mesos and hdfs configurations. The example configuration assumes your Mesos masters and your zookeeper nodes are colocated. If they aren't you'll need to specify your zookeeper nodes separately. Also, note that you are using the example in `example-conf/dcos`, the `mesos.hdfs.native-hadoop-binaries` property needs to be set to `false` if your HDFS binaries are not predistributed. + +### If you have Hadoop pre-installed in your cluster +If you have Hadoop installed across your cluster, you don't need the Mesos scheduler application to distribute the binaries. You can set the `mesos.hdfs.native-hadoop-binaries` configuration parameter in `mesos-site.xml` if don't want the binaries distributed. + From d079dd9ed42bd4fad8d1e27182e885930a64bb81 Mon Sep 17 00:00:00 2001 From: Ken Sipe Date: Mon, 16 Nov 2015 16:53:21 -0600 Subject: [PATCH 2/2] fixed copy based on feedback --- config.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/config.md b/config.md index 1da5a4fc..6b0b7263 100644 --- a/config.md +++ b/config.md @@ -4,17 +4,19 @@ The configuration of HDFS and this framework are managed via: * hdfs-site.xml * mesos-site.xml -* system env vars and proeprties +* system env vars and properties -The hdfs-site.xml file is used to configure the hdfs cluster. The values must match the configuration fo the scheduler. 
For this +The hdfs-site.xml file is used to configure the hdfs cluster. The values must match the configuration of the scheduler. For this reason the hdfs-site.xml is generally "fetched" or refreshed from the scheduler when a node is started. The normal configuration of the hdfs-site.xml has variables which are replaced by the scheduler when the xml file is fetched by the node. An example of these variables is `${frameworkName}`. The scheduler code that does the variable replacement is handled by ConfigServer.java. An example of this variable replacement is `model.put("frameworkName", hdfsFrameworkConfig.getFrameworkName());` -For environments which are "provisioned" with hdfs and managed by hdfs-mesos it is expected that the values of this xml file are -established for the deployment. Environments which are designated as `mesos.hdfs.native-hadoop-binaries` == true in the `mesos-site.xml`, -there is no refresh of the `hdfs-site.xml` file. +It is possible to have the HDFS-Mesos framework manage hdfs node instances on slaves that were previously provisioned with hdfs. Under this scenario +there is no way to update the `hdfs-site.xml` file. This is indicated by setting the property `mesos.hdfs.native-hadoop-binaries` == true in the `mesos-site.xml` file. +This indicates that binaries exist on the nodes. Because the values in the `hdfs-site.xml` are not controlled by the HDFS-Mesos framework, it +is important to make sure that all the xml files are consistent and the framework is started with property values which are consistent with the +preexisting cluster. The mesos-site.xml file is used to configure the hdfs-mesos framework. We are working to deprecated this file. This general establishes values for the scheduler and in many cases these are passed to the executors. 
Although the configuration of the scheduler can be handled @@ -42,16 +44,16 @@ There are additional configurations for executor jvm and resource management of ## System Environment Variables -All of configuration flags previous defined can be override with system environment variables. The format to use to over a variable is to -upper case the string and replace dots (".") with underscores ("_"). so to override the `mesos.hdfs.framework.name`, the value is `MESOS_HDFS_FRAMEWORK_NAME=unicorn". +All of the configuration flags previously defined can be overridden with system environment variables. The format to use to override a variable is to +upper case the string and replace dots (".") with underscores ("_"). For example, to override the `mesos.hdfs.framework.name`, the value is `MESOS_HDFS_FRAMEWORK_NAME=unicorn`. To use this value, export the value, then start the scheduler. If a value is overridden by the system environment variable it will be propagated to the executors. ## Custom Configurations ### Mesos-DNS custom configuration -You can see the example configuration in the `example-conf/dcos` directory. Since Mesos-DNS provides native bindings for master detection, we can simply use those names in our mesos and hdfs configurations. The example configuration assumes your Mesos masters and your zookeeper nodes are colocated. If they aren't you'll need to specify your zookeeper nodes separately. Also, note that you are using the example in `example-conf/dcos`, the `mesos.hdfs.native-hadoop-binaries` property needs to be set to `false` if your HDFS binaries are not predistributed. +You can see an example configuration in the `example-conf/dcos` directory. Since Mesos-DNS provides native bindings for master detection, we can simply use those names in our mesos and hdfs configurations. The example configuration assumes your Mesos masters and your zookeeper nodes are colocated. If they aren't, you'll need to specify your zookeeper nodes separately. 
Also, note that if you are using the example in `example-conf/dcos`, the `mesos.hdfs.native-hadoop-binaries` property needs to be set to `false` if your HDFS binaries are not predistributed. ### If you have Hadoop pre-installed in your cluster -If you have Hadoop installed across your cluster, you don't need the Mesos scheduler application to distribute the binaries. You can set the `mesos.hdfs.native-hadoop-binaries` configuration parameter in `mesos-site.xml` if don't want the binaries distributed. +If you have Hadoop installed across your cluster, you don't need the Mesos scheduler application to distribute the binaries. You can set the `mesos.hdfs.native-hadoop-binaries` configuration parameter in `mesos-site.xml` if you don't want the binaries distributed.
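
Note on the "System Environment Variables" section of `config.md`: the naming rule it describes (upper-case the property name, replace dots with underscores) can be sketched as below. This is only an illustration in Python, not the framework's own code. The helper names `env_var_name` and `resolve` are hypothetical. The document only mentions dots, so the sketch leaves any other character (such as the hyphens in `mesos.hdfs.native-hadoop-binaries`) untouched; how the framework treats those is not stated here.

```python
import os


def env_var_name(property_name: str) -> str:
    """Map a property name to its environment variable name.

    Follows only the rule stated in config.md: upper-case the string and
    replace dots with underscores. Other characters are left as they are
    (an assumption; the document does not say how they are handled).
    """
    return property_name.upper().replace(".", "_")


def resolve(property_name: str, file_value: str, environ=os.environ) -> str:
    """Return the environment override if one is set, else the file value."""
    return environ.get(env_var_name(property_name), file_value)


if __name__ == "__main__":
    # Hypothetical environment, standing in for `export MESOS_HDFS_FRAMEWORK_NAME=unicorn`.
    fake_env = {"MESOS_HDFS_FRAMEWORK_NAME": "unicorn"}
    print(env_var_name("mesos.hdfs.framework.name"))
    print(resolve("mesos.hdfs.framework.name", "hdfs", fake_env))
```

With the override exported before the scheduler starts, a lookup of `mesos.hdfs.framework.name` would yield the exported value instead of the value from `mesos-site.xml`; with no override set, the file value is used.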