Skip to content

HadoopInTheCloud is a portable automated tool used to deploy, configure and control Hadoop multi-node clusters in cloud environments.

License

Notifications You must be signed in to change notification settings

LeHackerman/HadoopInTheCloud

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hadoop In The Cloud

Hadoop In The Cloud is a deployment, configuration and control automation tool used to simplify running a Hadoop multi-node cluster in a cloud environment. All he provisionning, key generation, installation and configuration needed to run a Hadoop cluster is covered by this tool. All in all, it allows you to use Hadoop in a multi-node fashion starting from scratch. For now, it only supports Microsoft Azure.

Installation

No installation is required, this tool is portable. Although some dependencies are installed locally in the tool's folder during the first run.
So, just clone it.

git clone https://github.com/LeHackerman/HadoopInTheCloud.git

Requirements

This tool requries:

  • Python3
  • Python3-venv
  • jq
    Install them using your system's package manager.

Usage

The tool SHOULD be ran from its parent directory. Don't call it from other locations it as it depends on some relative paths. (This is temporary as I wrote the script in a hurry. A future update will make the tool more conform to the UNIX paradigm)

./hitc.sh

Roadmap

  • Write a playbook that provisions the necessary infrastructure.
  • Write a playbook that installs and configures Hadoop in a supplied resource group.
  • Write a dynamic inventory file to allow a certain modularity and decoupling between the playbooks and the infrastructure.
  • Write a shell script that automates the login, dependency installation and orchestration of playbooks.
  • Configure traps for a cleaner and more reliable execution.
  • Write playbooks to automate: The launch, stop and deletion of the machines / resource group.
  • Add argument passing capabilties to the script.
  • Decouple script's dependence to its parent directory and align with the UNIX paradigm.

Contributing

Just make a pull request or hmu or smthg ¯_(ツ)_/¯ .

Nota Bene

  • This is an alpha release. So, the tool is pretty sensitive to environmental misconfiguration. If it fails in any way, just execute rm -rf ./.ansible ./.az ./.ssh and run it again.
  • DO NOT USE THIS IN PRODUCTION. The tool was made with academic usage in mind, so, some security risks were tolerated. But, those can't be tolerated in a professional environment.
  • Some values are parametrized and could be modified accordingly such as the Azure subscription to use, the number of nodes, the resource group name, etc. But, these are hardcoded in the shell script. So, if you wat to change them, either change the hardcoded values or tell me and I'll add the option of passing them as arguments.

License

GPLv3

About

HadoopInTheCloud is a portable automated tool used to deploy, configure and control Hadoop multi-node clusters in cloud environments.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages