title | author |
---|---|
docker - Big data tools project | Abdellatif Ahammad |
:flag-ma: The goal of this repo is to provide a complete set of big data tools, already running and ready to use.
🎈 There are two setups, so you can choose whichever you prefer.
The first setup offers a Hadoop cluster from 2014 with Cloudera Manager Express (the free version), which makes it easy to start and stop any tool from its UI.
Name | Port |
---|---|
HDFS | 8002 |
Cloudera Manager | 7180 |
Hue | 8888 |
Hive | 10002 |
Oozie | 11000 |
Zookeeper | 2181 |
Solr | 8983 |
Sqoop Metastore | 16000 |
Impala | - |
Spark Master | 7077 |
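
To see which of the ports above are actually reachable, a quick check along these lines may help. It is only a sketch: it assumes the container publishes these ports on `localhost` (check the docker-compose file), and that `bash` and GNU `timeout` are installed on the host. The names `port_open` and `check_services` are made up for this example.

```shell
#!/usr/bin/env bash
# Assumption: the ports are published on this host; override with HOST=... if not.
HOST="${HOST:-localhost}"

# name:port pairs copied from the table above (Impala has no port listed).
SERVICES="HDFS:8002 ClouderaManager:7180 Hue:8888 Hive:10002 Oozie:11000 Zookeeper:2181 Solr:8983 SqoopMetastore:16000 SparkMaster:7077"

# Return 0 if a TCP connection to host $1, port $2 succeeds within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one status line per service.
check_services() {
  for entry in $SERVICES; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open "$HOST" "$port"; then
      echo "$name ($port): open"
    else
      echo "$name ($port): closed"
    fi
  done
}

check_services
```

A port reported as `closed` may simply mean the service has not been started yet, or that the port is not mapped to the host.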
Use only the Hadoop cluster, without Cloudera Manager :first_place_medal:
git clone https://github.com/abdellatifAhammad/Dockerized-bigdata-tools
cd docker-bigdata/cloudera_express
sudo docker-compose up
Use the cluster with Cloudera Manager ⛳
Inside the cloudera_express folder, run:
sudo docker attach abdo_cdh
/home/cloudera/cloudera-manager --express
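
Cloudera Manager may need some time before its UI answers on port 7180. A small retry helper like the one below can wait for it from another terminal on the host. This is a sketch, not part of the repo: `retry` is a hypothetical helper name, and the `curl` call in the comment assumes port 7180 is published on `localhost`.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have failed.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example (assumes curl is installed and the UI is mapped to localhost:7180):
# retry 30 10 curl -fsS -o /dev/null http://localhost:7180/
```

With the values in the example comment, the helper would try up to 30 times, pausing 10 seconds between attempts, and return a non-zero status if the UI never responds.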
- Hue
- Hive inside Hue (Hive editor)
- Impala
- Cloudera Manager interface: you have to launch the service you want manually from this dashboard
- HBase
The second setup is a Hadoop cluster with multiple data nodes (3 in this docker-compose file; you can add more).
git clone https://github.com/abdellatifAhammad/Dockerized-bigdata-tools
cd docker-bigdata/big_data
sudo docker-compose up -d
Then you can try all these tools (see the docker-compose file for their ports):
- Hive
- Hue
- Zookeeper
- MySQL
- Kafka
- HBase
- MongoDB
- Sqoop Metastore
- StreamSets
- Storm
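
Since the ports for these tools live in the docker-compose file, a rough way to list them is sketched below. It assumes the file uses two-space indentation for service names and the short `HOST:CONTAINER` port syntax; other layouts (long syntax, IP-prefixed mappings) are not handled. `list_ports` is a name invented for this example.

```shell
#!/bin/sh
# Print "service: host_port -> container_port" for each short-syntax mapping.
# Service names are taken from keys indented by exactly two spaces.
list_ports() {
  awk '
    /^  [A-Za-z0-9_-]+:[ \t]*$/ { svc = $1; sub(/:$/, "", svc) }
    /^[ \t]*-[ \t]*"?[0-9]+:[0-9]+/ {
      m = $0
      gsub(/[^0-9:]/, "", m)   # keep only the digits and the colon
      split(m, p, ":")
      print svc ": " p[1] " -> " p[2]
    }
  ' "$1"
}

# Default to docker-compose.yml in the current folder, if it exists.
COMPOSE_FILE="${1:-docker-compose.yml}"
if [ -f "$COMPOSE_FILE" ]; then
  list_ports "$COMPOSE_FILE"
fi
```

Once the containers are up, `sudo docker-compose ps` also shows the port mappings of the running services.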