This project streams stock market data through Apache Kafka to simulate a real-time data feed, lands the events in Amazon S3, and queries them with AWS Glue and Amazon Athena.
- AWS account
- Python
- Python libraries: confluent_kafka, boto3, and pandas
- SSH into the EC2 instance.
- Install Apache Kafka on the EC2 instance.
- Change directory to your Kafka folder on the EC2 instance and open "config/server.properties" with "sudo nano config/server.properties". Set advertised.listeners to the public IP of the EC2 instance, e.g. "advertised.listeners=PLAINTEXT://{EC2 public IP}:9092" (uncomment the line if it is commented out).
- Start ZooKeeper using the command "bin/zookeeper-server-start.sh config/zookeeper.properties"
- Open another SSH session to the EC2 instance and change directory to the Kafka folder.
- Use the command "export KAFKA_HEAP_OPTS='-Xmx256M -Xms128M'" (no spaces around the "=") to set the Kafka heap size, since the default 1 GB heap can be too large for a small EC2 instance.
- Start the Kafka server using "bin/kafka-server-start.sh config/server.properties"
- Open another SSH session to the EC2 instance and change directory to the Kafka folder.
- Create a topic using the command "bin/kafka-topics.sh --create --topic {your topic name} --bootstrap-server {EC2 public IP}:9092 --replication-factor 1 --partitions 1"
- Make your kafka_consumer.py executable using the "chmod u+x" command.
- Run your producer script, then run your consumer script (minimal sketches of both are included after this list).
- Create an AWS Glue crawler to build a table from your S3 location (a boto3 sketch is included after this list).
- Use Amazon Athena to query the data (also covered in the boto3 sketch after this list).
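
The producer and consumer scripts referenced above are not shown in this section, so here is a minimal producer sketch. It assumes the stock market data sits in a local CSV file; the broker address, topic name, and file name ("stock_data.csv") are placeholders to replace with your own.

```python
# kafka_producer.py - minimal sketch: replays rows of a CSV file as JSON events.
import time

import pandas as pd
from confluent_kafka import Producer

BOOTSTRAP_SERVERS = "{EC2 public IP}:9092"   # replace with your EC2 public IP
TOPIC = "stock-market"                       # placeholder: use your topic name
CSV_PATH = "stock_data.csv"                  # placeholder: your stock market dataset

producer = Producer({"bootstrap.servers": BOOTSTRAP_SERVERS})
df = pd.read_csv(CSV_PATH)

try:
    while True:
        # Pick a random row and serialize it as a JSON object to mimic a live tick.
        payload = df.sample(1).iloc[0].to_json()
        producer.produce(TOPIC, value=payload.encode("utf-8"))
        producer.poll(0)   # serve delivery callbacks
        time.sleep(1)      # throttle the simulated feed
except KeyboardInterrupt:
    producer.flush()       # deliver any buffered messages before exiting
```

A matching consumer sketch, assuming each event is written to S3 as its own JSON object so the Glue crawler can infer a schema later; the bucket name ("my-stock-market-bucket") and consumer group id are placeholders.

```python
# kafka_consumer.py - minimal sketch: reads events from Kafka and writes them to S3.
import boto3
from confluent_kafka import Consumer

BOOTSTRAP_SERVERS = "{EC2 public IP}:9092"   # replace with your EC2 public IP
TOPIC = "stock-market"                       # must match the producer's topic
BUCKET = "my-stock-market-bucket"            # placeholder: your S3 bucket

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP_SERVERS,
    "group.id": "stock-market-consumers",    # placeholder consumer group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
s3 = boto3.client("s3")

count = 0
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # One JSON object per S3 key keeps the layout simple for the Glue crawler.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"stock_market/stock_market_{count}.json",
            Body=msg.value(),
        )
        count += 1
finally:
    consumer.close()
```

The Glue crawler and Athena query in the last two steps are usually set up from the AWS console; the sketch below shows an equivalent path with boto3, assuming a hypothetical Glue database name, crawler name, and IAM role ARN, and that the crawler names the table "stock_market" after the S3 folder it crawls.

```python
# Glue + Athena sketch: crawl the S3 prefix into a table, then query it.
import time

import boto3

BUCKET = "my-stock-market-bucket"          # same bucket the consumer writes to
DATABASE = "stock_market_db"               # placeholder Glue database name
CRAWLER = "stock-market-crawler"           # placeholder crawler name
ROLE_ARN = "arn:aws:iam::{your account id}:role/{your glue crawler role}"  # placeholder IAM role

glue = boto3.client("glue")
athena = boto3.client("athena")

# Create the database and a crawler pointed at the consumer's S3 prefix, then run it.
glue.create_database(DatabaseInput={"Name": DATABASE})
glue.create_crawler(
    Name=CRAWLER,
    Role=ROLE_ARN,
    DatabaseName=DATABASE,
    Targets={"S3Targets": [{"Path": f"s3://{BUCKET}/stock_market/"}]},
)
glue.start_crawler(Name=CRAWLER)

# After the crawler finishes, query the table it created (name derived from the folder).
query = athena.start_query_execution(
    QueryString="SELECT * FROM stock_market LIMIT 10",
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": f"s3://{BUCKET}/athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query reaches a terminal state, then print the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
    if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if status["State"] == "SUCCEEDED":
    # The first row returned by Athena is the column header.
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```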