Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes bug in consumer VM startup script #213

Merged
merged 3 commits into from
Oct 31, 2023

Conversation

hernandezc1
Copy link
Collaborator

@hernandezc1 hernandezc1 commented Oct 23, 2023

Related to Issue #212.

For the ELAsTiCC2 Challenge, our consumer VM will be listening to multiple KAFKA topics simultaneously. When defining multiple topics, the variable KAFKA_TOPIC is initialized as a comma-separated list of topics: KAFKA_TOPIC="topic1,topic2,etc". These topics appear in the file: list.topics in the following format.

topic1
topic2
etc

However, this creates a bug when the following code block from vm_startup.sh is executed.

if grep -Fq "${KAFKA_TOPIC}" "${fout_topics}"
    then
        alerts_flowing="true"  # start consuming
    else
        sleep 60s  # sleep 1 min, then try again
fi

This is because the grep -Fq "${KAFKA_TOPIC}" "${fout_topics}" command looks for the exact string, topic1,topic2,etc, in list.topics.

This PR updates the requirements needed to start the Kafka -> Pub/Sub connector for our consumer VM. In order to accommodate for multiple topics, we now parse through the string containing the comma-separated Kafka topics and append each topic name to an array called KAFKA_TOPIC_ARRAY. Using a for loop, we search to see whether the elements of KAFKA_TOPIC_ARRAY are present in list.topics. If a topic is present, then the value true is appended to separate array: kafka_topic_present.

If all of the topics are present in list.topics, then KAFKA_TOPIC_ARRAY and kafka_topic_present will have the same length. This requirement must now be satisfied in order to start the Kafka -> Pub/Sub connector.

The changes to the startup script have been successfully tested.

@hernandezc1 hernandezc1 self-assigned this Oct 23, 2023
@hernandezc1 hernandezc1 added Bug Something isn't working ELAsTiCC Tracks development of our ELAsTiCC broker Pipeline: Conductor Components that orchestrate the live pipeline labels Oct 23, 2023
@hernandezc1 hernandezc1 requested a review from troyraen October 23, 2023 20:52
Copy link
Collaborator

@troyraen troyraen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes requested below, mostly related to code organization and comments.

One overall question: Does this still work if KAFKA_TOPIC is a single topic? We don't want to lose the ability to do that.

BTW, I didn't actually test this and I don't plan to. If I haven't said it explicitly before, I am no longer testing all of your PRs, especially the ones that affect code that only you are using right now. I'm reviewing it as best I can from reading it. It's up to you to try to think through all the possibilities when testing make sure your stuff actually works.

@@ -87,17 +87,28 @@ do
# which kills the while loop. no working suggestions found.
# passing the error with `|| true`


IFS=', ' read -r -a KAFKA_TOPIC_ARRAY <<< "$KAFKA_TOPIC"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go in the section at the top of the file that defines topic-related variables. Actually there's currently two sections, starting on lines 49 and 52, I would combine them into one and add this there.

Add a comment describing what this line does. Shell is not as intuitive to read as python, so more comments are helpful.

@@ -87,17 +87,28 @@ do
# which kills the while loop. no working suggestions found.
# passing the error with `|| true`


IFS=', ' read -r -a KAFKA_TOPIC_ARRAY <<< "$KAFKA_TOPIC"
KAFKA_TOPIC_PRESENT=()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important that this variable be defined right before the next for loop starts. Make this more clear to future readers by moving this down between the current lines 94 and 95 so that it's actually part of the same block of code.

Additionally, using all caps to name a variable is typically only done for environment variables and module level variables. This is neither, so use snake case.


IFS=', ' read -r -a KAFKA_TOPIC_ARRAY <<< "$KAFKA_TOPIC"
KAFKA_TOPIC_PRESENT=()

# check if our topic is in the list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment to better describe what will happen in this code block. Include info about different cases, e.g., multiple topics but one goes live before the other. You may want to point at issue #212 where the ongoing discussion will occur and options & unknowns can be described in more detail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a more descriptive comment, please let me know what you think. I felt it might be more appropriate to keep the discussion about the different cases in Issue #212, but if you disagree, let me know and I will update the comment again

Comment on lines 103 to 104
if [ ${#KAFKA_TOPIC_PRESENT[@]} -eq ${#KAFKA_TOPIC_ARRAY[@]} ]; then
alerts_flowing="true"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to the first line explaining what it does. I imagine it's something like, "check whether the two arrays are the same length".

If I understand correctly, this will cause the Connector to start up only if all topics are present. If so:

  1. This should be explicitly stated in the comment above the code block (line 94).

  2. In Defining multiple KAFKA topics that our consumer VM will listen to #212 you said:

    Even if one of the (multiple) topics is not live, the consumer will ingest alerts from the topics that are live.

    I'm guessing that what you actually tested was the behavior of the Kafka -> Pub/Sub Connector (launched by the last command in the script /bin/connect-standalone ...). I recommend updating the languaging in the issue to reflect that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that what you actually tested was the behavior of the Kafka -> Pub/Sub Connector (launched by the last command in the script /bin/connect-standalone ...). I recommend updating the languaging in the issue to reflect that.

Yes, this is correct. I will update the main comment shortly

Comment on lines 95 to 106
sleep 60s # sleep 1 min, then try again
sleep 60s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete the comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my mistake, this shouldn't have been deleted

Comment on lines -103 to +114
&>> "${fout_run}"
&>> "${fout_run}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a real change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that this has been happening more frequently in a few different PRs. I'll make sure to review my changes before committing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any suggestions on how to fix this?

@hernandezc1
Copy link
Collaborator Author

One overall question: Does this still work if KAFKA_TOPIC is a single topic? We don't want to lose the ability to do that.

Yes, it still works.

BTW, I didn't actually test this and I don't plan to. If I haven't said it explicitly before, I am no longer testing all of your PRs, especially the ones that affect code that only you are using right now. I'm reviewing it as best I can from reading it. It's up to you to try to think through all the possibilities when testing make sure your stuff actually works.

I'll request your review again momentarily. I have been testing all of my changes thoroughly. I plan on running additional tests for this PR in the next few days, just for my own sanity check.

Copy link
Collaborator

@troyraen troyraen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hernandezc1 hernandezc1 merged commit 5e5ad71 into main/elasticc Oct 31, 2023
1 check passed
@hernandezc1 hernandezc1 deleted the u/ch/startupscript branch October 31, 2023 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working ELAsTiCC Tracks development of our ELAsTiCC broker Pipeline: Conductor Components that orchestrate the live pipeline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants