Add notebook with batch embeddings generation & search #171
Check out this pull request on ReviewNB. See visual diffs and provide feedback on Jupyter Notebooks. Powered by ReviewNB
@@ -0,0 +1,868 @@ | |||
{ |
This is the standard setup for more custom BigQuery-based operations, if any are added later.
/gcbrun
@@ -0,0 +1,772 @@ | |||
{ |
Reworded: In this tutorial, you create a similarity search on Stack Overflow questions to identify similar topics, questions, and technologies being discussed. You use BigQuery and Dataproc Serverless for distributed prediction with deep learning models.
Also, can you link to the product page the first time each product is mentioned? Dataproc Serverless, BigQuery, Workbench, etc.
@@ -0,0 +1,772 @@ | |||
{ |
Reworded: In this tutorial, you use Apache Spark for batch inference (prediction) and BigQuery for vector search. You run Apache Spark using Dataproc Interactive Sessions inside Vertex AI Workbench.
The example uses open source Stack Overflow data and the open source Hugging Face text embedding model all-MiniLM-L12-v2. The model maps text into a 384-dimensional dense vector space. The vector index used for similarity search is created in BigQuery.
The Hugging Face transformers library is installed by default in Dataproc Serverless runtime version 2.2+. See the full list of Python libraries in the runtime.
--
Please also add the link to the Stack Overflow data.
--
I think you can delete "This tutorial uses the following Google Cloud ML services and resources:" and everything below it.
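--
To give readers some intuition for what "similarity search in a 384-dimensional vector space" means, a small illustration along these lines might help. This is only a sketch: the vectors are random placeholders rather than real all-MiniLM-L12-v2 embeddings, the names `cosine_similarity` and `top_k` are hypothetical, and the brute-force scan stands in for the indexed search that BigQuery performs.

```python
import math
import random

# Dimension of the embedding space mentioned in the review comment.
EMBEDDING_DIM = 384


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Brute-force search: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in corpus.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Placeholder "embeddings": random vectors, NOT output of the real model.
rng = random.Random(0)
corpus = {
    f"question_{i}": [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for i in range(5)
}

# Query with a copy of one stored vector; it should rank first.
query = list(corpus["question_2"])
results = top_k(query, corpus, k=2)
```

In the notebook, the model would produce the vectors and BigQuery would do the ranking; the sketch only shows the scoring idea that sits underneath both.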
@@ -0,0 +1,772 @@ | |||
{ |
Enable autoscaling by setting the following parameters in Spark properties:
spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.maxExecutors = 100
spark.dynamicAllocation.minExecutors = 5
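One possible way to express the autoscaling properties in the notebook is sketched here. The helper names `autoscaling_properties` and `apply_properties` are hypothetical, and whether builder-level settings take effect depends on how the Dataproc session is created; in some setups they may need to go into the session or runtime configuration instead.

```python
def autoscaling_properties(min_executors=5, max_executors=100):
    """Build the Spark dynamic allocation properties as a string-valued dict."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def apply_properties(builder, properties):
    """Apply each property through the builder's config(key, value) method."""
    for key, value in properties.items():
        builder = builder.config(key, value)
    return builder


# Assumed usage with PySpark (not executed here):
#   from pyspark.sql import SparkSession
#   spark = apply_properties(
#       SparkSession.builder, autoscaling_properties()
#   ).getOrCreate()
```

Keeping the values in one dict makes it easy to print them in the notebook next to the explanation, so readers can see exactly which settings control scaling.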
@@ -0,0 +1,772 @@ | |||
{ |
Please expand on this. This isn't an easy topic for a user to understand, so the more explanation, the better.
@jvidhi I added my review. In general, I would add a lot more explanation throughout. This isn't the easiest topic for a user to grasp, so I would err towards overexplaining.
No description provided.