Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Typesense #44

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open

Add Typesense #44

wants to merge 29 commits into from

Conversation

ruslandoga
Copy link
Contributor

@ruslandoga ruslandoga commented Sep 27, 2024

This PR that integrates Typesense into Hexdocs.

TODOs:

  • should Broadway task fail on failed indexing, or should it just log an error?
    Logging excessively for now
  • should indexing happen in the same step as upload to object storage or can it be parallel?
    Indexing in the same step for now
  • index proglang
  • index Erlang, what's its search_data equivalent?
    The format is the same: Add Typesense #44 (comment)
  • index Gleam: ce7af3b
    Commit reverted, approach needs discussion with Gleam team
  • more tests

CI results: ruslandoga#1

lib/hexdocs/search.ex Outdated Show resolved Hide resolved
lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
lib/hexdocs/search.ex Outdated Show resolved Hide resolved
@josevalim
Copy link
Member

Thank you @ruslandoga! I am currently on holidays but I will try to carve some time sooner than later to give you feedback. /cc @wojtekmach

@josevalim
Copy link
Member

index erlang, what's its search_data equivalent?

Erlang uses ExDoc now, so they have the exact same structure. However, we will need to either poll them or ask them to ping us once they publish a new version or ask them to push their docs to Hexdocs! I'd say we can postpone this to a follow up pull request.

On the other hand, I believe Gleam does not have the data in the format we need (we created our search_data.js somewhat recently thinking exactly about this). They would need to update their tools first, so we can postpone this conversation too.

@wojtekmach
Copy link
Member

@ruslandoga I'll try to review this sooner than later but I have a lot on my plate until October 15th, after that it's gonna be my number one priority to see this through. Sorry about that.

lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
lib/hexdocs/search.ex Outdated Show resolved Hide resolved
lib/hexdocs/search.ex Outdated Show resolved Hide resolved
Copy link
Member

@wojtekmach wojtekmach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very exiting! I left some comments below.

test/test_helper.exs Show resolved Hide resolved
lib/hexdocs/search/typesense.ex Outdated Show resolved Hide resolved
lib/hexdocs/search/typesense.ex Outdated Show resolved Hide resolved
lib/hexdocs/search/typesense.ex Outdated Show resolved Hide resolved
lib/hexdocs/search/typesense.ex Outdated Show resolved Hide resolved
lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
lib/hexdocs/queue.ex Outdated Show resolved Hide resolved
@ruslandoga ruslandoga mentioned this pull request Oct 21, 2024
@ruslandoga ruslandoga marked this pull request as ready for review November 10, 2024 13:57
@ruslandoga
Copy link
Contributor Author

ruslandoga commented Nov 10, 2024

👋

I think it's ready for review now :)

Updates since the last review:

  • removed Gleam indexing (for now)
  • added proglang field to the schema
  • moved some code around in attempt to make it cleaner
  • added more tests

search_data_js =
Enum.find_value(files, fn {path, content} ->
case Path.basename(path) do
"search_data-" <> _digest -> content
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot trust this data since it's user provided, can they do anything dangerous by providing something we don't expect? Maybe we should do some rudimentary validation?

Copy link
Contributor Author

@ruslandoga ruslandoga Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can provide long strings like https://github.com/cloudpods-dev/docker-engine-api-elixir/blob/813cc557da483f623a8f484db04efc7e58db0376/lib/docker_engine_api/api/container.ex#L67, but Typesense seems to handle it fine. We can check for content size, maybe. I think if Typesense doesn't like the payload, it would simply reject it.

Copy link
Contributor Author

@ruslandoga ruslandoga Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test that checks that invalid fields in search items (like type being a map instead of a string, or doc being a list) are rejected: 8d58e4f

@@ -5,6 +5,9 @@ if config_env() == :prod do
port: System.fetch_env!("HEXDOCS_PORT"),
hexpm_url: System.fetch_env!("HEXDOCS_HEXPM_URL"),
hexpm_secret: System.fetch_env!("HEXDOCS_HEXPM_SECRET"),
typesense_url: System.fetch_env!("TYPESENSE_URL"),
typesense_api_key: System.fetch_env!("TYPESENSE_API_KEY"),
typesense_collection: System.fetch_env!("TYPESENSE_COLLECTION"),
Copy link
Contributor Author

@ruslandoga ruslandoga Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created hexdocs-test collection on Typesense Cloud, https://cloud.typesense.org/clusters/ent97o5sv4dzx2f0p/collections

I think we can use it during alpha/beta testing.

@@ -5,6 +5,9 @@ if config_env() == :prod do
port: System.fetch_env!("HEXDOCS_PORT"),
hexpm_url: System.fetch_env!("HEXDOCS_HEXPM_URL"),
hexpm_secret: System.fetch_env!("HEXDOCS_HEXPM_SECRET"),
typesense_url: System.fetch_env!("TYPESENSE_URL"),
typesense_api_key: System.fetch_env!("TYPESENSE_API_KEY"),
Copy link
Contributor Author

@ruslandoga ruslandoga Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the "Admin API Key" for https://cloud.typesense.org/clusters/ent97o5sv4dzx2f0p cluster, it can be downloaded from the dashboard.

@@ -5,6 +5,9 @@ if config_env() == :prod do
port: System.fetch_env!("HEXDOCS_PORT"),
hexpm_url: System.fetch_env!("HEXDOCS_HEXPM_URL"),
hexpm_secret: System.fetch_env!("HEXDOCS_HEXPM_SECRET"),
typesense_url: System.fetch_env!("TYPESENSE_URL"),
Copy link
Contributor Author

@ruslandoga ruslandoga Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this would be https://ent97o5sv4dzx2f0p.a1.typesense.net

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Nov 18, 2024

👋

Just wanted to check if there’s anything else we'd need to address before it's merged?

Copy link
Member

@wojtekmach wojtekmach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We plan to test this out in our staging server before merging. Thank you for all the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants