Spacy::Language.new hangs when called from sidekiq worker #6

Open
hoblin opened this issue Dec 29, 2023 · 5 comments

@hoblin

hoblin commented Dec 29, 2023

I built a small tokenizer in my app, and it works great when called from the Rails console. But when I run it from a background job (I tried both Sidekiq and Solid Queue) or from a server process (Puma) via an after_create_commit callback, the process hangs and I have to kill it.
It looks similar to this issue: mrkn/pycall.rb#95

@omartorresrios

omartorresrios commented Feb 21, 2024

Hi @hoblin. I've been struggling with this too for the last week, and I finally found a workaround. Basically, I stopped using this gem since, as you say, it hangs the process.
This is the line that causes the problem:

PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")

Calling exec seems a bit risky.

I ended up taking another approach: running Python scripts from Ruby. From your controller you can run:
some_value = `python3 lib/python/filename.py "#{param1}" "#{param2}"`
And in the .py file you can load the "en_core_web_md" model: nlp = spacy.load("en_core_web_md"). Hope that helps!
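A hedged sketch of the subprocess approach described above: backticks send the whole string through a shell, so a parameter containing quotes or `$(...)` could change the command that runs. Passing the arguments as a list to `Open3.capture2` skips the shell entirely. The helper name `run_script` is hypothetical, and the script path is simply the one mentioned in the comment.

```ruby
require "open3"

# Hypothetical helper (not part of any gem): runs a command without a shell
# and returns its trimmed standard output.
def run_script(*argv)
  # With several arguments, Open3.capture2 executes the program directly,
  # so each argument reaches it verbatim, with no shell quoting or expansion.
  stdout, status = Open3.capture2(*argv)
  raise "command failed (#{status.exitstatus}): #{argv.first}" unless status.success?

  stdout.strip
end

# Example call, assuming the script path from the comment above:
# some_value = run_script("python3", "lib/python/filename.py", param1, param2)
```

Each call still starts a new Python process and reloads the model, so this trades the hang for startup cost on every request.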

@hoblin
Author

hoblin commented Feb 21, 2024

@omartorresrios Thanks, mate. I actually got rid of everything spaCy-related in my app, and now I just have these four lines in my docker-compose:

  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    ports:
      - '8000:8000'

It's a small, lightweight image that I forked from an old, outdated (like anything in the NLP realm) public image to make a multi-platform build for my Kamal-based deploy. So now I just have a little lightning-fast API for tokenization.

@adrienpoly

@hoblin I am trying to deploy an app with Kamal and a wrapper around spaCy. I tried using your Docker image, but I must be missing something: when I call Spacy::Language.new("en_core_web_md"), it doesn't find the model.
Would you mind sharing a bit more about your solution?
Thanks

@hoblin
Author

hoblin commented Mar 20, 2024

@adrienpoly Sure. I use it as a micro-service and call it via API.

docker-compose.yml:

version: '3'

services:
  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    ports:
      - '8000:8000'

config/deploy.yml (Kamal):

servers:
  web:
    ...
    options:
      network: networkname
  jobs:
    ...
    options:
      network: networkname
accessories:
  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    port: 8000
    roles:
      - web
      - jobs
    options:
      network: networkname

lib/nlp/spacy.rb:

require "httparty"

module Nlp
  class Spacy
    HOST = Rails.env.development? ? "http://localhost:8000" : "http://projectname-spacyapi:8000"

    class << self
      def named_entities(text)
        HTTParty.post(
          "#{HOST}/ner",
          body: {sections: [text]}.to_json,
          headers: {"Content-Type" => "application/json"}
        ).dig("data", 0, "entities")
      end

      def tokens(text)
        HTTParty.post(
          "#{HOST}/pos",
          body: {text: text}.to_json,
          headers: {"Content-Type" => "application/json"}
        ).dig("data", 0, "tags")
      end
    end
  end
end

The image itself is a fork of https://github.com/neelkamath/spacy-server, and the only difference from the original is the multi-platform build (amd + arm).
https://hub.docker.com/layers/hoblin/spacy-server/2-en_core_web_md/images/sha256-6c75186013463efca502b8621456fee8af0c3a75c2ea14e223674df9627a3327?context=repo

You can find the available endpoints here: https://github.com/neelkamath/spacy-server/blob/master/src/main.py#L62
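Since this thread is about hung processes, here is a hedged variant of the client above with explicit timeouts and error handling, so a slow or unreachable container raises instead of blocking a worker. It uses only the standard library (Net::HTTP rather than HTTParty). The class name `SpacyClient`, the `Error` class, and the timeout defaults are assumptions for illustration; the `/ner` and `/pos` paths, request bodies, and response keys are taken from the code above.

```ruby
require "json"
require "net/http"
require "uri"

module Nlp
  # Hypothetical variant of the client above, using only the standard library.
  class SpacyClient
    class Error < StandardError; end

    # host: base URL of the spacy-server container.
    # The timeout defaults are illustrative assumptions, not tuned values.
    def initialize(host:, open_timeout: 2, read_timeout: 10)
      @host = host
      @open_timeout = open_timeout
      @read_timeout = read_timeout
    end

    def named_entities(text)
      post("/ner", { sections: [text] }).dig("data", 0, "entities")
    end

    def tokens(text)
      post("/pos", { text: text }).dig("data", 0, "tags")
    end

    private

    # Sends a JSON POST and returns the parsed body, or raises Error.
    def post(path, payload)
      uri = URI.join(@host, path)
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = uri.scheme == "https"
      http.open_timeout = @open_timeout # seconds to wait for the connection
      http.read_timeout = @read_timeout # seconds to wait for the response

      response = http.post(uri.path, JSON.generate(payload), { "Content-Type" => "application/json" })
      raise Error, "#{path} returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

      JSON.parse(response.body)
    rescue Timeout::Error, SystemCallError, SocketError, JSON::ParserError => e
      # Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error.
      raise Error, "#{path} failed: #{e.class}: #{e.message}"
    end
  end
end
```

Usage would look like `Nlp::SpacyClient.new(host: "http://localhost:8000").tokens("some text")`; a background job can then rescue `Nlp::SpacyClient::Error` and retry rather than hang.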

@yohasebe
Owner

Apologies for the delayed response!

I've addressed this issue by adding a timeout feature to Spacy::Language.new.

Please check out the updated README for details and give the latest version a try. Hopefully, this will resolve the hanging problem. If you're still running into issues, @hoblin's microservice approach could be a great alternative to explore.

Thanks again for bringing this to attention.
