
Add an option to not use elastic agents for meta-reference inference #269

Merged 2 commits into main on Oct 18, 2024

Conversation

@ashwinb (Contributor) commented Oct 18, 2024

Sometimes llama-stack is used as a library (and not as a server) -- in some of these cases, the client code may want to take control of setting up the distributed process group itself. This can happen in test cases (run via torchrun) or in eval harnesses. This PR adds an option to make that possible.

Test Plan

```
MODEL_IDS=Llama3.2-3B-Instruct \
  PROVIDER_ID=meta-reference \
  PROVIDER_CONFIG=~/.llama/tests/inf_providers.yaml \
  torchrun ~/.conda/envs/quant/bin/pytest -s llama_stack/providers/tests/inference/test_inference.py --tb=short --disable-warnings
```

with the following config for meta-reference provider:

```
providers:
  - provider_id: meta-reference
    provider_type: meta-reference
    config:
      model: Llama3.2-3B-Instruct
      create_distributed_process_group: false
```
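Client code consuming this flag might look something like the following minimal sketch. This is illustrative only: `should_init_process_group` is a hypothetical helper, not part of llama-stack, standing in for wherever the provider reads the config before deciding whether to spawn its own elastic workers or defer to a caller-managed `torch.distributed` group (as torchrun would set up).

```python
def should_init_process_group(config: dict) -> bool:
    # Hypothetical helper: when create_distributed_process_group is
    # false, the caller (e.g. a torchrun-launched test or an eval
    # harness) owns process-group setup, so the provider must not
    # create its own. The flag defaults to true, preserving the old
    # server behavior.
    return config.get("create_distributed_process_group", True)


# Mirrors the YAML config above.
config = {
    "model": "Llama3.2-3B-Instruct",
    "create_distributed_process_group": False,
}

if should_init_process_group(config):
    # Server-style usage: the provider spins up its own process group.
    print("provider initializes process group")
else:
    # Library-style usage: torchrun (or the eval harness) already
    # called torch.distributed.init_process_group for us.
    print("caller owns process group")
```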

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 18, 2024
@ashwinb ashwinb merged commit 33afd34 into main Oct 18, 2024
4 checks passed
@raghotham raghotham deleted the metaref-inference-plus branch October 20, 2024 07:25
yanxi0830 added a commit that referenced this pull request Oct 21, 2024
* docker compose ollama

* comment

* update compose file

* readme for distributions

* readme

* move distribution folders

* move distribution/templates to distributions/

* rename

* kill distribution/templates

* readme

* readme

* build/developer cookbook/new api provider

* developer cookbook

* readme

* readme

* [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)

* fix case where memory bank is registered without provider_id

* memory test

* agents unit test

* Add an option to not use elastic agents for meta-reference inference (#269)

* Allow overriding checkpoint_dir via config

* Small rename

* Make all methods `async def` again; add completion() for meta-reference (#270)

PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, it caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, this choice was not ideal. Let's revert back so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)
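The two calling conventions above can be demonstrated with a small self-contained asyncio sketch. The `Api` class below is a hypothetical stand-in for the inference API, not llama-stack's actual implementation: because `chat_completion()` is `async def` again, awaiting it yields the stream object, which is then iterated with `async for`.

```python
import asyncio
from typing import AsyncIterator


class Api:
    # Hypothetical stand-in: an "async def" method that returns an
    # async iterator of chunks, matching the reverted convention.
    async def chat_completion(self, params: dict) -> AsyncIterator[str]:
        async def stream() -> AsyncIterator[str]:
            for token in ["Hello", ", ", "world"]:
                yield token
        return stream()


async def main() -> str:
    api = Api()
    chunks = []
    # Note the extra `await` compared to the plain-"def" variant:
    # awaiting the coroutine first, then iterating the stream.
    async for chunk in await api.chat_completion({}):
        chunks.append(chunk)
    return "".join(chunks)


print(asyncio.run(main()))
```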

* Improve an important error message

* update ollama for llama-guard3

* Add vLLM inference provider for OpenAI compatible vLLM server (#178)

This PR adds vLLM inference provider for OpenAI compatible vLLM server.

* Create .readthedocs.yaml

Trying out readthedocs

* Update event_logger.py (#275)

spelling error

* vllm

* build templates

* delete templates

* tmp add back build to avoid merge conflicts

* vllm

* vllm

---------

Co-authored-by: Ashwin Bharambe <[email protected]>
Co-authored-by: Ashwin Bharambe <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
Co-authored-by: raghotham <[email protected]>
Co-authored-by: nehal-a2z <[email protected]>