
Add an option to not use elastic agents for meta-reference inference #269

Merged 2 commits into main on Oct 18, 2024

Conversation

@ashwinb (Contributor) commented Oct 18, 2024

Sometimes llama-stack is used as a library (and not as a server) -- in some of these cases, the client code may want to take control of setting up the distributed process group itself. This can happen in test cases (run via torchrun) or in eval harnesses. This PR adds an option to make that possible.

Test Plan

```
MODEL_IDS=Llama3.2-3B-Instruct \
  PROVIDER_ID=meta-reference \
  PROVIDER_CONFIG=~/.llama/tests/inf_providers.yaml \
  torchrun ~/.conda/envs/quant/bin/pytest -s llama_stack/providers/tests/inference/test_inference.py --tb=short --disable-warnings
```

with the following config for meta-reference provider:

```
providers:
  - provider_id: meta-reference
    provider_type: meta-reference
    config:
      model: Llama3.2-3B-Instruct
      create_distributed_process_group: false
```
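Client code consuming this flag might look something like the following minimal sketch. This is illustrative only: `should_init_process_group` is a hypothetical helper, not part of llama-stack, standing in for wherever the provider reads the config before deciding whether to spawn its own elastic workers or defer to a caller-managed `torch.distributed` group (as torchrun would set up).

```python
def should_init_process_group(config: dict) -> bool:
    # Hypothetical helper: when create_distributed_process_group is
    # false, the caller (e.g. a torchrun-launched test or an eval
    # harness) owns process-group setup, so the provider must not
    # create its own. The flag defaults to true, preserving the old
    # server behavior.
    return config.get("create_distributed_process_group", True)


# Mirrors the YAML config above.
config = {
    "model": "Llama3.2-3B-Instruct",
    "create_distributed_process_group": False,
}

if should_init_process_group(config):
    # Server-style usage: the provider spins up its own process group.
    print("provider initializes process group")
else:
    # Library-style usage: torchrun (or the eval harness) already
    # called torch.distributed.init_process_group for us.
    print("caller owns process group")
```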

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 18, 2024
@ashwinb ashwinb merged commit 33afd34 into main Oct 18, 2024
4 checks passed
@raghotham raghotham deleted the metaref-inference-plus branch October 20, 2024 07:25
yanxi0830 added a commit that referenced this pull request Oct 21, 2024
* docker compose ollama

* comment

* update compose file

* readme for distributions

* readme

* move distribution folders

* move distribution/templates to distributions/

* rename

* kill distribution/templates

* readme

* readme

* build/developer cookbook/new api provider

* developer cookbook

* readme

* readme

* [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)

* fix case where memory bank is registered without provider_id

* memory test

* agents unit test

* Add an option to not use elastic agents for meta-reference inference (#269)

* Allow overriding checkpoint_dir via config

* Small rename

* Make all methods `async def` again; add completion() for meta-reference (#270)

PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, it caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, this choice was not ideal. Let's revert back so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)
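The two calling conventions above can be demonstrated with a small self-contained asyncio sketch. The `Api` class below is a hypothetical stand-in for the inference API, not llama-stack's actual implementation: because `chat_completion()` is `async def` again, awaiting it yields the stream object, which is then iterated with `async for`.

```python
import asyncio
from typing import AsyncIterator


class Api:
    # Hypothetical stand-in: an "async def" method that returns an
    # async iterator of chunks, matching the reverted convention.
    async def chat_completion(self, params: dict) -> AsyncIterator[str]:
        async def stream() -> AsyncIterator[str]:
            for token in ["Hello", ", ", "world"]:
                yield token
        return stream()


async def main() -> str:
    api = Api()
    chunks = []
    # Note the extra `await` compared to the plain-"def" variant:
    # awaiting the coroutine first, then iterating the stream.
    async for chunk in await api.chat_completion({}):
        chunks.append(chunk)
    return "".join(chunks)


print(asyncio.run(main()))
```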

* Improve an important error message

* update ollama for llama-guard3

* Add vLLM inference provider for OpenAI compatible vLLM server (#178)

This PR adds vLLM inference provider for OpenAI compatible vLLM server.

* Create .readthedocs.yaml

Trying out readthedocs

* Update event_logger.py (#275)

spelling error

* vllm

* build templates

* delete templates

* tmp add back build to avoid merge conflicts

* vllm

* vllm

---------

Co-authored-by: Ashwin Bharambe <[email protected]>
Co-authored-by: Ashwin Bharambe <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
Co-authored-by: raghotham <[email protected]>
Co-authored-by: nehal-a2z <[email protected]>