This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

adding exp backoff retry decorator to openai embedding and completion calls #12

Open
wants to merge 3 commits into base: main

Conversation

cfortuner
Owner

Adds a new utility -> Retry!

  • Updated the OpenAI provider's embedOne, generate, and stream methods to use Retry!

It's a TypeScript decorator that lets you retry a call up to N times:

  @retry(3)
  async generate(
    promptText: string,
    options: GenerateCompletionOptions = DEFAULT_COMPLETION_OPTIONS
  ) {
    try {
      if (options.s
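
The decorator implementation itself isn't shown above; as a rough illustration, a minimal method decorator along these lines, assuming simple exponential backoff between attempts (the actual utility may differ), could look like:

function retry(maxRetries: number, baseDelayMs = 1000) {
  return function (
    _target: unknown,
    _propertyKey: string,
    descriptor: PropertyDescriptor
  ) {
    const original = descriptor.value;
    descriptor.value = async function (...args: unknown[]) {
      let attempt = 0;
      while (true) {
        try {
          return await original.apply(this, args);
        } catch (error) {
          attempt += 1;
          if (attempt > maxRetries) throw error;
          // Exponential backoff between attempts: 1s, 2s, 4s, ...
          const delayMs = baseDelayMs * 2 ** (attempt - 1);
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    };
    return descriptor;
  };
}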

Let me know what you think!

@vercel

vercel bot commented Feb 14, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name: docs-promptable | Status: ❌ Failed | Updated: Feb 14, 2023 at 7:26PM (UTC)

@cfortuner mentioned this pull request Feb 14, 2023
Contributor

@mathisobadia left a comment


Making some comments here to clarify what I already said on Discord.


private embedMany = async (
@retry(3)
Contributor


I would remove this line as it would retry the whole batch

Comment on lines -265 to -268
this.api.createEmbedding({
...options,
input: text.replace(/\n/g, " "),
})
Contributor


Replace this with a call to this.embedOne. Since embedOne has the retry decorator, each individual call will be retried on failure instead of the whole batch.
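
A rough sketch of that change, matching the shape of the existing method (EmbedManyOptions and DEFAULT_EMBED_OPTIONS are placeholder names, and embedOne is assumed to carry @retry(3)):

private embedMany = async (
  texts: string[],
  options: EmbedManyOptions = DEFAULT_EMBED_OPTIONS
) => {
  // Each text goes through this.embedOne, which has the retry decorator,
  // so a failure only retries that single embedding rather than the whole batch.
  return Promise.all(texts.map((text) => this.embedOne(text, options)));
};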

Contributor

@mathisobadia left a comment


Another comment on checking whether the error is throttling before retrying: you don't want to wait up to 10 seconds before realizing that the request is malformed or that your API key is wrong.

logger.log(chalk.red(`Maximum retries exceeded`));
throw error; // re-throw error if maximum retries exceeded
} else {
logger.log(chalk.yellow(`Retrying...`));
Contributor


I think you might want to retry only when the error indicates the request was throttled. That would make this function less abstract, since we'd have to assume the error is an Axios error and do a check that looks like:

if (error.response.status === 429)

So this would only work for Axios errors, but I think it makes sense for now, since this is only used with the OpenAI client, which uses Axios under the hood.
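
As a sketch, a retry helper restricted to throttling errors could look like this (assuming an Axios-style error shape, which is what the OpenAI client surfaces):

import axios from "axios";

async function retryOnThrottle<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (error) {
      const throttled =
        axios.isAxiosError(error) && error.response?.status === 429;
      // Fail fast on anything that isn't throttling (malformed request, bad API key, ...)
      if (!throttled || attempt >= maxRetries) throw error;
      attempt += 1;
      // Exponential backoff before the next attempt: 2s, 4s, 8s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}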

@yourbuddyconner
Contributor

Another option for retry logic here that is probably more performant / better supported is the async-retry library.

Here's an example implementation from my codebase, wrapping the openai SDK:

import retry from "async-retry";

export const openaiCompletion = trace("openaiCompletion", _openaiCompletion);
async function _openaiCompletion(prompt: string, model: string = "text-davinci-003", temperature: number = 1, nTokens: number = 500): Promise<string> {
    const response = await retry(
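        // async-retry passes `bail`, which can be called to skip further retries
        // for errors that shouldn't be retried (not wired up in this example)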
        async (bail) => {
            return openai.createCompletion({
                model: model,
                prompt,
                temperature: temperature,
                max_tokens: nTokens,
                top_p: 1,
                frequency_penalty: 0,
                presence_penalty: 0
            })
        },
        {
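            // retries: 8 with factor: 4 and minTimeout: 1000 gives roughly
            // exponential backoff: waits of about 1s, 4s, 16s, ... between attempts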
            retries: 8,
            factor: 4,
            minTimeout: 1000,
            // onRetry: (error: any) => console.log(error)
        }
    )
    const text = response.data.choices[0].text
    return text!
}

Thoughts:

  • Tracing this with promptable is useful, as an aside.
  • Lets you pass an arbitrary error handler callback (not wired up here)
  • Retry logic (e.g. retries and factor) can be parameterized and exposed to the user to turn the knobs
  • Not a lot of overhead to add, just an extra import.

I would probably opt to add this at the base ModelProvider level, and I think it's worth considering implementing this as a function decorator so that the retry logic can be added without too much additional boilerplate. That said, thinking deeply about retry logic on an API-specific basis is a really good idea, because what is good for OpenAI APIs might not hold for other service providers.
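
For illustration only, a provider-level sketch along those lines (the class shape and option names here are assumptions, not the existing promptable API):

import retry from "async-retry";

// Hypothetical knobs surfaced to users of the provider.
interface ProviderRetryOptions {
  retries?: number;    // number of retry attempts
  factor?: number;     // exponential backoff factor
  minTimeout?: number; // delay before the first retry, in ms
}

abstract class ModelProvider {
  constructor(protected retryOptions: ProviderRetryOptions = {}) {}

  // Shared wrapper so every provider call gets the same retry behavior.
  protected withRetry<T>(fn: () => Promise<T>): Promise<T> {
    return retry(
      async (bail) => {
        try {
          return await fn();
        } catch (error: any) {
          // Only retry throttling (HTTP 429); bail out on other API errors.
          if (error?.response?.status && error.response.status !== 429) {
            bail(error);
            return undefined as never;
          }
          throw error;
        }
      },
      {
        retries: this.retryOptions.retries ?? 8,
        factor: this.retryOptions.factor ?? 4,
        minTimeout: this.retryOptions.minTimeout ?? 1000,
      }
    );
  }
}

A concrete provider could then wrap its API calls in this.withRetry(() => ...) and still override the policy where a particular service's rate limits call for something different.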

@cfortuner
Owner Author

Just updating here: holding off on adding this for now; we have some other ideas we'd like to try that might be better.

@yourbuddyconner
Contributor

Cool @cfortuner lmk if you want me to review the solution when you have a PR.

@ymansurozer

Having read @mathisobadia's comments, I think that makes more sense. There is a trade-off here:

  • With batching, we save on request count, so fewer requests count toward the requests-per-minute rate limit.
  • One by one, we save processing time because we don't lose time when a whole batch is rejected, but this means many more requests count toward the requests-per-minute rate limit.

One more thing to consider is how to handle embedding workloads that exceed 250k tokens per minute. If we batch, we have to construct each array to stay below that limit. One by one, since a single embedding request cannot exceed the model's max token length, we are safe; even if the total exceeds 250k tokens per minute, the retry policy would handle it.

So I'd say reducing the time needed to process embeddings and being able to handle large inputs are more important (at least for my case). But maybe we could have processing policies to handle both.
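
On the batch-construction point: a rough sketch of grouping texts so each request stays under a token budget (countTokens is passed in as a hypothetical helper, e.g. backed by a tokenizer):

function batchByTokenBudget(
  texts: string[],
  countTokens: (text: string) => number,
  budget: number
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const text of texts) {
    const cost = countTokens(text);
    // Start a new batch when adding this text would exceed the budget.
    if (current.length > 0 && used + cost > budget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(text);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}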

4 participants