
Bug: Cleanup operation on async Scorers runs before scorers finish executing #433

Open
mongodben opened this issue Nov 5, 2024 · 2 comments

@mongodben
Contributor

I'm running an eval that has a scorer function that includes a MongoDB operation.

After the eval runs, I need to close the database connection for the process to exit.

However, the cleanup operation runs before the scorers finish executing. See the annotated code with this problem below:

Relevant code with cleanup error
import { Eval, EvalCase, EvalScorer } from "braintrust";
import { MongoDbTag } from "./mongoDbMetadata";
import {
  findVerifiedAnswer,
  verifiedAnswerConfig,
  verifiedAnswerStore,
} from "./config";
import { FindVerifiedAnswerResult } from "mongodb-chatbot-server";
import {
  parseVerifiedAnswerYaml,
  VerifiedAnswerSpec,
} from "mongodb-chatbot-verified-answers";
import path from "path";
import fs from "fs";
import "dotenv/config";
import { cosineSimilarity } from "./test/cosineSimilarity";
import { strict as assert } from "assert";

interface VerifiedAnswersEvalCaseInput {
  query: string;
}
interface VerifiedAnswersEvalCaseExpected {
  /**
    The expected verified answer to the query.
    If `undefined`, expects no verified answer.
   */
  answer?: string;
}

interface VerifiedAnswersEvalCaseMetadata extends Record<string, unknown> {
  similarVerifiedAnswerQuery?: string;
  description?: string;
}

type VerifiedAnswerTag = "perturbation" | "should_match" | "should_not_match";

type MongoDbVerifiedAnswerTag = MongoDbTag | VerifiedAnswerTag;

interface VerifiedAnswersEvalCase
  extends EvalCase<
    VerifiedAnswersEvalCaseInput,
    VerifiedAnswersEvalCaseExpected,
    VerifiedAnswersEvalCaseMetadata
  > {
  tags?: MongoDbVerifiedAnswerTag[];
}

type VerifiedAnswersTaskOutput = FindVerifiedAnswerResult;

type VerifiedAnswersEvalCaseScorer = EvalScorer<
  VerifiedAnswersEvalCaseInput,
  VerifiedAnswersTaskOutput,
  VerifiedAnswersEvalCaseExpected,
  VerifiedAnswersEvalCaseMetadata
>;

const verifiedAnswersPath = path.resolve(
  __dirname,
  "..",
  "..",
  "..",
  "verified-answers.yaml"
);
const verifiedAnswerSpecs = parseVerifiedAnswerYaml(
  fs.readFileSync(verifiedAnswersPath, "utf-8")
);
const verifiedAnswerIndex = makeVerifiedAnswerIndex(verifiedAnswerSpecs);

const verifiedAnswerEvalCases: VerifiedAnswersEvalCase[] = [
  makeVerifiedAnswerEvalCase({
    inputQuery: "what is the aggregation framework",
    similarVerifiedAnswerQuery: "What is aggregation in MongoDB?",
    tags: ["aggregation", "perturbation"],
    verifiedAnswerIndex,
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "agg framework",
    similarVerifiedAnswerQuery: "What is aggregation in MongoDB?",
    tags: ["aggregation", "perturbation", "should_match"],
    verifiedAnswerIndex,
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "what's the process to insert data into MongoDB",
    similarVerifiedAnswerQuery: "How do I insert data into MongoDB?",
    verifiedAnswerIndex,
    tags: ["perturbation", "should_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "How can I insert data into MongoDB?",
    similarVerifiedAnswerQuery: "How do I insert data into MongoDB?",
    verifiedAnswerIndex,
    tags: ["perturbation", "should_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "insert data into mongodb",
    similarVerifiedAnswerQuery: "How do I insert data into MongoDB?",
    verifiedAnswerIndex,
    tags: ["perturbation", "should_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "password reset",
    similarVerifiedAnswerQuery: "Can i reset my password",
    verifiedAnswerIndex,
    tags: ["perturbation", "iam", "should_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "reset my password",
    similarVerifiedAnswerQuery: "Can i reset my password",
    verifiedAnswerIndex,
    tags: ["perturbation", "iam", "should_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "reset database password",
    similarVerifiedAnswerQuery: "Can i reset my password",
    verifiedAnswerIndex,
    tags: ["perturbation", "iam", "should_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "connect to stream process",
    verifiedAnswerIndex,
    tags: ["atlas_stream_processing", "should_not_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "connect to database kotlin",
    verifiedAnswerIndex,
    tags: ["driver", "kotlin", "should_not_match"],
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "connect to database with Kotlin coroutine driver",
    verifiedAnswerIndex,
    tags: ["driver", "kotlin", "kotlin_coroutine_driver", "should_not_match"],
  }),
  // 👇 From EAI-580 👇
  makeVerifiedAnswerEvalCase({
    inputQuery: "how do I set up billing alerts in Atlas",
    // No similar verified answer
    tags: ["billing", "should_not_match"],
    verifiedAnswerIndex,
  }),
  makeVerifiedAnswerEvalCase({
    inputQuery: "how do I set up billing alerts in Atlas?",
    // No similar verified answer
    tags: ["billing", "should_not_match"],
    verifiedAnswerIndex,
  }),
];

// Helper function to create a verified answer eval case
function makeVerifiedAnswerEvalCase(args: {
  inputQuery: string;
  similarVerifiedAnswerQuery?: string;
  description?: string;
  tags?: MongoDbVerifiedAnswerTag[];
  verifiedAnswerIndex: VerifiedAnswerIndex;
}): VerifiedAnswersEvalCase {
  return {
    input: {
      query: args.inputQuery,
    },
    expected: {
      answer: args.similarVerifiedAnswerQuery
        ? findExactVerifiedAnswer(
            args.similarVerifiedAnswerQuery,
            args.verifiedAnswerIndex
          )
        : undefined,
    },
    tags: args.tags,
    metadata: {
      similarVerifiedAnswerQuery: args.similarVerifiedAnswerQuery,
      description: args.description,
    },
  };
}

// -- Evaluation metrics --
const MatchesSomeVerifiedAnswer: VerifiedAnswersEvalCaseScorer = (args) => {
  return {
    name: "MatchesSomeVerifiedAnswer",
    score: args.output.answer ? 1 : 0,
  };
};

const MatchesExpectedOutput: VerifiedAnswersEvalCaseScorer = (args) => {
  const isMatch = args.output.answer?.answer === args.expected.answer;
  let matchType = "";
  if (isMatch && args.expected.answer !== undefined) {
    matchType = "true_positive";
  } else if (isMatch && args.expected.answer === undefined) {
    matchType = "true_negative";
  } else if (!isMatch && args.expected.answer !== undefined) {
    // Expected a verified answer but did not get it back.
    matchType = "false_negative";
  } else if (!isMatch && args.expected.answer === undefined) {
    // Expected no verified answer but got one.
    matchType = "false_positive";
  }

  return {
    name: "MatchesExpectedOutput",
    score: isMatch ? 1 : 0,
    metadata: {
      type: matchType,
    },
  };
};

const SearchScore: VerifiedAnswersEvalCaseScorer = (args) => {
  return {
    name: "SearchScore",
    score: args.output.answer?.score ?? null,
  };
};

// BUG: This scorer fails with MongoDB connection-closed errors because of the cleanup.
const ReferenceAnswerCosineSimilarity: VerifiedAnswersEvalCaseScorer = async (
  args
) => {
  const name = "ReferenceAnswerCosineSimilarity";
  const { similarVerifiedAnswerQuery } = args.metadata;

  if (!similarVerifiedAnswerQuery) {
    return {
      name,
      score: null,
    };
  }
  const [verifiedAnswer] = await verifiedAnswerStore.find({
    "question.text": similarVerifiedAnswerQuery,
  });
  assert(
    verifiedAnswer,
    `No verified answer found for query: ${similarVerifiedAnswerQuery}`
  );
  return {
    name,
    score: cosineSimilarity(
      args.output.queryEmbedding,
      verifiedAnswer.question.embedding
    ),
  };
};

type VerifiedAnswerIndex = Record<string, string>;
/**
  Construct an index of all verified answers for faster lookup.
 */
function makeVerifiedAnswerIndex(
  verifiedAnswerSpecs: VerifiedAnswerSpec[]
): VerifiedAnswerIndex {
  const verifiedAnswerIndex: VerifiedAnswerIndex = {};
  for (const { questions, answer } of verifiedAnswerSpecs) {
    questions.forEach((question) => {
      verifiedAnswerIndex[question] = answer;
    });
  }
  return verifiedAnswerIndex;
}

function findExactVerifiedAnswer(
  query: string,
  verifiedAnswerIndex: VerifiedAnswerIndex
): string | undefined {
  return verifiedAnswerIndex[query];
}

async function main() {
  await Eval<
    VerifiedAnswersEvalCaseInput,
    VerifiedAnswersTaskOutput,
    VerifiedAnswersEvalCaseExpected,
    VerifiedAnswersEvalCaseMetadata
  >("mongodb-chatbot-verified-answers", {
    experimentName: `mongodb-chatbot-latest-${verifiedAnswerConfig.embeddingModel}-minScore-${verifiedAnswerConfig.findNearestNeighborsOptions.minScore}`,
    metadata: {
      description:
        "Evaluates whether the correct verified answer is found for a given query",
      verifiedAnswerConfig: verifiedAnswerConfig,
    },
    async data() {
      return verifiedAnswerEvalCases;
    },
    maxConcurrency: 5,
    async task(input) {
      const verifiedAnswer = await findVerifiedAnswer(input);
      return verifiedAnswer;
    },
    scores: [
      MatchesSomeVerifiedAnswer,
      MatchesExpectedOutput,
      ReferenceAnswerCosineSimilarity,
      SearchScore,
    ],
  });
  // BUG: The store is closed before the eval scorers finish.
  await verifiedAnswerStore.close();
}
main();

Error in terminal:

Evaluator mongodb-chatbot-verified-answers [experimentName=mongodb-chatbot-latest-docs-chatbot-embedding-ada-002-minScore-0.96] failed with 13 errors. This evaluation ("mongodb-chatbot-verified-answers [experimentName=mongodb-chatbot-latest-docs-chatbot-embedding-ada-002-minScore-0.96]") will not be fully logged.
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
MongoNotConnectedError: Client must be connected before running operations
Add --verbose to see full stack traces.

I'd guess there's something funky with how Eval handles scorers that return promises. If there's no simple patch to make code placed after Eval wait for those promises, maybe you could add a new Eval.afterAll() hook where one can run cleanup logic like this.
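To make the afterAll proposal concrete, here is a minimal sketch of where such a hook could sit in an eval runner. runEval, EvalOptions, Scorer, CaseResult, ToyStore, and the afterAll option are all hypothetical names chosen for illustration; this is not braintrust's API or implementation. The design point is that the runner waits for every task and every scorer (sync or async) to settle, even when some fail, before it calls afterAll.

```typescript
// Toy eval runner illustrating a proposed afterAll hook.
// All names here are hypothetical; this is NOT braintrust's API or implementation.

type Score = { name: string; score: number | null };
type Scorer<I, O> = (args: { input: I; output: O }) => Score | Promise<Score>;
type CaseResult = { ok: true; scores: Score[] } | { ok: false; error: unknown };

interface EvalOptions<I, O> {
  data: () => Promise<I[]>;
  task: (input: I) => Promise<O>;
  scores: Scorer<I, O>[];
  // Proposed hook: runs once, after every task and scorer promise has settled.
  afterAll?: () => void | Promise<void>;
}

async function runEval<I, O>(opts: EvalOptions<I, O>): Promise<CaseResult[]> {
  try {
    const inputs = await opts.data();
    // allSettled (not all): one failing case must not let us reach afterAll
    // while other cases are still running.
    const settled = await Promise.allSettled(
      inputs.map(async (input) => {
        const output = await opts.task(input);
        // Wrapping each call in an async function turns sync throws into rejections.
        const results = await Promise.allSettled(
          opts.scores.map(async (scorer) => scorer({ input, output }))
        );
        const failed = results.find(
          (r): r is PromiseRejectedResult => r.status === "rejected"
        );
        if (failed) throw failed.reason;
        return results.map((r) => (r as PromiseFulfilledResult<Score>).value);
      })
    );
    return settled.map(
      (r): CaseResult =>
        r.status === "fulfilled"
          ? { ok: true, scores: r.value }
          : { ok: false, error: r.reason }
    );
  } finally {
    // Every task and scorer promise has settled here, so cleanup is safe.
    await opts.afterAll?.();
  }
}

// Usage: a toy store that rejects queries after close(), loosely like a closed client.
class ToyStore {
  closed = false;
  async find(query: string): Promise<string> {
    await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated I/O
    if (this.closed) throw new Error("store is closed");
    return query.toUpperCase();
  }
  async close(): Promise<void> {
    this.closed = true;
  }
}

const store = new ToyStore();
const resultsPromise = runEval<string, string>({
  data: async () => ["a", "b", "c"],
  task: (input) => store.find(input),
  scores: [
    async ({ output }) => ({
      name: "RoundTrip",
      score: (await store.find(output.toLowerCase())) === output ? 1 : 0,
    }),
  ],
  afterAll: () => store.close(),
});
```

Using Promise.all instead of Promise.allSettled would reject on the first failure while sibling cases were still querying the store, so the finally block (and the cleanup) could run too early, which is the same race reported here.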

@ankrgyl
Contributor

ankrgyl commented Nov 5, 2024

Sorry, I'm not following. Are you saying that a promise is running after await Eval completes? I.e. verifiedAnswerStore.close() is running before Eval completes?

@mongodben
Contributor Author

Are you saying that a promise is running after await Eval completes? I.e. verifiedAnswerStore.close() is running before Eval completes?

Yes, exactly. verifiedAnswerStore.close() runs before the scorer that uses verifiedAnswerStore finishes.
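The ordering confirmed above is what happens whenever an async function resolves before work it started has settled: code after the await runs first. The sketch below reproduces that ordering in isolation with plain promises. slowScorer, fireAndForgetEval, awaitingEval, and runScenario are hypothetical stand-ins; the sketch makes no claim about how braintrust actually schedules tasks and scorers and does not diagnose the root cause.

```typescript
// Reproduces the reported ordering with plain promises.
// All names are hypothetical; this is not braintrust code.
type Log = string[];

function delay(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Stands in for an async scorer that queries the database.
async function slowScorer(log: Log): Promise<void> {
  await delay(10);
  log.push("scorer finished");
}

// Starts the scorer but resolves without awaiting it.
async function fireAndForgetEval(log: Log): Promise<void> {
  void slowScorer(log);
  log.push("eval resolved");
}

// Resolves only after the scorer has settled.
async function awaitingEval(log: Log): Promise<void> {
  await slowScorer(log);
  log.push("eval resolved");
}

// Mirrors main(): await the eval, then clean up.
async function runScenario(runEval: (log: Log) => Promise<void>): Promise<Log> {
  const log: Log = [];
  await runEval(log);
  log.push("store closed"); // stands in for verifiedAnswerStore.close()
  await delay(20); // give a stray scorer time to finish so its position is visible
  return log;
}

void (async () => {
  // Logs: eval resolved -> store closed -> scorer finished
  console.log((await runScenario(fireAndForgetEval)).join(" -> "));
  // Logs: scorer finished -> eval resolved -> store closed
  console.log((await runScenario(awaitingEval)).join(" -> "));
})();
```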
