.Net: Add PostgresVectorStore Memory connector. #9324

Open
wants to merge 73 commits into
base: main

Contributor

@lossyrob lossyrob commented Oct 18, 2024

This PR adds a PostgresVectorStore and related classes to Microsoft.SemanticKernel.Connectors.Postgres.

Motivation and Context

As part of the move to having memory connectors implement the new Microsoft.Extensions.VectorData.IVectorStore architecture (see https://github.com/microsoft/semantic-kernel/blob/main/docs/decisions/0050-updated-vector-store-design.md), each memory connector needs to be updated. This PR updates the existing Microsoft.SemanticKernel.Connectors.Postgres package to include this implementation, which will supersede the PostgresMemoryStore implementation.

Some high level comments about design:

  • PostgresVectorStore and PostgresVectorStoreRecordCollection get injected with an IPostgresVectorStoreDbClient. This abstracts the database communication and allows for unit tests to mock database interactions.
  • The PostgresVectorStoreDbClient is passed an NpgsqlDataSource by the user, which is used to manage connections to the database. Connection pool lifecycle management is the user's responsibility.
  • The IPostgresVectorStoreDbClient is designed to accept and produce the storage model, which in this case is a Dictionary<string, object?>. This is the intermediate type that the IVectorStoreRecordMapper maps to.
  • The PostgresVectorStoreDbClient also takes an IPostgresVectorStoreCollectionSqlBuilder, which generates SQL command information for interacting with the database. This abstracts the SQL queries related to each task and allows for future expansion. This is particularly targeted at creating an AzureDBForPostgre vector store that will enable alternate vector implementations like DiskANN, while leveraging the same database client as the Postgres connector.
  • The integration tests for the vector store use Docker.Net to bring up a pgvector/pgvector docker container, which the tests are run against.

Contribution Checklist

Work in progress, some methods are not implemented yet.
@markwallace-microsoft added the .NET, kernel, and memory labels on Oct 18, 2024
new VectorStoreRecordDataProperty("code", typeof(int)),
new VectorStoreRecordDataProperty("rating", typeof(float?)),
new VectorStoreRecordDataProperty("description", typeof(string)),
new VectorStoreRecordDataProperty("parking_is_included", typeof(bool)),
Contributor

@westey-m westey-m Oct 30, 2024

Might be worth having a record definition where some of the properties have storage names that are different to the main property name, to make sure these are used correctly in the command. Same with the other tests here that have to build names into the command.

{
return new VectorStoreRecordDefinition
{
Properties = new List<VectorStoreRecordProperty>
Contributor

@westey-m westey-m Oct 30, 2024

Might be good to add one or two enumerables to the property list as well to verify that they work as expected.

{
Properties = new List<VectorStoreRecordProperty>
{
new VectorStoreRecordKeyProperty("HotelId", typeof(ulong)),
Contributor

Are we using this anywhere? From the constants list I don't believe we support ulong, so I would expect using this to fail.

[InlineData(typeof(long), 7L)]
[InlineData(typeof(string), "key1")]
[InlineData(typeof(Guid), null)]
public async Task ItCanGetAndDeleteRecordAsync(Type idType, object? key)
Contributor

It is also possible to make test methods generic, e.g. public async Task ItCanGetAndDeleteRecordAsync<TKey>(Type idType, TKey? key), which means you can use TKey in the method and you don't have to use dynamic.

namespace SemanticKernel.IntegrationTests.Connectors.Memory.Postgres;

[Collection("PostgresVectorStoreCollection")]
public sealed class PostgresVectorStoreRecordCollectionTests(PostgresVectorStoreFixture fixture)
Contributor

It would be nice to have a test that reads some data that wasn't upserted by the Collection, e.g. create a table and upsert some records using SQL, and then read them using the Collection. The reason is that we want to ensure that the data in the DB is actually what we want it to be. If we do two complementary things wrong, on upsert and on read, everything looks fine when using the Collection, but the data isn't stored in the DB the way we need it to be. A test that doesn't write the data via the Collection should catch this.

Contributor Author

lossyrob commented Nov 1, 2024

@westey-m most recent comments addressed, thanks!


Hanake0 commented Nov 26, 2024

Hi @westey-m and @lossyrob,

Thanks for the work on this feature.
Sorry for the inconvenience if this is not the right place to ask, but do you know when it will be available in the Microsoft.SemanticKernel.Connectors.Postgres package?

We are currently using the Memory Store, which was recently marked as legacy, and the vector store connector for Postgres is currently marked as in development, but as I see here it is probably at least ready for alpha testing.

Is it available for testing already in any channel or is any ETA available for it?

Again, sorry if that is not the right place to ask.

Thanks!

Contributor

westey-m commented Nov 26, 2024

@Hanake0, no problem. It's been a bit busy lately with various deadlines, but I was meaning to get back to this again this week. With the start of the holiday period, giving an ETA will be difficult, but we'll certainly be progressing it as a priority.

Member

@dmytrostruk dmytrostruk left a comment

A couple of small comments, but in general looks good to me! Thanks for your contribution!

this._postgresContainerId = await VectorStoreInfra.SetupPostgresContainerAsync(this._dockerClient);

// Delay until the Postgres server is ready.
var connectionString = "Host=localhost;Port=5432;Username=postgres;Password=example;Database=postgres;";
Member

I'm wondering if it would be better to get this connection string from .NET user secrets, to avoid a pattern of keeping connection strings directly in code, even in tests. I think it's okay to keep it in the code if the connection string is just a URL, but in this case it also contains a password.

Comment on lines +28 to +31
/// <summary>
/// The connection string to the Postgres database hosted in the docker container.
/// </summary>
private const string ConnectionString = "Host=localhost;Port=5432;Username=postgres;Password=example;Database=postgres;";
Member

Same comment as above: we should not encourage users to keep connection strings in a codebase, even for test settings. We can do it in a similar way to how it's implemented for Azure AI Search, as an example:

/// To set your secrets use:
/// <para> dotnet user-secrets set "AzureAISearch:Endpoint" "https://... .search.windows.net"</para>
/// <para> dotnet user-secrets set "AzureAISearch:ApiKey" "{Key from your Search service resource}"</para>
/// </summary>

kernelBuilder.AddAzureAISearchVectorStore(
new Uri(TestConfiguration.AzureAISearch.Endpoint),
new AzureKeyCredential(TestConfiguration.AzureAISearch.ApiKey));
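
For the Postgres tests, a rough analogue might look like the sketch below. It is not the PR's implementation: to stay dependency-free it reads the value from an environment variable rather than from .NET user secrets, and the class name `PostgresTestSettings` and the variable name `POSTGRES_CONNECTION_STRING` are hypothetical. The point is only that the password never appears in source.

```csharp
using System;

// Hypothetical helper: resolves the test connection string at run time
// instead of hard-coding it (and its password) in the test fixture.
public static class PostgresTestSettings
{
    // Assumed variable name, chosen for illustration only.
    public const string ConnectionStringVariable = "POSTGRES_CONNECTION_STRING";

    public static string GetConnectionString()
    {
        var value = Environment.GetEnvironmentVariable(ConnectionStringVariable);

        if (string.IsNullOrWhiteSpace(value))
        {
            // Fail with an actionable message rather than falling back to a
            // default that would embed credentials in the codebase.
            throw new InvalidOperationException(
                $"Set the {ConnectionStringVariable} environment variable to run the Postgres tests.");
        }

        return value;
    }
}
```

A fixture would then call `PostgresTestSettings.GetConnectionString()` wherever the constant is used today. The same shape applies if the value comes from user secrets through a configuration provider instead.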

Comment on lines +14 to 19
/// <remarks>
/// This interface is used with the PostgresMemoryStore, which is being deprecated.
/// Use the <see cref="IPostgresVectorStoreDbClient"/> interface with the PostgresVectorStore
/// and related classes instead.
/// </remarks>
public interface IPostgresDbClient
Member

IPostgresVectorStoreDbClient is internal, but this interface and its documentation are public. Instead of this XML documentation, we can mark this interface as Obsolete and recommend using the new PostgresVectorStore.
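
A minimal sketch of that suggestion, assuming nothing about the final wording: `ILegacyPostgresDbClient` and its member are hypothetical stand-ins for the real interface, and the message text is illustrative. The attribute message only names public types, so callers are not pointed at an internal one.

```csharp
using System;

// Hypothetical stand-in for the legacy public interface. The message points
// callers at public replacement types rather than at an internal client.
[Obsolete("PostgresMemoryStore is being deprecated. Use PostgresVectorStore and PostgresVectorStoreRecordCollection instead.")]
public interface ILegacyPostgresDbClient
{
    // Placeholder member for illustration; not taken from the PR.
    void Ping();
}
```

With this in place, any code that references the interface gets compiler warning CS0618 carrying the message, which is more visible than a remark in the XML documentation.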

@@ -15,6 +15,11 @@ namespace Microsoft.SemanticKernel.Connectors.Postgres;
/// <summary>
/// An implementation of a client for Postgres. This class is used to manage Postgres database operations.
/// </summary>
/// <remarks>
/// This class is used with the PostgresMemoryStore, which is being deprecated.
/// Use the <see cref="PostgresVectorStoreDbClient"/> class with the PostgresVectorStore
Member

Same comment as above: PostgresVectorStoreDbClient is internal. PostgresVectorStore and/or PostgresVectorStoreRecordCollection can be used as the new reference points.

this._propertyReader.VerifyDataProperties(PostgresConstants.SupportedDataTypes, PostgresConstants.SupportedEnumerableDataElementTypes);
this._propertyReader.VerifyVectorProperties(PostgresConstants.SupportedVectorTypes);
}
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel)
Member

nit:

Suggested change
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel)
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel)

/// function on each element of the original sequence.
/// </returns>
/// <exception cref="ArgumentNullException">Thrown when the source or selector is null.</exception>
public static async IAsyncEnumerable<TResult> Select<TSource, TResult>(
Member

Is there a reason we can't name this method as SelectAsync?
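
For reference, a sketch of what the renamed extension could look like. The PR's method body is not shown here, so this is an assumption-laden illustration: the class name `AsyncEnumerableSketch`, the `cancellationToken` parameter, and the split into a validating wrapper plus a private iterator are all choices made for this example. The split makes the documented ArgumentNullException surface at the call site rather than on first enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncEnumerableSketch
{
    // Non-iterator wrapper: arguments are validated eagerly, when the method is called.
    public static IAsyncEnumerable<TResult> SelectAsync<TSource, TResult>(
        this IAsyncEnumerable<TSource> source,
        Func<TSource, TResult> selector,
        CancellationToken cancellationToken = default)
    {
        if (source is null) { throw new ArgumentNullException(nameof(source)); }
        if (selector is null) { throw new ArgumentNullException(nameof(selector)); }

        return SelectCoreAsync(source, selector, cancellationToken);
    }

    // Async iterator: projects each element lazily as the caller enumerates.
    private static async IAsyncEnumerable<TResult> SelectCoreAsync<TSource, TResult>(
        IAsyncEnumerable<TSource> source,
        Func<TSource, TResult> selector,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        await foreach (var item in source.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            yield return selector(item);
        }
    }
}
```

Usage would read `await foreach (var name in records.SelectAsync(r => r.Name)) { ... }`, where `records` is any `IAsyncEnumerable<T>`.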

Labels
documentation, kernel, memory, .NET
Projects
Status: Community PRs
6 participants