The Data Mesh Manager SDK is a Java library that provides a set of APIs to interact with Data Mesh Manager and Data Contract Manager.
Using the SDK, you can build Java applications to automate data platform operations, such as:
- Synchronize data products and data assets from the data platform to Data Mesh Manager
- Synchronize datacontract.yaml in Git repositories with Data Contract Manager
- Automate permissions in the data platform when an access request has been approved
- Notify downstream consumers when data contract tests have failed
- Publish data product costs and usage data to Data Mesh Manager
This SDK is designed as a foundation for building data platform integrations that run as long-running agents on the customer's data platform, e.g., as containers running in a Kubernetes cluster or any other container runtime.
It interacts with the Data Mesh Manager APIs to send metadata and to subscribe to events to trigger actions in the data platform or with other services.
We provide agents for commonly used platforms that are built on this SDK. They can be used out of the box or as templates for custom integrations:
Platform | Integration | Synchronize Assets | Access Management | Remarks |
---|---|---|---|---|
Databricks | datamesh-manager-agent-databricks | ✅ | ✅ | Uses Unity Catalog APIs |
Snowflake | datamesh-manager-agent-snowflake | ✅ | ✅ | Uses the Snowflake REST API |
AWS | Coming soon | | | |
Google Cloud Platform | datamesh-manager-agent-gcp | ✅ | ✅ | Uses BigQuery APIs |
Azure | Coming soon | | | |
DataHub | Coming soon | | | |
Collibra | Coming soon | | | |
If you are interested in further integrations, please contact us.
Follow this guide to build your own custom integration.
- Java 17 or later
Add this dependency to your pom.xml:
<dependency>
<groupId>com.datamesh-manager</groupId>
<artifactId>datamesh-manager-sdk</artifactId>
<version>RELEASE</version>
</dependency>
Replace RELEASE with the latest version of the SDK.
To work with the API, you need an API key.
Then you can instantiate a DataMeshManagerClient:
var client = new DataMeshManagerClient(
"https://api.datamesh-manager.com",
"dmm_live_..."
);
This client has all methods to interact with the Data Mesh Manager API.
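Because agents typically run as containers, it is common to pass the API key in through the environment rather than hard-coding it. The following is a minimal sketch of that idea, not part of the SDK: the AgentConfig record and the variable names DMM_HOST and DMM_API_KEY are hypothetical and chosen only for illustration.

```java
import java.util.Map;

// Hypothetical helper (not part of the SDK): resolves connection settings
// from environment variables. The names DMM_HOST and DMM_API_KEY are assumptions.
record AgentConfig(String host, String apiKey) {

    static final String DEFAULT_HOST = "https://api.datamesh-manager.com";

    // Takes the environment as a map so it can be tested without real env vars.
    static AgentConfig fromEnvironment(Map<String, String> env) {
        String host = env.getOrDefault("DMM_HOST", DEFAULT_HOST);
        String apiKey = env.get("DMM_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            // Fail fast at startup instead of on the first API call.
            throw new IllegalStateException("DMM_API_KEY is not set");
        }
        return new AgentConfig(host, apiKey);
    }
}
```

With this in place, the client could be created with AgentConfig.fromEnvironment(System.getenv()), passing config.host() and config.apiKey() to the DataMeshManagerClient constructor shown above.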
To synchronize assets (such as tables, views, files, or topics) from your data platform with Data Mesh Manager, implement the DataMeshManagerAssetsProvider interface:
public class MyAssetsProvider implements DataMeshManagerAssetsProvider {
@Override
public void fetchAssets(AssetCallback assetCallback) {
// query your data platform for assets
// convert them to datameshmanager.sdk.client.model.Asset objects
// and call assetCallback.onAssetUpdated(asset) for each new or updated asset
}
}
With this implementation, you can start a DataMeshManagerAssetsSynchronizer:
var agentid = "my-unique-assets-synchronization-agent-id";
var assetsProvider = new MyAssetsProvider();
var assetsSynchronizer = new DataMeshManagerAssetsSynchronizer(agentid, client, assetsProvider);
assetsSynchronizer.start(); // Starts a long-running agent that calls fetchAssets periodically
To trigger actions in your data platform when events happen in Data Mesh Manager, implement the DataMeshManagerEventHandler interface:
public class MyEventHandler implements DataMeshManagerEventHandler {
@Override
public void onAccessActivatedEvent(AccessActivatedEvent event) {
// TODO grant permissions in your data platform
// use the DataMeshManagerClient to retrieve the current access resource and data product and consumer resource for details
}
@Override
public void onAccessDeactivatedEvent(AccessDeactivatedEvent event) {
// TODO revoke permissions in your data platform
}
}
You can listen to any event from Data Mesh Manager. The SDK provides a method for each event type.
With this implementation, you can start a DataMeshManagerEventListener:
var agentid = "my-unique-event-listener-agent-id";
var eventHandler = new MyEventHandler();
var stateRepository = ... // see below
var eventListener = new DataMeshManagerEventListener(agentid, client, eventHandler, stateRepository);
eventListener.start(); // This will start a long-running agent that listens to events from Data Mesh Manager
If you run multiple agents in one application, call each agent's start() method in a separate thread, because start() blocks while the agent is running.
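Running several agents side by side could look roughly like the sketch below. It is an illustration only: AgentThreads is a hypothetical helper, not an SDK class, and it accepts any Runnable so that it makes no assumptions about the SDK beyond the start() methods shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the SDK): runs each blocking agent
// on its own named thread and returns the threads to the caller.
final class AgentThreads {

    private AgentThreads() {
    }

    static List<Thread> startAll(Runnable... agents) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < agents.length; i++) {
            // Non-daemon threads keep the JVM alive while the agents run.
            Thread thread = new Thread(agents[i], "agent-" + i);
            thread.start();
            threads.add(thread);
        }
        return threads;
    }
}
```

Assuming the assetsSynchronizer and eventListener variables from the snippets above, a call such as AgentThreads.startAll(assetsSynchronizer::start, eventListener::start) would start both agents without one blocking the other.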
The DataMeshManagerEventListener requires a DataMeshManagerStateRepository to store the lastEventId that has been processed.
You can also use the state repository in other agents if you need to record what has already been processed or to track the agent's current state.
You can implement this interface to store the state in a database, a file, or any other storage:
public interface DataMeshManagerStateRepository {
Map<String, Object> getState();
void saveState(Map<String, Object> state);
}
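As an illustration of the file option, here is a minimal sketch of a repository backed by a local properties file. FileStateRepository is a hypothetical name, not an SDK class, and the interface is repeated only so the sketch compiles on its own. It also makes a simplifying assumption: all values are stored and returned as strings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Repeated from the interface shown above so this sketch is self-contained;
// in a real project, import the SDK's interface instead.
interface DataMeshManagerStateRepository {
    Map<String, Object> getState();
    void saveState(Map<String, Object> state);
}

// Hypothetical file-backed implementation. Assumption: values are
// persisted via String.valueOf and read back as strings.
final class FileStateRepository implements DataMeshManagerStateRepository {

    private final Path file;

    FileStateRepository(Path file) {
        this.file = file;
    }

    @Override
    public synchronized Map<String, Object> getState() {
        Map<String, Object> state = new HashMap<>();
        if (!Files.exists(file)) {
            return state; // nothing saved yet: start with empty state
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            state.put(key, props.getProperty(key));
        }
        return state;
    }

    @Override
    public synchronized void saveState(Map<String, Object> state) {
        Properties props = new Properties();
        state.forEach((key, value) -> props.setProperty(key, String.valueOf(value)));
        // Overwrites the file in place; this write is not atomic.
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "agent state");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file like this only survives restarts if it lives on persistent storage, which matters when the agent runs in a container.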
For your convenience, you can use the DataMeshManagerStateRepositoryRemote to store the state directly in Data Mesh Manager:
var agentId = "my-unique-event-listener-agent-id";
var stateRepository = new DataMeshManagerStateRepositoryRemote(agentId, client);
For testing, there is also a DataMeshManagerStateRepositoryInMemory.
Contributions are welcome! Please open an issue or a pull request.