Online Marketplace is a benchmark modeling an event-driven microservice system in the marketplace application domain. It is designed to reflect emerging data management requirements and challenges faced by microservice developers in practice. This project contains the benchmark driver for Online Marketplace. The driver is responsible for managing the lifecycle of an experiment, including data generation, data population, workload submission, and metrics collection.
- .NET 7
- IDE (if you want to modify or debug the code): Visual Studio or VSCode
- A multi-core machine with appropriate memory size in case generated data is kept in memory
- Linux- or macOS-based operating system
Online Marketplace models the workload of an online marketplace platform. Such platforms, which are growing in popularity, offer an e-commerce technology infrastructure so that multiple retailers can offer their products or services to a large consumer base.
The driver requires the target data platform to expose a set of HTTP APIs, both to set up the platform prior to workload submission and to submit transaction requests.
API | HTTP Request Type | Microservice | Description |
---|---|---|---|
/cart/{customerId}/add | PUT | Cart | Add a product to a customer's cart |
/cart/{customerId}/checkout | POST | Cart | Checkout a cart |
/cart/{customerId}/seal | POST | Cart | Reset a cart |
/customer | POST | Customer | Register a new customer |
/product | POST | Product | Register a new product |
/product | PATCH | Product | Update a product's price |
/product | PUT | Product | Replace a product |
/seller | POST | Seller | Register a new seller |
/seller/dashboard/{sellerId} | GET | Seller | Retrieve the dashboard for a given seller |
/shipment/{tid} | PATCH | Shipment | Update packages to 'delivered' status |
/stock | POST | Stock | Register a new stock item |
For the requests that modify microservices' state (POST/PATCH/PUT), refer to classes present in Common to understand the expected payload.
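As a rough illustration of how a client could address the endpoints listed above, the following sketch builds (but does not send) two requests using only Python's standard library. The base address is a placeholder, and the payload field names are hypothetical; the actual payloads are defined by the classes in Common.

```python
import json
import urllib.request

# Placeholder base address; replace it with the address of the target platform.
BASE_URL = "http://localhost:8081"


def build_request(path, method, payload=None):
    """Build (without sending) an HTTP request for one of the APIs in the table."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=body, method=method)
    if body is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Hypothetical field names; check the classes in Common for the real payload.
add_item = build_request("/cart/42/add", "PUT", {"productId": 1, "quantity": 2})
dashboard = build_request("/seller/dashboard/7", "GET")

# To actually submit a request against a running platform:
# with urllib.request.urlopen(add_item) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the method, URL, and body before pointing the client at a real deployment.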
There are two stable implementations of Online Marketplace available: Orleans and Statefun. In case you want to reproduce experiments, their repositories contain instructions on how to configure and deploy Online Marketplace.
A Dapr implementation is also available, but it is outdated and may contain bugs. Use it with caution. We intend to update Online Marketplace on Dapr as soon as time allows.
The driver is written in C# and takes advantage of the thread management facilities provided by .NET. We strongly recommend studying the subprojects Orleans and Statefun to understand how to extend the driver to run experiments on other data platforms. Further instructions will be included soon.
The driver uses DuckDB to store and query generated data during workload submission. Besides persisting data in a DuckDB database file, users can also generate data in memory for use in experiments. More information can be found in Config. The benefit of persisting data in DuckDB is that it can be safely reused in other experiments, reducing the overall time of experiment runs.
The library DuckDB.NET is used to bridge .NET with DuckDB. However, the library currently supports only Unix-based operating systems. As the driver depends on the data stored in DuckDB, it is unfortunately not possible to run the benchmark on Windows-based operating systems.
Furthermore, we use additional libraries to support the data generation process. Dapper is used to map rows to objects. Bogus is used to generate realistic synthetic data.
The driver requires a configuration file to be passed as input at startup. The configuration prescribes several important aspects of the experiment, including the transaction ratio, the target microservice API addresses, the data set parameters, the degree of concurrency, and more. An example configuration, with comments included where the parameter name is not self-explanatory, is shown below.
{
  "connectionString": "Data Source=file.db", // defines the data source. If in-memory, set "Data Source=:memory:"
  "numCustomers": 100000,
  "numProdPerSeller": 10,
  "qtyPerProduct": 10000,
  "executionTime": 60000, // prescribes the total time of each experiment run
  "epoch": 10000, // defines whether the output result will show metrics
  "delayBetweenRequests": 0,
  "delayBetweenRuns": 0,
  // the transaction ratio
  "transactionDistribution": {
    "CUSTOMER_SESSION": 30,
    "QUERY_DASHBOARD": 35,
    "PRICE_UPDATE": 38,
    "UPDATE_PRODUCT": 40,
    "UPDATE_DELIVERY": 100
  },
  "concurrencyLevel": 48,
  "ingestionConfig": {
    "strategy": "WORKER_PER_CPU",
    "concurrencyLevel": 32,
    // these entries are mandatory
    "mapTableToUrl": {
      "sellers": "http://orleans:8081/seller",
      "customers": "http://orleans:8081/customer",
      "stock_items": "http://orleans:8081/stock",
      "products": "http://orleans:8081/product"
    }
  },
  // defines the runs (one or more) this experiment contains
  "runs": [
    {
      "numProducts": 100000,
      "sellerDistribution": "UNIFORM",
      "keyDistribution": "UNIFORM"
    }
  ],
  // defines the APIs that should be contacted at the end of every run
  "postRunTasks": [
  ],
  // defines the APIs that should be contacted at the end of the experiment
  "postExperimentTasks": [
    {
      "name": "cleanup",
      "url": "http://orleans:8081/cleanup"
    }
  ],
  // defines aspects related to the customer session
  "customerWorkerConfig": {
    "maxNumberKeysToAddToCart": 10,
    "minMaxQtyRange": {
      "min": 1,
      "max": 10
    },
    "checkoutProbability": 100,
    "voucherProbability": 5,
    "productUrl": "http://orleans:8081/product",
    "cartUrl": "http://orleans:8081/cart",
    // track which tids have been submitted
    "trackTids": true
  },
  "sellerWorkerConfig": {
    // adjust price percentage range
    "adjustRange": {
      "min": 1,
      "max": 10
    },
    "sellerUrl": "http://orleans:8081/seller",
    "productUrl": "http://orleans:8081/product",
    // track product update history
    "trackUpdates": false
  },
  "deliveryWorkerConfig": {
    "shipmentUrl": "http://orleans:8081/shipment"
  }
}
Other example configuration files are found in Configuration.
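The values under "transactionDistribution" in the example above increase monotonically and end at 100, which suggests they are cumulative upper bounds rather than individual percentages. Under that assumption (it is not stated explicitly here, so verify it against the driver code), the sketch below shows how a random draw between 1 and 100 would map to a transaction type, and what individual shares the thresholds imply. The function names are illustrative and not part of the driver.

```python
import bisect
import random

# Thresholds copied from the example configuration.
# Treating them as cumulative upper bounds is an assumption.
TRANSACTION_DISTRIBUTION = {
    "CUSTOMER_SESSION": 30,
    "QUERY_DASHBOARD": 35,
    "PRICE_UPDATE": 38,
    "UPDATE_PRODUCT": 40,
    "UPDATE_DELIVERY": 100,
}


def pick_transaction(distribution, draw):
    """Return the first transaction whose cumulative threshold is >= draw."""
    names = list(distribution.keys())
    thresholds = list(distribution.values())
    return names[bisect.bisect_left(thresholds, draw)]


def implied_shares(distribution):
    """Convert cumulative thresholds into per-transaction percentages."""
    shares = {}
    previous = 0
    for name, threshold in distribution.items():
        shares[name] = threshold - previous
        previous = threshold
    return shares


if __name__ == "__main__":
    draw = random.randint(1, 100)  # uniform draw over 1..100 inclusive
    print(draw, pick_transaction(TRANSACTION_DISTRIBUTION, draw))
    print(implied_shares(TRANSACTION_DISTRIBUTION))
```

Read this way, the example configuration would assign 30% of the transactions to customer sessions, 5% to dashboard queries, 3% to price updates, 2% to product updates, and 60% to delivery updates.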
Once the configuration is set, and assuming the target data platform is up and running (i.e., ready to receive requests), we can initialize the benchmark driver. In the project root folder, run the following commands for the respective data platforms:
- Orleans
dotnet run --project Orleans <configuration file path>
- Statefun
dotnet run --project Statefun <configuration file path>
In both cases, the following menu will be shown to the user:
Select an option:
1 - Generate Data
2 - Ingest Data
3 - Run Experiment
4 - Ingest and Run (2 and 3)
5 - Parse New Configuration
q - Exit
Through the menu, the user can select specific benchmark tasks, including data generation (1), data ingestion into the data platform (2), and workload submission (3). In case the configuration file has been modified, one can also request the driver to read the new configuration (5) without the need to restart the driver.
At the end of an experiment cycle, the results collected during execution are shown on the screen and stored automatically in a text file. The text file indicates the execution time, as well as some of the parameters used, for faster identification of a specific run.
- Data Generation
- Ingestion Manager
- Workload Manager
- Workers
  - Customer worker. Simulates a customer session
  - Seller worker. Simulates a seller session
  - Delivery worker. Simulates an external system requesting package updates
- Statistics Collection
The Online Marketplace implementation targeting Microsoft Orleans supports tracking the cart history (make sure that the options StreamReplication and TrackCartHistory are set to true). By tracking the cart history, we can match the items in the carts with the history of product updates. That enables the identification of possible causal anomalies related to updates in multiple objects.
To enable such anomaly detection in the driver, make sure the options "trackTids" in customerWorkerConfig and "trackUpdates" in sellerWorkerConfig are set to true in the configuration file. By tracking the history of TIDs for each customer cart, we can query customer actors in Orleans for the content of their respective carts submitted for checkout. With the cart history, we match historic cart items with the history of product updates (tracked by the driver's seller workers) to identify anomalies.
We understand these settings are sensitive and error-prone. We intend to improve them in the near future.
The project DriverBench can run a simulated workload to test the driver's scalability, that is, the driver's ability to submit more requests as more computational resources are added.
There are three impediments that can prevent the driver from scaling:
- (a) Insufficient computational resources
- (b) Contended workload
- (c) The target platform itself

These can be addressed as follows:
- (a) can be mitigated with more CPUs and memory (to hold data in memory if necessary).
- (b) does not occur if a uniform distribution is used. With a non-uniform distribution, however, the task is tricky because the driver may need some level of synchronization to make sure updates to a product are linearizable. Adjusting the zipfian constant can alleviate the problem in case a non-uniform distribution is really necessary.
- (c) can be mitigated by (i) tuning the target data platform, (ii) increasing computational resources in the target platform, or (iii) co-locating the driver with the data platform (to remove network latency).
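To give a feel for why the zipfian constant matters for contention, the sketch below computes the share of requests that would hit the most popular key under a textbook Zipfian distribution, where the probability of the key at rank k is proportional to 1/k^s. Whether the driver's generator uses exactly this parameterization is an assumption, and the function names are illustrative only.

```python
def zipf_probabilities(num_keys, constant):
    """Probability of each rank 1..num_keys under a Zipfian distribution."""
    weights = [1.0 / (rank ** constant) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def hottest_key_share(num_keys, constant):
    """Fraction of requests expected to target the most popular key."""
    return zipf_probabilities(num_keys, constant)[0]


if __name__ == "__main__":
    # A constant of 0 degenerates to the uniform distribution.
    for constant in (0.0, 0.5, 1.0, 1.5):
        share = hottest_key_share(10_000, constant)
        print(f"zipfian constant {constant}: hottest key receives {share:.4%} of requests")
```

A larger constant concentrates more requests on a few keys, so more updates to the same product must be serialized; lowering the constant spreads the load and reduces that synchronization.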
We intend to count the "add item to cart" operation as a measured query in the driver. In the current implementation, the add item operation is not counted as part of the latency of a customer checkout. Capturing the cost of each "add item" would allow measuring the latency of the customer session as a whole, not only that of the checkout operation.