In this code pattern historical shopping data is used to build a recommendation engine with Spark and Watson Machine Learning. The model is then used in an interactive PixieApp in which a shopping basket is simulated and used to create a list of recommendations.
When you have completed this code pattern, you will understand how to:
- Use Jupyter Notebooks in IBM Watson Studio
- Build a recommendation model with SparkML and Watson Machine Learning to provide product recommendations for customers based on their purchase history
- Build an interactive dashboard using PixieApps
The intended audience is data scientists and developers interested in building, deploying and testing machine learning models from a Jupyter notebook with Watson Machine Learning.
Sample output
Here's an example of what the final app looks like:
- Load the provided notebook into Watson Studio
- Load and transform the customer data in the notebook
- Build a k-means clustering model with SparkML
- Deploy the model to Watson Machine Learning
- Test and compare the models built in the notebook and through the Watson Machine Learning API
- Use the API to build an interactive PixieApp
- IBM Watson Studio: a suite of tools and a collaborative environment for data scientists, developers and domain experts
- IBM Apache Spark: an open source cluster computing framework optimized for extremely fast and large scale data processing
- IBM Watson Machine Learning: a set of REST APIs to develop applications that make smarter decisions, solve tough problems, and improve user outcomes
- Jupyter Notebooks: an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
- PixieDust: Python helper library for Jupyter notebooks
- PixieApps: Python library to write and run UI elements for analytics directly in a Jupyter notebook
- Setup project and data in Watson Studio
- Create and deploy a recommendation engine with Watson Studio
- Run the recommendation PixieApp
To complete this code pattern we'll need to do a few setup steps before creating our model. In Watson Studio we need to: create a project, add our customer data (which our model will be based on), upload our notebook, and provision a Watson Machine Learning service.
- Log into IBM's Watson Studio. Once in, you'll land on the dashboard.
- Create a new project by clicking **+ New project** and choosing **Data Science**.
- Enter a name for the project and click **Create**.

  **NOTE:** By creating a project in Watson Studio a free tier **Object Storage** service will be created in your IBM Cloud account. Select the **Free** storage type to avoid fees.

- Click on the navigation menu on the left (`☰`) to show additional options, then click on the **Watson Services** option.
- From the overview page, click **+ Add service** on the top right and choose the **Machine Learning** service. Select the **Lite** plan to avoid fees.
- Once provisioned, you should see the service listed on the **Watson Services** overview page. Select the service by opening the link in a new tab. We're now in the IBM Cloud tool, where we will create service credentials for our new Watson Machine Learning service. Follow the numbered steps in the image below. We'll be using these credentials in Step 2, so keep them handy!

  **TIP:** You can now go back to the project via the navigation menu on the left (`☰`).
The notebook we'll be using can be viewed in `notebooks/wml-product-recommendation-engine.ipynb`, and a completed version can be found in `examples/wml-product-recommendation-engine-complete.ipynb`.
- From the new project **Overview** panel, click **+ Add to project** on the top right and choose the **Notebook** asset type. Fill in the following information:
  - Select the **From URL** tab. [1]
  - Enter a **Name** for the notebook and optionally a description. [2]
  - Under **Notebook URL** provide the following url: https://github.com/IBM/product-recommendation-with-watson-ml/blob/master/notebooks/wml-product-recommendation-engine.ipynb [3]
  - For **Runtime** select the **Spark Python 3.6** option. [4]
- **TIP:** Once successfully imported, the notebook should appear in the **Notebooks** section of the **Assets** tab.
Now that we're in our Notebook editor, we can start to create our predictive model by stepping through the notebook.
- Click the ▶ **Run** button to start stepping through the notebook.
- In cell `2. Load and explore data` we call PixieDust's `display()` method to view the data interactively.
- Keep stepping through the code, pausing on each step to read the code and see the output for the operation we're performing. At the end of Step 3 we'll have used the k-means clustering algorithm in Spark ML to create a model. This model assigns every customer to a cluster based on their shopping history.
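To make the clustering step concrete: k-means alternates between assigning each customer's purchase vector to its nearest centroid and recomputing the centroids from their members. The notebook itself uses Spark ML's `KMeans`; the pure-Python sketch below is only to illustrate the idea, not the notebook's code:

```python
import math
import random

def assign_cluster(point, centroids):
    """Return the index of the nearest centroid (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=10, seed=42):
    """Tiny k-means: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k random points
    for _ in range(iterations):
        labels = [assign_cluster(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == i]
            if members:  # new centroid = mean of the cluster's members
                centroids[i] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, labels
```

In the notebook, each "point" would be a customer's aggregated purchase history, so customers with similar shopping habits end up in the same cluster.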
The gist of the next two steps is to use the Watson Machine Learning Python client to persist and deploy the model we just created.
- At the beginning of `Step 4. Persist model`, before we deploy our model, we need to update the cell with the credentials from our Watson Machine Learning service. (Remember those from `Step 1.2 Provision a Watson Machine Learning service`?)
- Update the `wml_credentials` variable. Copy and paste the entire credential dictionary, which can be found on the **Service Credentials** tab of the Watson Machine Learning service instance created on IBM Cloud.
- Keep stepping through the code, pausing on each step to read the code and see the output for the operation we're performing. At the end of Step 4 we'll have used the Watson Machine Learning service to persist our recommendation model! 🎉
- Now let's run `Step 5. Deploy model to the IBM cloud` of the notebook. Deploying the model gives us an endpoint to score data against.
- In `Step 6. Create product recommendations` we create functions that query the data to find the most popular items for a cluster and compute recommendations for a given cluster. Watson Machine Learning is used to determine a customer's cluster from the products and quantities in the shopping cart, which in turn yields the list of recommended items.
- Lastly, in `Step 6.1 Test product recommendations model`, these functions are used in a PixieApp to create an interactive dashboard.
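For reference, `wml_credentials` is just a Python dictionary, and classic Watson Machine Learning scoring requests use a `fields`/`values` payload shape. The snippet below is an illustrative sketch only: the credential keys and placeholder values are assumptions (always paste the exact dictionary from your own Service Credentials tab), and `build_scoring_payload` is a hypothetical helper, not part of the notebook:

```python
# Illustrative placeholder only -- paste your real credential dictionary
# from the Service Credentials tab of your Watson Machine Learning instance.
wml_credentials = {
    "url": "<url-from-credentials>",
    "username": "<username>",
    "password": "<password>",
    "instance_id": "<instance-id>",
}

def build_scoring_payload(fields, rows):
    """Build a scoring payload in the fields/values shape used by the
    classic Watson Machine Learning scoring endpoint."""
    assert all(len(row) == len(fields) for row in rows)
    return {"fields": fields, "values": [list(row) for row in rows]}

# Hypothetical example: one row of input data for the deployed model
payload = build_scoring_payload(["customer_id", "quantity"], [(1234, 2)])
```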
Here we add some products to our cart, and get some recommendations:
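The recommendation logic described above can be sketched roughly as follows. This is a simplified stand-in for what the notebook does with Spark and the deployed model, using illustrative function names and made-up data: recommend the cluster's most popular products that aren't already in the cart.

```python
from collections import Counter

def recommend(cart, cluster_id, cluster_purchases, top_n=3):
    """Recommend the cluster's most popular products not already in the cart.

    cluster_purchases maps cluster id -> list of (product, quantity) pairs
    aggregated from the purchase history of customers in that cluster.
    """
    popularity = Counter()
    for product, quantity in cluster_purchases.get(cluster_id, []):
        popularity[product] += quantity
    in_cart = set(cart)
    return [p for p, _ in popularity.most_common() if p not in in_cart][:top_n]

# Hypothetical aggregated purchase data for two customer clusters
purchases = {
    0: [("apples", 10), ("bread", 8), ("milk", 6), ("eggs", 2)],
    1: [("soda", 9), ("chips", 7), ("salsa", 3)],
}
print(recommend(["bread"], 0, purchases))  # ['apples', 'milk', 'eggs']
```

In the real app, the `cluster_id` comes from scoring the cart contents against the deployed Watson Machine Learning model rather than being passed in directly.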
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.