Traditional recommender algorithms may periodically rebuild their models, but they cannot adjust to quick changes in trends caused by timely information. In contrast, online learning models can adapt to temporal effects, and hence may overcome the effect of concept drift in non-stationary online environments.
In our tutorial, we present open source systems capable of updating their models on the fly after each event: Apache Spark, Apache Flink, and Alpenglow, a C++-based Python recommender framework. Participants of the tutorial will be able to experiment with all three systems by using Zeppelin Notebooks.
Our final objective is to compare and then blend batch and online methods to build models providing high quality top-k recommendation in non-stationary environments.
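To make the idea of updating a model after each event more concrete, here is a minimal sketch in plain Python. It is not the API of Alpenglow, Flink, or Spark; the class `OnlineMF`, the function `prequential_hit_rate`, the hyperparameters, and the synthetic event stream are all hypothetical and chosen only for illustration. The sketch assumes implicit feedback, a small matrix factorization model trained by SGD with one random negative sample per event, and a time-aware evaluation in which each event is first used to test the current model and only then used to train it.

```python
import random


class OnlineMF:
    """Tiny matrix factorization model updated by SGD after every event (illustrative only)."""

    def __init__(self, n_users, n_items, dim=8, lr=0.1, seed=0):
        self.rng = random.Random(seed)
        self.n_items = n_items
        self.lr = lr
        # Small random initial factors for users (P) and items (Q).
        self.P = [[self.rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.Q = [[self.rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]

    def score(self, user, item):
        return sum(p * q for p, q in zip(self.P[user], self.Q[item]))

    def rank_of(self, user, item):
        """1-based rank of `item` among all items for `user` under the current model."""
        target = self.score(user, item)
        better = sum(
            1 for j in range(self.n_items) if j != item and self.score(user, j) > target
        )
        return 1 + better

    def _sgd_step(self, user, item, label):
        # One squared-error gradient step on a single (user, item, label) sample.
        err = label - self.score(user, item)
        p, q = self.P[user], self.Q[item]
        for f in range(len(p)):
            pf, qf = p[f], q[f]
            p[f] += self.lr * err * qf
            q[f] += self.lr * err * pf

    def update(self, user, item):
        # Positive sample: the observed event.
        self._sgd_step(user, item, 1.0)
        # One randomly drawn negative sample (skipped if it equals the observed item).
        negative = self.rng.randrange(self.n_items)
        if negative != item:
            self._sgd_step(user, negative, 0.0)


def prequential_hit_rate(model, events, k=3):
    """Evaluate each event with the current model first, then train on it."""
    hits = []
    for user, item in events:
        hits.append(1 if model.rank_of(user, item) <= k else 0)
        model.update(user, item)
    return hits


# Synthetic stream with an abrupt trend change: item 0 is consumed first, then item 5.
events = [(t % 4, 0) for t in range(200)] + [(t % 4, 5) for t in range(200)]
model = OnlineMF(n_users=4, n_items=10)
hits = prequential_hit_rate(model, events, k=3)
print("hit rate during first trend:", sum(hits[:200]) / 200)
print("hit rate during second trend:", sum(hits[200:]) / 200)
```

Because every event is scored before the model sees it, the resulting hit sequence can be averaged over time windows to observe how quickly a model recovers after a change in trend. A periodically retrained batch model could be evaluated with the same loop by replacing `update` with a no-op between retraining points.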
The hands-on tutorial runs parallel to the workshops on Sunday, Aug 27, 2017. Participants may attend either of two identical sessions, starting at 9:00 or 14:00. Both sessions start with installation instructions, which are crucial for participation. Participants must register for the tutorial via this form.
- Installation
- Introduction to time-aware evaluation and learning
- Notebook 1: Alpenglow notebook
- Spark and Flink introduction
- Notebook 2: Comparing batch learning in Alpenglow, Flink, and Spark
- Notebook 3: Online stream-based learning in Flink
Check the wiki page for the detailed installation guide.
Download the tutorial data from here. Extract the contained folder to your Zeppelin root directory.
- 1 Alpenglow online ranking notebook
- 2 Alpenglow/Flink/Spark batch notebook
- 3 Flink parameter-server-based streaming SGD notebook
A brief summary of the tutorial's theoretical background is now available.