This is a repo that explores various uplift modelling packages on the Criteo dataset.
- Note that the full Criteo dataset is too large (3.2 GB), so it was not pushed to the repo.
- Instead, a 1% subsample of the data was taken with a stratified train-test split. This can be replicated using the notebook code found in the Criteo Data folder.
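
As a rough illustration of that subsampling step, here is a minimal standard-library sketch. It is not the notebook's actual code: the column names `treatment` and `conversion`, the choice of stratification key, the 80/20 split ratio, and the helper `stratified_split` are all assumptions made for the example. In practice, something like scikit-learn's `train_test_split(..., stratify=...)` would do the same job.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, frac, seed=0):
    """Split rows into (picked, rest), taking about `frac` of every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    picked, rest = [], []
    for members in strata.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        k = round(len(members) * frac)
        picked.extend(members[:k])
        rest.extend(members[k:])
    return picked, rest


# Hypothetical stand-in for the full dataset: 20,000 rows, 80% treated.
rows = [
    {"treatment": 0 if i % 5 == 0 else 1, "conversion": 1 if i % 4 == 0 else 0}
    for i in range(20000)
]


def stratum(row):
    # Assumed stratification key: treatment arm and outcome together.
    return (row["treatment"], row["conversion"])


# 1% stratified subsample, then an assumed 80/20 stratified train-test split.
sample, _ = stratified_split(rows, stratum, frac=0.01, seed=42)
train, test = stratified_split(sample, stratum, frac=0.8, seed=42)

treated_share_train = sum(r["treatment"] for r in train) / len(train)
print(len(sample), len(train), len(test), treated_share_train)
```

Stratifying on treatment and outcome together keeps both the treatment ratio and the conversion rate of each arm close to those of the full data, which matters when conversions are rare.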
The concept of uplift is based on heterogeneous treatment effects: experimental units in a population respond differently to the same treatment. With uplift modelling, we can create policies that target the subgroups that respond best to the treatment, and avoid wasting resources on subgroups that do not respond well to it. In the context of digital commerce, we can decide which customer segments to prioritise for targeted advertising campaigns or coupons, and perform cost-benefit optimisation for campaigns.
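
To make the idea concrete, the sketch below estimates uplift per segment as the difference in conversion rate between treated and control units. The records, the segment names, and the helper `uplift_by_segment` are made up for illustration. The estimate is only meaningful if treatment was assigned at random, and the code assumes every segment has at least one unit in each arm.

```python
from collections import defaultdict

# Hypothetical toy records: (segment, treated, converted).
records = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 1),
    ("new", 0, 0), ("new", 0, 1), ("new", 0, 0), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 1, 0), ("loyal", 1, 1),
    ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 0),
]


def uplift_by_segment(records):
    """Return {segment: treated conversion rate - control conversion rate}."""
    # counts[segment][arm] = [number of conversions, number of units]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for segment, treated, converted in records:
        counts[segment][treated][0] += converted
        counts[segment][treated][1] += 1
    uplift = {}
    for segment, arms in counts.items():
        rate_treated = arms[1][0] / arms[1][1]
        rate_control = arms[0][0] / arms[0][1]
        uplift[segment] = rate_treated - rate_control
    return uplift


uplift = uplift_by_segment(records)
print(uplift)
```

In this toy data both segments convert at the same rate when treated, but only the "new" segment shows a positive uplift, because "loyal" customers convert just as often without treatment. That is the kind of difference a targeting policy would exploit.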
All of this is centred on causal inference. Without going into too much detail, there are some assumptions we need to make about the causal model:
- Unconfoundedness
- Conditional exchangeability
- No collider bias
Some concepts that will be explored in this repo are:
- Gain/Qini curves
- Propensity Scoring
- Transformed Outcomes
- Meta-Learners
- CausalForests
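
As a small taste of one of these concepts, the sketch below shows the transformed outcome, assuming a binary treatment `t`, an outcome `y`, and a known propensity `p = P(T = 1)`. The function name and toy data are hypothetical and are not taken from any of the packages explored here. The transformed outcome is Z = Y * (T - p) / (p * (1 - p)), and its expectation equals the treatment effect, so a standard regressor fitted on Z can serve as an uplift model.

```python
def transformed_outcome(y, t, p):
    """Return Z = y * (t - p) / (p * (1 - p)) for a binary treatment t."""
    if not 0.0 < p < 1.0:
        raise ValueError("propensity p must be strictly between 0 and 1")
    return y * (t - p) / (p * (1.0 - p))


# Hypothetical toy data from a 50/50 randomised experiment.
treatment = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 1, 0, 1, 0, 0]
propensity = 0.5

z = [transformed_outcome(y, t, propensity) for y, t in zip(outcome, treatment)]
mean_z = sum(z) / len(z)

# Plain difference in mean outcome between the two arms, for comparison.
treated = [y for y, t in zip(outcome, treatment) if t == 1]
control = [y for y, t in zip(outcome, treatment) if t == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(mean_z, diff_in_means)
```

Here the mean of the transformed outcome matches the difference in means exactly, because the arms are the same size and the propensity is 0.5. With real data the two agree only in expectation.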
To try out this repo, it is best to create a virtual environment with Python 3.6. Once the environment is ready, install the required packages from the "requirements.txt" file found at the root of the repo.
pip install -r requirements.txt