Skip to content

Latest commit

 

History

History
167 lines (119 loc) · 5.33 KB

21.md

File metadata and controls

167 lines (119 loc) · 5.33 KB

Day 21 Allergen Assessment - Part1/2 & Python LP!

Consider an input like this:

mxmxvkd kfcds sqjhc nhms (contains dairy, fish)
trh fvjkl sbzzf mxmxvkd (contains dairy)
sqjhc fvjkl (contains soy)
sqjhc mxmxvkd sbzzf (contains fish)

Which is a list of ingredients like mxmxvkd followed by a list of allergens that the list of ingredients can have.

The first food in the list has four ingredients (written in a language you don't understand): mxmxvkd, kfcds, sqjhc, and nhms. While the food might contain other allergens, a few allergens the food definitely contains are listed afterward: dairy and fish.

Part 1

The first step is to determine which ingredients can't possibly contain any of the allergens in any food in your list. In the above example, none of the ingredients kfcds, nhms, sbzzf, or trh can contain an allergen. Counting the number of times any of these ingredients appear in any ingredients list produces 5: they all appear once each except sbzzf, which appears twice.

Determine which ingredients cannot possibly contain any of the allergens in your list. How many times do any of those ingredients appear?

Part 2

Now that you've isolated the inert ingredients, you should have enough information to figure out which ingredient contains which allergen.

In the above example:

  • mxmxvkd contains dairy.
  • sqjhc contains fish.
  • fvjkl contains soy.

Arrange the ingredients alphabetically by their allergen and separate them by commas to produce your canonical dangerous ingredient list. (There should not be any spaces in your canonical dangerous ingredient list.) In the above example, this would be mxmxvkd,sqjhc,fvjkl.

Solution

Greedy algorithm via frequency counter

  1. For each allergen, store how often each ingredient appears. This will be a Map<Allergen, Map<Ingredient, number>>.
  2. An ingredient like mx may have a count of 2 for allergen dairy, whereas kf may only have a count of 1 for dairy. We assume that kf cannot be the ingredient that has the dairy allergen.
    • We end up with a Map<Allergen, Set<Ingredient>> because we've eliminated all ingredients that don't have the max frequency. Now there might be this in the Map: dairy -> [mx, sf] implying that the ingredient both had the max and the same number of times dairy listed in the list next to them.
  3. For part 1, we iterate through the Map<Allergen, Set<Ingredient>>, union all the Set<Ingredient> into a single Set which will consist of ingredients that have an allergen. Call this set A. Then we just do difference(allIngredients, A), which will return ingredients that definitely do not have any allergens.
  4. For part 2, we first look at the allergen that has the fewest potential ingredients. There must exist an allergen with only potential ingredient.. Say this ingredient is mx and allergen is dairy. We can now conclude that mx has dairy, and eliminate mx from ALL the allergens' potential ingredients.
  5. There must now AGAIN exist an allergen with only one potential ingredient after we removed mx. We keep doing this until we're all done!

You can even do part 2 by hand. There are 8 ingredients and 8 allergens after part 1 is done.

The code is well documented. Go check it out here.

Constraint Satisfaction Problem (CSP) - Integer Programming using Python

Installation

Install PuLP:

pip3 install --user pulp

Optionally, install GLPK:

brew install glpk
glpsol --version

If you decide not to use GLPK as your solver, change the following line:

model.solve(solver=GLPK(msg=False))

to

model.solve()

Optionally, install nodemon:

npm install nodemon -g

This will allow you to run a watcher on your Python changes:

nodemon --exec python3 21_solver.py

Prerequisites

I used this guide to get familiar with PuLP. See files playground1 and playground2 for basic demos.

Integer programming formulation

Parameters
  • $I$ - set of all ingredients
  • $A$ - set of all allergens
Indices
  • $i\in I$ - a specific ingredient
  • $a\in A$ - a specific allergen
Variable

$$ \begin{flalign} & X_{ai} = \left\lbrace\matrix{1&\text{if ingredient i has allergen a} \cr 0&\text{otherwise}}\right. & \end{flalign} $$

Objective

No objective for this. We'll use the default of LpMaximize.

Constraints
  1. A given allergen must exist in one and only one ingredient. $$\forall_a \space\space\space\space\space\space\space \sum_{i \in I}{X_{ia}} = 1$$
  2. An ingredient can have at most one allergen. $$\forall_i \space\space\space\space\space\space\space \sum_{a\in A}{X_{ia}} &lt;= 1$$
  3. The input of food lines are constraints. E.g., the line
    mx kf sq (contains dairy, fish)
    
    translates to

$$X_{mx,d} + X_{kf,d} + X_{sq,d} + X_{mx,f} + X_{kf,f} + X_{sq,f} = 2$$

It equals 2 because exactly two of those variables must be true. I.e., one of those ingredients must have dairy, and another must have fish.

Code

The code can be found here.