Day 21 Allergen Assessment - Part1/2 & Python LP!
Consider an input like this:
mxmxvkd kfcds sqjhc nhms (contains dairy, fish)
trh fvjkl sbzzf mxmxvkd (contains dairy)
sqjhc fvjkl (contains soy)
sqjhc mxmxvkd sbzzf (contains fish)
Each line is a list of ingredients (like `mxmxvkd`) followed by a list of
allergens that the food definitely contains.
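As a minimal sketch of reading this format (the helper name `parse_food` is hypothetical and not taken from the repository), one line can be split into its ingredients and allergens like this:

```python
def parse_food(line):
    """Split one puzzle line into (ingredients, allergens)."""
    # Everything before " (contains " is the space-separated ingredient list.
    ingredients_part, _, allergens_part = line.strip().partition(" (contains ")
    ingredients = ingredients_part.split()
    # The allergen list is comma-separated and ends with a closing parenthesis.
    allergens = [a.strip() for a in allergens_part.rstrip(")").split(",") if a.strip()]
    return ingredients, allergens


print(parse_food("sqjhc fvjkl (contains soy)"))
# (['sqjhc', 'fvjkl'], ['soy'])
```

A line without a `(contains ...)` suffix yields an empty allergen list under this sketch.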
The first food in the list has four ingredients (written in a language you don't
understand): `mxmxvkd`, `kfcds`, `sqjhc`, and `nhms`. While the food might
contain other allergens, a few allergens the food definitely contains are listed
afterward: `dairy` and `fish`.
The first step is to determine which ingredients can't possibly contain any of
the allergens in any food in your list. In the above example, none of the
ingredients `kfcds`, `nhms`, `sbzzf`, or `trh` can contain an allergen. Counting the
number of times any of these ingredients appear in any ingredients list produces
`5`: they all appear once each except `sbzzf`, which appears twice.
Determine which ingredients cannot possibly contain any of the allergens in your list. How many times do any of those ingredients appear?
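To check the count of 5 for the example, here is a small sketch that hard-codes the four ingredient lists and the inert set stated above (the variable names are illustrative only):

```python
from collections import Counter

# Ingredient lists of the four example foods.
foods = [
    ["mxmxvkd", "kfcds", "sqjhc", "nhms"],
    ["trh", "fvjkl", "sbzzf", "mxmxvkd"],
    ["sqjhc", "fvjkl"],
    ["sqjhc", "mxmxvkd", "sbzzf"],
]
# Ingredients that cannot contain an allergen, as stated in the example.
inert = {"kfcds", "nhms", "sbzzf", "trh"}

# Count every appearance of every ingredient, then sum the inert ones.
occurrences = Counter(ingredient for food in foods for ingredient in food)
inert_total = sum(occurrences[ingredient] for ingredient in inert)
print(inert_total)  # 5
```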
Now that you've isolated the inert ingredients, you should have enough information to figure out which ingredient contains which allergen.
In the above example:
- mxmxvkd contains dairy.
- sqjhc contains fish.
- fvjkl contains soy.
Arrange the ingredients alphabetically by their allergen and separate them by commas to produce your canonical dangerous ingredient list. (There should not be any spaces in your canonical dangerous ingredient list.) In the above example, this would be mxmxvkd,sqjhc,fvjkl.
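The sort key is the allergen, not the ingredient. A short sketch, using the example's resolved mapping (the dictionary name is an assumption for illustration):

```python
# Allergen -> ingredient, as resolved in the example above.
allergen_to_ingredient = {"fish": "sqjhc", "soy": "fvjkl", "dairy": "mxmxvkd"}

# Sort by allergen name, then emit the matching ingredients without spaces.
canonical = ",".join(
    allergen_to_ingredient[allergen] for allergen in sorted(allergen_to_ingredient)
)
print(canonical)  # mxmxvkd,sqjhc,fvjkl
```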
- For each allergen, store how often each ingredient appears. This will be a
  `Map<Allergen, Map<Ingredient, number>>`.
  - An ingredient like `mx` may have a count of `2` for allergen `dairy`, whereas
    `kf` may only have a count of `1` for `dairy`. We assume that `kf` cannot be
    the ingredient that has the `dairy` allergen, since the ingredient carrying
    `dairy` must appear in every food that lists `dairy`.
  - We end up with a `Map<Allergen, Set<Ingredient>>` because we've eliminated
    all ingredients that don't have the max frequency. Now there might be this
    in the Map: `dairy -> [mx, sf]`, implying that both ingredients appeared the
    maximum number of times in foods that list `dairy`.
- For part 1, we iterate through the `Map<Allergen, Set<Ingredient>>` and union
  all the `Set<Ingredient>` into a single `Set`, which will consist of the
  ingredients that may have an allergen. Call this set `A`. Then we just do
  `difference(allIngredients, A)`, which will return the ingredients that
  definitely do not have any allergens.
- For part 2, we first look at the allergen that has the fewest potential
  ingredients. There must exist an allergen with only one potential ingredient.
  Say this ingredient is `mx` and the allergen is `dairy`. We can now conclude
  that `mx` has `dairy`, and eliminate `mx` from ALL the allergens' potential
  ingredients.
  - There must now AGAIN exist an allergen with only one potential ingredient
    after we removed `mx`. We keep doing this until we're all done!
You can even do part 2 by hand. There are 8 ingredients and 8 allergens after part 1 is done.
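The steps above can be sketched in Python as follows. This is an illustrative re-implementation, not the repository's code. The function names (`candidate_ingredients`, `inert_ingredients`, `resolve`) are hypothetical, and the sketch assumes each ingredient appears at most once per food.

```python
from collections import Counter, defaultdict


def candidate_ingredients(foods):
    """Map each allergen to the ingredients with the maximum count for it."""
    counts = defaultdict(Counter)  # allergen -> ingredient -> count
    for ingredients, allergens in foods:
        for allergen in allergens:
            counts[allergen].update(ingredients)
    candidates = {}
    for allergen, counter in counts.items():
        top = max(counter.values())
        candidates[allergen] = {i for i, n in counter.items() if n == top}
    return candidates


def inert_ingredients(foods, candidates):
    """Part 1: all ingredients minus the union of every candidate set."""
    all_ingredients = {i for ingredients, _ in foods for i in ingredients}
    suspects = set().union(*candidates.values())
    return all_ingredients - suspects


def resolve(candidates):
    """Part 2: repeatedly fix the allergen that has a single candidate left."""
    remaining = {a: set(s) for a, s in candidates.items()}
    resolved = {}
    while remaining:
        allergen = min(remaining, key=lambda a: len(remaining[a]))
        if len(remaining[allergen]) != 1:
            raise ValueError("no allergen with exactly one candidate")
        (ingredient,) = remaining.pop(allergen)
        resolved[allergen] = ingredient
        # The ingredient is taken, so remove it everywhere else.
        for others in remaining.values():
            others.discard(ingredient)
    return resolved


example = [
    (["mxmxvkd", "kfcds", "sqjhc", "nhms"], ["dairy", "fish"]),
    (["trh", "fvjkl", "sbzzf", "mxmxvkd"], ["dairy"]),
    (["sqjhc", "fvjkl"], ["soy"]),
    (["sqjhc", "mxmxvkd", "sbzzf"], ["fish"]),
]
candidates = candidate_ingredients(example)
print(sorted(inert_ingredients(example, candidates)))
# ['kfcds', 'nhms', 'sbzzf', 'trh']
print(resolve(candidates))
# {'dairy': 'mxmxvkd', 'fish': 'sqjhc', 'soy': 'fvjkl'}
```

The `ValueError` guards the assumption that some allergen always has exactly one candidate. That holds for the example, but the sketch does not prove it for arbitrary input.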
The code is well documented. Go check it out here.
Install PuLP:
pip3 install --user pulp
Optionally, install GLPK:
brew install glpk
glpsol --version
If you decide not to use GLPK as your solver, change the following line:
model.solve(solver=GLPK(msg=False))
to
model.solve()
Optionally, install nodemon:
npm install nodemon -g
This will allow you to run a watcher on your Python changes:
nodemon --exec python3 21_solver.py
I used this guide to get familiar with PuLP. See files playground1 and
playground2 for basic demos.
- $I$ - set of all ingredients
- $A$ - set of all allergens
- $i\in I$ - a specific ingredient
- $a\in A$ - a specific allergen
- $X_{ia} \in \{0, 1\}$ - binary variable that is 1 when ingredient $i$ contains allergen $a$
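No objective for this: the model is a pure feasibility problem, so the optimization sense does not affect the answer. We'll use `LpMaximize` (note that PuLP's own default sense is `LpMinimize`).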
- A given allergen must exist in one and only one ingredient.
  $$\forall a \in A: \quad \sum_{i \in I}{X_{ia}} = 1$$
- An ingredient can have at most one allergen.
  $$\forall i \in I: \quad \sum_{a \in A}{X_{ia}} \le 1$$
- The input food lines are constraints. E.g., the line `mx kf sq (contains dairy, fish)` translates to
  $$X_{\text{mx},\text{dairy}} + X_{\text{kf},\text{dairy}} + X_{\text{sq},\text{dairy}} + X_{\text{mx},\text{fish}} + X_{\text{kf},\text{fish}} + X_{\text{sq},\text{fish}} = 2$$
  It equals `2` because exactly two of those variables must be true. I.e., one of those ingredients must have dairy, and another must have fish.
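A minimal PuLP sketch of this formulation on the example input is shown below. It is an illustration under stated assumptions, not the repository's solver: the function name `solve_allergens` and the variable naming scheme are hypothetical, and it uses the CBC solver bundled with PuLP rather than GLPK.

```python
import pulp


def solve_allergens(foods):
    """Return {allergen: ingredient} for a list of (ingredients, allergens)."""
    ingredients = sorted({i for ing, _ in foods for i in ing})
    allergens = sorted({a for _, alg in foods for a in alg})

    # Pure feasibility model: no objective is added.
    model = pulp.LpProblem("allergen_assignment", pulp.LpMaximize)
    x = {
        (i, a): pulp.LpVariable(f"x_{i}_{a}", cat="Binary")
        for i in ingredients
        for a in allergens
    }

    # Each allergen lives in exactly one ingredient.
    for a in allergens:
        model += pulp.lpSum(x[i, a] for i in ingredients) == 1
    # Each ingredient holds at most one allergen.
    for i in ingredients:
        model += pulp.lpSum(x[i, a] for a in allergens) <= 1
    # Each food line: its listed allergens are all found among its ingredients.
    for food_ingredients, food_allergens in foods:
        model += (
            pulp.lpSum(x[i, a] for i in food_ingredients for a in food_allergens)
            == len(food_allergens)
        )

    model.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[model.status] != "Optimal":
        raise ValueError("no feasible assignment found")
    return {a: i for (i, a), var in x.items() if (var.value() or 0) > 0.5}


example = [
    (["mxmxvkd", "kfcds", "sqjhc", "nhms"], ["dairy", "fish"]),
    (["trh", "fvjkl", "sbzzf", "mxmxvkd"], ["dairy"]),
    (["sqjhc", "fvjkl"], ["soy"]),
    (["sqjhc", "mxmxvkd", "sbzzf"], ["fish"]),
]
print(solve_allergens(example))
```

For the example, the constraints admit a single assignment: `dairy` to `mxmxvkd`, `fish` to `sqjhc`, and `soy` to `fvjkl`.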
The code can be found here.