Who is the manufacturer of the milk, cheese or sausage, for example? Was the cheaper product produced by the same manufacturer as a well-known brand product?
In order to identify the producer of products of animal origin, the EU has introduced the so-called health mark. It is an oval symbol that includes, among other things, the country of origin and the approval number of the manufacturing company.
I created an app that lets you identify the manufacturer by selecting the country and the approval number.
I do not have an agreement with the authorities that manage and maintain the data that would allow me to use it, so I cannot provide the data or ship the app with the data. Unfortunately, you need to do this part of the magic yourself.
The Python scripts provided here download the content and add it to an SQLite database. Due to usage rights, I included neither the data files nor the database.
Currently, the following countries are available: Germany (DE), Austria (AT), Switzerland (CH), Italy (IT) and France (FR).
I am using the Python library `camelot-py` to process the PDF files from Austria. I had some issues with version `0.11.0` and Python `3.12`; it worked with Python `3.12.1` or `3.11.x`.
To process the PDF files with `camelot-py`, you first need to install Ghostscript, as described here.
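Before running the Austrian part, it can help to confirm that Ghostscript is reachable at all. The sketch below is not part of the project's scripts: it only checks whether a Ghostscript executable is on the `PATH`, assuming the usual executable names (`gs` on Linux/macOS, `gswin64c`/`gswin32c` on Windows). A positive result does not guarantee that `camelot-py` locates Ghostscript the same way.

```python
import shutil
from typing import Optional

# Usual Ghostscript executable names (assumption): "gs" on Linux/macOS,
# the console binaries "gswin64c"/"gswin32c" on Windows.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript() -> Optional[str]:
    """Return the path of the first Ghostscript executable found on PATH, or None."""
    for name in GHOSTSCRIPT_NAMES:
        path = shutil.which(name)  # on Windows, PATHEXT lets this match "gswin64c.exe"
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    gs = find_ghostscript()
    if gs is None:
        print("Ghostscript not found on PATH; install it before processing the Austrian PDFs.")
    else:
        print(f"Ghostscript found at {gs}")
```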
Install the required packages, e.g. like this:
pip install -r requirements.txt
The packages listed in `requirements.txt` are:
- pandas
- requests
- pathlib
- lxml
- html5lib
- beautifulsoup4
- opencv-python
- ghostscript
- [camelot-py](https://camelot-py.readthedocs.io/en/master/)
The download will take some time. Processing the PDF files from Austria also takes quite a while, so if you do not need them, comment out that part in `run.py`.
The German data cannot be downloaded by a script, so you need to download it manually first (see below).
The Italian data is downloaded and processed in a single script, so you will have to wait a few seconds (depending on your hardware) for the last step.
Most of the magic happens automatically in the script `run.py`. If you do not need the data for all countries, you can comment out the corresponding parts, which makes the processing much faster.
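Instead of commenting code in and out, the country selection could also be driven by a list. This is a hypothetical sketch, not the actual structure of `run.py`: the `process_*` functions are placeholders standing in for the per-country download and database scripts.

```python
from typing import Callable, Dict, Iterable, List


# Hypothetical placeholders for the per-country steps
# (download script + database script) that run.py would call.
def process_de() -> None:
    print("de: reading ./de/export.xml")


def process_at() -> None:
    print("at: downloading and parsing PDFs (slow)")


def process_ch() -> None:
    print("ch: processing")


def process_it() -> None:
    print("it: downloading and processing")


def process_fr() -> None:
    print("fr: processing")


PROCESSORS: Dict[str, Callable[[], None]] = {
    "de": process_de,
    "at": process_at,
    "ch": process_ch,
    "it": process_it,
    "fr": process_fr,
}


def run(countries: Iterable[str]) -> List[str]:
    """Run the processing step for each requested country code; return the codes processed."""
    processed = []
    for code in countries:
        code = code.lower()
        if code not in PROCESSORS:
            raise ValueError(f"unknown country code: {code!r}")
        PROCESSORS[code]()
        processed.append(code)
    return processed


if __name__ == "__main__":
    # Everything except Austria, to skip the slow PDF step.
    run(["de", "ch", "it", "fr"])
```

Keeping the selection in one list makes it obvious at a glance which countries end up in the database.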
- In this project, create a folder named `de`.
- Download the XML file provided at the bottom of this page and save it as `export.xml` in the folder `de`. The path should look like `./de/export.xml`.
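A small pre-flight check can catch a missing or misplaced download before `run.py` fails halfway. The function name `german_export_path` is hypothetical; it only verifies that `./de/export.xml` exists and is well-formed XML, and it makes no assumption about the elements inside the file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Union


def german_export_path(project_dir: Union[str, Path] = ".") -> Path:
    """Return de/export.xml inside project_dir after checking it exists and is well-formed XML."""
    path = Path(project_dir) / "de" / "export.xml"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found: create the folder 'de' and save the downloaded XML file there as export.xml"
        )
    # Parsing once catches e.g. an HTML error page saved under the .xml name.
    # Note: ET.parse reads the whole file into memory.
    ET.parse(path)  # raises ET.ParseError if the file is not well-formed
    return path


if __name__ == "__main__":
    print(f"Found {german_export_path()}")
```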
Run the script `run.py`, e.g.:
python run.py
As already stated, this will take a while, especially if Austria is included.
Each country has its own table. I use the country codes as table names, so the table for Germany is called `de`, the table for Austria is called `at`, and so on.
Each table has the same columns:
- name (Text): The name of the producer
- address (Text): The address, here in the format street name and number, postcode, city. Any format would work, though, since I handle it as a plain text field.
- approvalNo (Text): The actual approval number of the EC identification and health marks
- approvalNoOld (Text): The old number (I think this is used mainly in Germany)
- comment (Text): Additional information, if available.
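The table layout above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions (the helper names, quoting the table names, declaring every column as `TEXT`), not the code used in the `*_db_script.py` files.

```python
import sqlite3
from typing import Iterable, Sequence

# Country codes double as table names.
COUNTRY_CODES = ("de", "at", "ch", "it", "fr")
COLUMNS = ("name", "address", "approvalNo", "approvalNoOld", "comment")


def create_country_table(conn: sqlite3.Connection, country: str) -> None:
    """Create the table for one country, named after its country code, if it does not exist."""
    if country not in COUNTRY_CODES:
        # Table names cannot be passed as SQL parameters, so only known codes are allowed.
        raise ValueError(f"unsupported country code: {country!r}")
    column_defs = ", ".join(f"{col} TEXT" for col in COLUMNS)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{country}" ({column_defs})')


def insert_rows(conn: sqlite3.Connection, country: str, rows: Iterable[Sequence[str]]) -> int:
    """Insert (name, address, approvalNo, approvalNoOld, comment) tuples; return the row count."""
    create_country_table(conn, country)
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    with conn:  # commits on success, rolls back on error
        cur = conn.executemany(
            f'INSERT INTO "{country}" ({column_list}) VALUES ({placeholders})', rows
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder row, not real data.
    insert_rows(conn, "de", [("Example Dairy", "Example Street 1, 12345 Example City", "XX 000", "", "")])
    print(conn.execute('SELECT * FROM "de"').fetchall())
```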
As mentioned, there is an app (currently for Windows and Android) that you can use to search the data.
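Looking up a producer by country and approval number, which is what the app does, boils down to one query against the country's table. The sketch below is hypothetical: ignoring case and spaces when comparing approval numbers is an assumption about how users type them, and also matching `approvalNoOld` is a design choice, not necessarily what the app does.

```python
import sqlite3
from typing import List, Tuple

COUNTRY_CODES = ("de", "at", "ch", "it", "fr")


def normalize_approval_no(value: str) -> str:
    """Uppercase and drop all whitespace, so 'at 12345eg' matches 'AT 12345 EG' (assumption)."""
    return "".join(value.upper().split())


def find_producers(
    conn: sqlite3.Connection, country: str, approval_no: str
) -> List[Tuple[str, str, str]]:
    """Return (name, address, approvalNo) rows whose current or old approval number matches."""
    country = country.lower()
    if country not in COUNTRY_CODES:
        raise ValueError(f"unsupported country code: {country!r}")
    needle = normalize_approval_no(approval_no)
    # The table name comes from the allow-list; the approval number is a bound parameter.
    # The SQL side removes only spaces (not tabs or line breaks) before comparing.
    sql = (
        f'SELECT name, address, approvalNo FROM "{country}" '
        "WHERE REPLACE(UPPER(approvalNo), ' ', '') = ? "
        "OR REPLACE(UPPER(approvalNoOld), ' ', '') = ?"
    )
    return conn.execute(sql, (needle, needle)).fetchall()
```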
The EU provides a list of web pages with each country's health mark information. They all come in different formats, so there is no one-size-fits-all solution.
Further details on how I process the different formats are given as comments in the individual Python scripts, so feel free to explore my solutions and find better ones.
The first letters of each script name indicate the country code: `de` (Germany), `at` (Austria), and so on. The same code is used as the table name in the SQLite database.
Usually there is a file such as `at_download.py`, which downloads the content, and a file such as `at_db_script.py`, which parses the downloaded file and writes its content to the SQLite database.
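The parsing half of a `*_db_script.py` can be illustrated for a country that publishes an HTML table. The project itself uses `beautifulsoup4`, `lxml` and `html5lib`; this sketch swaps in the standard library's `html.parser` so it runs without extra packages. The column order (a header row, then name, address, approval number) is a hypothetical example, since every country's source looks different.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumes well-formed markup with explicit closing </td>/</th>/</tr> tags.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments with spaces (keeps <br>-separated address lines apart),
            # then collapse runs of whitespace.
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.rows.append(self._row)
            self._row = None


def parse_rows(html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_db_rows(rows: List[List[str]]) -> List[Tuple[str, str, str, str, str]]:
    """Hypothetical mapping: skip the header, take name, address, approval number."""
    records = []
    for cells in rows[1:]:
        if len(cells) < 3:
            continue
        name, address, approval_no = cells[:3]
        records.append((name, address, approval_no, "", ""))  # no old number, no comment
    return records
```

The resulting tuples match the five table columns and could be passed straight to an `INSERT ... VALUES (?, ?, ?, ?, ?)` statement.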
There was an issue with Python 3.12 and `at_db_script.py`, which uses the `camelot` library; I could not get it running. With Python 3.11 and 3.12.1 it worked.
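If you want `run.py` to warn early about this interpreter issue, a check like the following could help. It encodes only what is stated here (problems with `camelot-py` 0.11.0 on Python 3.12, success on 3.12.1 and 3.11.x) and assumes that "3.12" means 3.12.0; every other version is reported as untested rather than guessed.

```python
import sys
from typing import Optional, Sequence


def camelot_python_status(version: Optional[Sequence[int]] = None) -> str:
    """Classify a Python version by the experience reported for camelot-py 0.11.0."""
    v = tuple(sys.version_info[:3]) if version is None else tuple(version[:3])
    if v[:2] == (3, 11) or v == (3, 12, 1):
        return "known-good"
    if v == (3, 12, 0):  # assumption: "3.12" in the text means 3.12.0
        return "known-issue"
    return "untested"


if __name__ == "__main__":
    status = camelot_python_status()
    if status != "known-good":
        print(f"Python {sys.version.split()[0]}: {status} with camelot-py for the Austrian PDFs")
```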
Once you have created the database, you can use it in the app (currently available for Windows and Android).