Labeling Machine is a web application written in Python that is intended to be used by researchers for labeling data with minimum effort. Here are some of its key features:
- Very lightweight: The tool is based on Flask
- Ready to go and Dockerized: Follow the Deployment section to see a demo of the tool in 15 minutes
- Easy to customize: Follow the Customizing the Project steps to quickly adapt the tool to your project
Labeling Machine was originally implemented in 2019 for internal use in our empirical research study, which involved labeling data from StackOverflow, Apache Mailing Lists, and GitHub issues and PRs. Later it went through several iterations and was used in more studies. As the tool matured and proved to be practically useful, I decided to make it available to all researchers as an open-source tool.
Make sure Python 3.x is installed
# Check your Python version
$ python --version
Python 3.x
$ pip --version
xxxxx (python 3.x)
$ cd [Project]
$ python3 -m venv ./venv
$ source ./venv/bin/activate
# install dependencies
(venv)$ pip3 install -r requirements.txt
# verify flask is installed
(venv)$ flask --version
(venv)$ deactivate
$
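The version check at the start of the setup can also be done from Python itself, which may be handy inside a setup script. The sketch below is only an illustration: the helper name `is_supported_python` is hypothetical and not part of the project, and the minimum of 3.0 simply mirrors the "Python 3.x" requirement stated above.

```python
import sys


def is_supported_python(version_info=sys.version_info, minimum=(3, 0)):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Abort early with a readable message instead of failing later on import.
    if not is_supported_python():
        sys.exit("Labeling Machine needs Python 3.x, found " + sys.version.split()[0])
    print("Python version OK:", sys.version.split()[0])
```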
- Open the project in PyCharm
- Configure the Python Interpreter: Preferences > Project Interpreter > [Show All] > + > path to [Project]/venv/bin/python
- Create run configurations:
  - flask-run: for running the webapp server
    - Run > Edit Configurations > + > Python:
      - Name: flask-run
      - Allow parallel run: unchecked
      - Script Path: <absolute_path_to_project>//bin/flask
      - Parameters: run
      - Environment variables: FLASK_APP=src; FLASK_ENV=development;
      - Python Interpreter: Project default
      - Working Directory: <absolute_path_to_project>/webapp/
  - flask-initdb: for initializing the database the first time
    - Run > Edit Configurations > select the newly created flask-run > Copy Configuration
      - Name: flask-initdb
      - Parameters: initdb
      - Note that the initdb parameter name comes from @app.cli.command('initdb')
Note: To make the server reachable from other machines (i.e., accepting connections on all network interfaces), run: (venv)$ flask run --host=0.0.0.0
Now we are ready to run the project:
- (First time only) Initialize the database by running the flask-initdb configuration
  - This will run the method associated with @app.cli.command('initdb')
  - Delete the existing database (/db/app.sqlite) for a fresh start
- Run the WebApp by running the flask-run configuration
  - The webapp will be running on http://127.0.0.1:5000 by default
Q1. Why does PyCharm show red lines all over the source code?
- You should mark the [Project]/webapp/ folder as the root of the source code: right-click on the folder > Mark Directory as > Sources Root
$ source [Project]/venv/bin/activate
(venv)$ cd [Project]/webapp/
# define path to our `app` variable
(venv)$ export FLASK_APP=src;
# enables auto code reloading on code changes, and provides helpful debugging info
(venv)$ export FLASK_ENV=development
(venv)$ flask initdb # initialize the database for the first time
(venv)$ flask run # run the WebApp
# The webapp will be running on http://127.0.0.1:5000 by default
To make the server reachable from other machines (i.e., accepting connections on all network interfaces), run: (venv)$ flask run --host=0.0.0.0
Please follow the steps below to customize the project to your needs. I have already implemented a sample showcase for every step. If something is not clear, please open an issue.
- Importing your artifacts to be labeled:
  - Update the database schema to store the artifacts to be labeled in the Artifact table. For that, simply update the Artifact class in models.py.
  - Update the initdb.py > import_my_data() method to import your artifact data.
  - Run the initdb configuration to perform the initialization (see the Run Project sections above to learn more)
- Displaying artifacts to labelers:
  - Update the routes_labeling.py > labeling_with_artifact(target_artifact_id) method to send the content that needs to be displayed (i.e., the artifact content)
  - Update artifact.html to display the artifact the way you like (NOTE: HTML files are written in the Jinja web template language. Don't be afraid: with Jinja syntax you have access to the Python objects you sent in the previous step)
- Update the logic to collect labeled data:
  - Design your own input form in labeling_layout.html to collect labeling data. (NOTE: For the showcase, I already implemented a simple pull-down menu in which users can either (1) create a new label and select it, or (2) select a label among previously created labels (by any user).)
  - Update the database schema to store submitted data in the LabelingData table. For that, simply update the LabelingData class in models.py.
  - Store submitted labels in the database by updating the routes_labeling.py > label() method.
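The showcase logic described in the last step (create a new label or reuse one created by any user, then store the submission) can be sketched as follows. This is a minimal illustration that uses the standard-library sqlite3 module instead of the project's SQLAlchemy models; the table names `label` and `labeling_data`, their columns, and all helper names are hypothetical and do not reflect the actual schema in models.py.

```python
import sqlite3


def create_demo_schema(conn):
    """Create two hypothetical tables: one for labels, one for submissions."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS label (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS labeling_data (
            id INTEGER PRIMARY KEY,
            artifact_id INTEGER NOT NULL,
            username TEXT NOT NULL,
            label_id INTEGER NOT NULL REFERENCES label(id)
        );
        """
    )


def get_or_create_label(conn, name):
    """Return the id of the label called `name`, creating it if it is new."""
    name = name.strip()
    row = conn.execute("SELECT id FROM label WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]  # reuse a label created earlier, by any user
    cursor = conn.execute("INSERT INTO label (name) VALUES (?)", (name,))
    return cursor.lastrowid


def store_labeling(conn, artifact_id, username, label_name):
    """Record that `username` labeled `artifact_id` with `label_name`."""
    label_id = get_or_create_label(conn, label_name)
    conn.execute(
        "INSERT INTO labeling_data (artifact_id, username, label_id) VALUES (?, ?, ?)",
        (artifact_id, username, label_id),
    )
    conn.commit()
    return label_id
```

In the real project the equivalent logic would live in the label() route and go through the LabelingData model; the sketch only shows the "select or create" decision in isolation.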
Database Technology: By default, the tool uses SQLite as its database. However, since Labeling Machine relies on SQLAlchemy, an ORM toolkit, you can switch to another database (MySQL, PostgreSQL, etc.) by changing a couple of lines.
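With SQLAlchemy, switching databases mostly comes down to changing the database URL, which has the general form dialect+driver://user:password@host:port/database. The sketch below builds such URLs under stated assumptions: the helper `database_uri` is hypothetical, where the project actually sets its URL is not shown here, and the non-SQLite backends additionally need their driver package installed (for example PyMySQL for `mysql+pymysql`).

```python
from urllib.parse import quote_plus

# Backends this sketch knows about; SQLAlchemy itself supports more.
SERVER_BACKENDS = ("postgresql", "mysql+pymysql")


def database_uri(backend, database, user="", password="", host="localhost", port=None):
    """Build a SQLAlchemy-style database URL for the given backend."""
    if backend == "sqlite":
        # Three slashes mean a file path relative to the working directory.
        return f"sqlite:///{database}"
    if backend not in SERVER_BACKENDS:
        raise ValueError(f"unsupported backend: {backend}")
    # Special characters in credentials must be URL-encoded.
    credentials = quote_plus(user)
    if password:
        credentials += ":" + quote_plus(password)
    location = host if port is None else f"{host}:{port}"
    return f"{backend}://{credentials}@{location}/{database}"
```

For example, `database_uri("sqlite", "db/app.sqlite")` yields a URL for a local SQLite file, while passing `"postgresql"` with credentials yields a URL for a PostgreSQL server.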
$ cd [Project]/docker
# Build image
$ docker-compose build
# Start a container (http://localhost:45000)
$ docker-compose up -d lm
# Stop the container
$ docker-compose down
$ cd [Project]/docker # the directory of Dockerfile
$ docker build -t lm-minimal:latest --file ./Dockerfile ..
# Start a container (http://localhost:45000)
$ docker run --name lm-minimal -p 45000:5000 -d lm-minimal:latest
# Stop the container
$ docker stop lm-minimal
$ docker rm lm-minimal
Docker Troubleshooting
Q1. Why do I still see the old database, although I updated the db in the new image?
- If a Docker volume from an older container exists, the volume doesn't get replaced by new images. Otherwise, we couldn't update our image without losing our existing data.
- Solution:
  - Find the volume's name: docker inspect <container_name> and look for the Mounts > Name field.
  - Delete the volume: docker volume rm <volume_name>
    - If it errors that the volume is in use, try to stop the container: docker stop <container_name>
    - Note: if you created the volume using docker-compose in the first place, you have to remove the container first: docker rm -v <container_name> (-v: also remove the anonymous volumes attached to the container), then run docker volume rm <volume_name>
Q2. How can I copy database from the running container?
docker cp <container_name>:/labeling-machine/webapp/db/app.sqlite ~/local/path
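Once the database file has been copied out of the container, a quick sanity check is to list its tables and row counts. The sketch below uses only the standard-library sqlite3 module; the helper name `summarize_database` is hypothetical, and the table names it reports are whatever the copied file actually contains.

```python
import os
import sqlite3
import sys


def summarize_database(path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    if not os.path.exists(path):
        # sqlite3.connect() would silently create an empty file otherwise.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the file itself, so quote them defensively.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_db.py <path to the copied app.sqlite>
    for table, count in summarize_database(sys.argv[1]).items():
        print(f"{table}: {count} rows")
```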
Q3. How can I update the Python code on the fly?
- docker exec -it <container_name> /bin/bash
- Make your changes
- exit
Note: Such changes are not persistent, so it is better to update the source code and build a new image.