Migrate Mquery to typed config library (#324)
This changes the mquery configuration mechanism from editing the config.py file to a "real" config file like this (mquery.ini):

```ini
[redis]
host=redis-server.example.com

[mquery]
backend=tcp://ursadb-server.example.com:9281
plugins=plugins.archive:GzipPlugin
```
msm-code authored Jan 24, 2023
1 parent 23c3db6 commit 1e7b895
Showing 24 changed files with 254 additions and 68 deletions.
9 changes: 3 additions & 6 deletions .github/workflows/test_code.yml
@@ -28,8 +28,6 @@ jobs:
run: pip3 install mypy==0.790
- name: install requirements
run: pip3 install -r requirements.txt
- name: copy config
run: cp src/config.example.py src/config.py
- name: run mypy
run: mypy src
test_python_style:
@@ -41,8 +39,6 @@ jobs:
uses: actions/setup-python@v1
with:
python-version: '3.10'
- name: copy config
run: cp src/config.example.py src/config.py
- name: install flake8==3.7.9
run: pip3 install flake8==3.7.9
- name: run flake8
@@ -56,8 +52,6 @@ jobs:
uses: actions/setup-python@v1
with:
python-version: '3.10'
- name: copy config
run: cp src/config.example.py src/config.py
- name: install black
run: pip3 install black==22.3.0
- name: run black
@@ -115,6 +109,9 @@ jobs:
run: docker-compose up --scale daemon=1 --build -d
- name: run e2e tests
run: docker run --net mquery_default -v $(readlink -f ./samples):/mnt/samples mquery_tests
- name: get run logs
if: always()
run: docker-compose logs
- name: stop docker compose
if: always()
run: docker-compose down
1 change: 0 additions & 1 deletion .gitignore
@@ -1,6 +1,5 @@
*.pyc
venv/
config.py
.vscode
.idea
.mypy_cache
1 change: 0 additions & 1 deletion INSTALL.md
@@ -32,7 +32,6 @@ Docker compose dedicated for developers.
```
git clone --recurse-submodules https://github.com/CERT-Polska/mquery.git
cd mquery
cp src/config.docker.py src/config.py
# now set SAMPLES_DIR to a directory with your files, and INDEX_DIR to
# empty directory for database files to live in. By default database will
# expect files in ./samples directory, and keep index in ./index.
1 change: 0 additions & 1 deletion deploy/docker/daemon.Dockerfile
@@ -8,6 +8,5 @@ RUN ls /tmp/requirements*.txt | xargs -i,, pip --no-cache-dir install -r ,,

COPY "src/" "/app"
RUN chmod +x "/app/daemon.py"
COPY "src/config.docker.py" "/app/config.py"

ENTRYPOINT ["/app/daemon.py"]
1 change: 0 additions & 1 deletion deploy/docker/web.Dockerfile
@@ -17,5 +17,4 @@ RUN ls /tmp/requirements*.txt | xargs -i,, pip --no-cache-dir install -r ,,

COPY "src/." "."
COPY --from=build "/app/build" "./mqueryfront/build"
COPY "src/config.docker.py" "config.py"
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5000"]
6 changes: 4 additions & 2 deletions docker-compose.dev.yml
@@ -27,7 +27,8 @@ services:
- "redis"
- "ursadb"
environment:
- "MQUERY_PLUGINS=${MQUERY_PLUGINS}"
- "REDIS_HOST=redis"
- "MQUERY_BACKEND=tcp://ursadb:9281"
dev-daemon:
build:
context: .
@@ -42,7 +43,8 @@ services:
- "redis"
- "ursadb"
environment:
- "MQUERY_PLUGINS=${MQUERY_PLUGINS}"
- "REDIS_HOST=redis"
- "MQUERY_BACKEND=tcp://ursadb:9281"
ursadb:
image: mqueryci/ursadb:v1.5.1
ports:
6 changes: 6 additions & 0 deletions docker-compose.e2etests-local.yml
@@ -25,6 +25,9 @@ services:
depends_on:
- "redis"
- "ursadb"
environment:
- "REDIS_HOST=redis"
- "MQUERY_BACKEND=tcp://ursadb:9281"
dev-daemon:
build:
context: .
@@ -38,6 +41,9 @@ services:
depends_on:
- "redis"
- "ursadb"
environment:
- "REDIS_HOST=redis"
- "MQUERY_BACKEND=tcp://ursadb:9281"
ursadb:
image: mqueryci/ursadb:v1.5.1
ports:
6 changes: 4 additions & 2 deletions docker-compose.yml
@@ -15,7 +15,8 @@ services:
- "redis"
- "ursadb"
environment:
- "MQUERY_PLUGINS=${MQUERY_PLUGINS}"
- "REDIS_HOST=redis"
- "MQUERY_BACKEND=tcp://ursadb:9281"
daemon:
restart: always
build:
@@ -30,7 +31,8 @@ services:
- "redis"
- "ursadb"
environment:
- "MQUERY_PLUGINS=${MQUERY_PLUGINS}"
- "REDIS_HOST=redis"
- "MQUERY_BACKEND=tcp://ursadb:9281"
ursadb:
restart: always
image: mqueryci/ursadb:v1.5.1
1 change: 1 addition & 0 deletions docs/README.md
@@ -3,6 +3,7 @@
## User guide

- [Installation](../INSTALL.md): Installation instruction.
- [Configuration](./configuration.md): Additional configuration options.
- [Components](./components.md): More detailed description of mquery components.
- [Indexing](./indexing.md): Indexing files is one of the most important things in
mquery. In simple cases it can be solved without leaving the web UI, but
135 changes: 135 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,135 @@
# Configuration

There are three different things you can configure within Mquery: core, plugins
and ursadb. Unfortunately, all are configured differently.

## Mquery core configuration

Mquery is configured with [typed-config](https://github.com/bwindsor/typed-config). Every configuration field can be set in one of two ways: in
a config file or with an environment variable. For example:

```ini
[redis]
host=redis-server.example.com

[mquery]
backend=tcp://ursadb-server.example.com:9281
plugins=plugins.archive:GzipPlugin
```

This is a simple INI configuration file that mquery understands. Save it
as `mquery.ini` in one of the following locations (checked in this order):

* Mquery's working directory (usually the `src` folder in the cloned repository)
* The current user's XDG config directory: `~/.config/mquery/mquery.ini`
* The system config directory: `/etc/mquery/mquery.ini`

Alternatively, you can use environment variables to configure mquery. Each
field maps to an environment variable named after the INI section and the
key, joined with an underscore and upper-cased - for example, to change the
redis host, set `REDIS_HOST`. Environment variables take precedence over
values from the config file!
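
The lookup order described above can be sketched with the standard library alone. This is an illustration only, not how typed-config is implemented internally; `lookup` and `INI_PATHS` are hypothetical names introduced here, and the sketch assumes that the first source holding a value wins.

```python
import configparser
import os
from typing import Mapping, Optional, Sequence

# Candidate files in the order listed above (hypothetical constant name).
INI_PATHS = [
    "mquery.ini",
    os.path.expanduser("~/.config/mquery/mquery.ini"),
    "/etc/mquery/mquery.ini",
]


def lookup(
    section: str,
    key: str,
    env: Mapping[str, str] = os.environ,
    ini_paths: Sequence[str] = INI_PATHS,
) -> Optional[str]:
    """Return the first value found: environment first, then each INI file."""
    # Section "redis" and key "host" map to the REDIS_HOST variable.
    env_name = f"{section}_{key}".upper()
    if env_name in env:
        return env[env_name]
    for path in ini_paths:
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)  # files that do not exist are silently skipped
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None
```

With this sketch, `lookup("redis", "host")` returns the value of `REDIS_HOST` if it is set, and otherwise falls back to the first `mquery.ini` that defines `host` in its `[redis]` section.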

Currently, the supported configuration keys are:

- `redis.host`: Hostname of the main redis server (default: `localhost`).
- `redis.port`: Port of the main redis server (default: `6379`).
- `mquery.backend`: URL of the ursadb instance (for example,
`tcp://ursadb-server:9281`; default: `tcp://127.0.0.1:9281`).
- `mquery.plugins`: List of plugins to load, separated by commas (for
example `plugins.archive:GzipPlugin, plugins.custom:CustomPlugin`; default: empty).
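
Since `mquery.plugins` is a single comma-separated string, it has to be split into `module:Class` pairs before anything can be imported. The code that mquery uses for this is not shown in this diff, so the following is only a minimal sketch under that assumption; `parse_plugin_specs` is a hypothetical helper name.

```python
from typing import List, Tuple


def parse_plugin_specs(raw: str) -> List[Tuple[str, str]]:
    """Split "module:Class, module:Class" into (module, class) pairs."""
    specs = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty value or a trailing comma
        module_name, _, class_name = item.partition(":")
        if not module_name or not class_name:
            raise ValueError(f"expected 'module:Class', got {item!r}")
        specs.append((module_name, class_name))
    return specs
```

An empty string (the default) yields an empty list, so no plugins are loaded unless the key is set.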

## Mquery plugin configuration

In contrast to the core configuration, plugins can be configured dynamically.
Every worker registers its list of active plugins in the database, and it's
possible to configure them using the web UI:

![](./plugin-config.png)

This configuration mechanism is used by the plugins shipped with Mquery.
Despite this, it's optional, and plugin authors don't have to use it.
Since plugins are arbitrary code, plugins can read their configuration from
anywhere they want, including the environment, other config files, etc.

It's also easy to use the same config file for Mquery and plugins - see
[example_typed_config_plugin.py](../src/plugins/example_typed_config_plugin.py)
file for an example.

## UrsaDB configuration

UrsaDB is not technically part of Mquery, but both systems work closely
together and depend on each other for optimal performance.

Mquery currently does not provide a convenient way to configure UrsaDB.
You have to do it "manually", by connecting with the `ursacli` program to the
TCP port exposed by UrsaDB. This program is built together with UrsaDB and is
available in all official docker images. You can execute it in docker-compose
like this:

```
sudo docker-compose -f docker-compose.dev.yml exec ursadb ursacli
```

Or you can download the latest ursadb release and run a client from there.

To set a configuration field, issue a command like this:

```
$ ursacli
ursadb> config set "database_workers" 10;
```

The configuration keys are documented in the UrsaDB docs:
https://cert-polska.github.io/ursadb/configuration.html. We won't copy
all relevant information here, but the most important config keys are:

* `database_workers` - How many tasks can be processed at once (you can
increase this on powerful servers; restart the database to apply the change).
* `merge_max_files` - Largest dataset size (in files) that merging may produce.
UrsaDB keeps indexed files in so-called "datasets". The fewer datasets, the
faster the database is, but you might not want overly huge datasets (at some
point merging datasets becomes very slow, and you get diminishing returns for
merging them). Decide on the value before indexing your files. Good values
include the default (infinite), 10 million, and 1 million.
* `merge_max_datasets` - How many datasets can be merged in a single operation.
Merging is very memory- and CPU-intensive.
If your database OOMs during indexing, consider lowering this number and
the number of `database_workers` that are merging in parallel.

## .env file

Finally, in the main directory of the repository there is a file named `.env`.
Mquery does not use it in any way, but it's read by Docker.

```bash
$ cat .env
# This file is only relevant for docker-compose deployments.

# Directory where your samples are stored. By default you have to copy them
# to ./samples subdirectory in this repository.
SAMPLES_DIR=./samples
# Directory where the index files should be saved. By default ./index
# subdirectory in this repository.
INDEX_DIR=./index
```

If you use docker-compose to start mquery, you can use this file to specify
host locations for `SAMPLES_DIR` and `INDEX_DIR`. These variables are
then used when creating containers. See for example the ursadb container spec:

```yaml
ursadb:
restart: always
image: mqueryci/ursadb:v1.5.0
ports:
- "127.0.0.1:9281:9281"
volumes:
- "${SAMPLES_DIR}:/mnt/samples"
- "${INDEX_DIR}:/var/lib/ursadb"
user: "0:0"
```
As you can see, variables from `.env` are used to specify the host side of
the data volume mounts. You can also ignore this file, and edit the
docker-compose file directly to your liking.
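
To make the substitution concrete, here is a rough Python sketch of how `${SAMPLES_DIR}` and `${INDEX_DIR}` expand into the volume entries. It only mimics the basic `${NAME}` case with `string.Template`; Docker Compose's real parser also handles quoting, defaults and other syntax that this sketch ignores, and `parse_env_file` is a hypothetical helper.

```python
from string import Template
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip()
    return values


env = parse_env_file("SAMPLES_DIR=./samples\nINDEX_DIR=./index\n")
volumes = [
    Template("${SAMPLES_DIR}:/mnt/samples").substitute(env),
    Template("${INDEX_DIR}:/var/lib/ursadb").substitute(env),
]
print(volumes)  # ['./samples:/mnt/samples', './index:/var/lib/ursadb']
```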
12 changes: 6 additions & 6 deletions docs/plugins.md
@@ -13,19 +13,19 @@ by plugins.

![](plugin-config.png)

-To add a new plugin to the system, you need to change PLUGINS key in
-`config.py` for bare metal setup. For example:
+To add a new plugin to the system, you need to change mquery.plugins key in
+[the config](./configuration.md). For example:

```python
-PLUGINS = ["plugins.mwdb_uploads:MalwarecageUploadsMetadata"]
+[mquery]
+plugins=plugins.mwdb_uploads:MalwarecageUploadsMetadata
```

To load a plugin `MalwarecageUploadsMetadata` from `plugins.mwdb_uploads`
module.

-To load plugins with docker-compose deployment, you can change
-`MQUERY_PLUGINS` environment variable in the container to load existing
-plugin, but to load your own plugin you need to create your own image.
+Remember that you can also use environment variable MQUERY_PLUGINS to do the
+same thing - this may be useful for docker-based deployments.

## Filter plugins

1 change: 1 addition & 0 deletions requirements.plain.txt
@@ -9,3 +9,4 @@ yara-python
yaramod
cachetools
pyjwt[crypto]
typed-config
1 change: 1 addition & 0 deletions requirements.txt
@@ -22,3 +22,4 @@ yara-python==4.1.3
yaramod==3.12.1
PyJWT[crypto]==2.3.0
rq==1.11.1
typed-config==1.3.2
1 change: 0 additions & 1 deletion src/.dockerignore
@@ -1,4 +1,3 @@
config.py
.pytest_cache
.mypy_cache
__pycache__
6 changes: 3 additions & 3 deletions src/app.py
@@ -3,7 +3,7 @@
from threading import Lock

import uvicorn # type: ignore
-import config
+from config import app_config
from fastapi import (
FastAPI,
Body,
@@ -48,9 +48,9 @@
)


-db = Database(config.REDIS_HOST, config.REDIS_PORT)
+db = Database(app_config.redis.host, app_config.redis.port)
app = FastAPI()
-plugins = PluginManager(config.PLUGINS, db)
+plugins = PluginManager(app_config.mquery.plugins, db)
plugin_lock = Lock()


10 changes: 0 additions & 10 deletions src/config.docker.py

This file was deleted.

8 changes: 0 additions & 8 deletions src/config.example.py

This file was deleted.

34 changes: 34 additions & 0 deletions src/config.py
@@ -0,0 +1,34 @@
from typedconfig import Config, key, section, group_key # type: ignore
from typedconfig.source import EnvironmentConfigSource, IniFileConfigSource # type: ignore
import os


@section("redis")
class RedisConfig(Config):
host = key(cast=str, required=False, default="localhost")
port = key(cast=int, required=False, default=6379)


@section("mquery")
class MqueryConfig(Config):
backend = key(cast=str, required=False, default="tcp://127.0.0.1:9281")
plugins = key(cast=str, required=False, default="")


class AppConfig(Config):
redis = group_key(RedisConfig)
mquery = group_key(MqueryConfig)


def _config_sources():
return [
EnvironmentConfigSource(),
IniFileConfigSource("mquery.ini", must_exist=False),
IniFileConfigSource(
os.path.expanduser("~/.config/mquery/mquery.ini"), must_exist=False
),
IniFileConfigSource("/etc/mquery/mquery.ini", must_exist=False),
]


app_config = AppConfig(sources=_config_sources())