This repo contains everything needed to run a MusicBrainz mirror server with search and replication in Docker.
- Prerequisites
- Components version
- Installation
- Advanced configuration
- Test setup
- Development setup
- Helper scripts
- Update
- Cleanup
- Removal
- Issues
- CPU: 16 threads (or 2 without indexed search), x86-64 architecture
- RAM: 16 GB (or 4 without indexed search)
- Disk Space: 250 GB (or 100 without indexed search)
- Docker Compose 1.21.1 (or higher), see how to install Docker Compose
- Git
- GNU Bash 4 (or higher) utilities, for admin helper scripts only (On macOS, use Homebrew.)
- Linux or macOS (Windows is not documented yet; using Ubuntu via VirtualBox is recommended instead.)
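The tool requirements above can be checked before installing. The following is a minimal sketch, not part of this repository: the function names `version_ge` and `check_prerequisites` are made up for illustration, and the version comparison assumes a `sort` that supports `-V` (as GNU coreutils does). It does not check the Bash version or the hardware requirements.

```shell
# Hypothetical helpers, not shipped with musicbrainz-docker.

# Succeed when dotted version $1 is greater than or equal to $2.
# Assumes "sort -V" (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report missing tools and a too-old Docker Compose; print nothing if all is well.
check_prerequisites() {
    for tool in docker-compose git bash; do
        command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
    done
    if command -v docker-compose >/dev/null 2>&1; then
        found=$(docker-compose version --short)
        version_ge "$found" 1.21.1 ||
            echo "Docker Compose $found is older than 1.21.1"
    fi
}
```

Calling `check_prerequisites` from any directory prints one line per problem found.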
If you use Docker Desktop on macOS you may need to increase the amount of memory available to containers from the default of 2GB:
- Preferences > Resources > Memory
If you use Ubuntu 19.10 or later, the above requirements can be set up by running:
sudo apt-get update && \
sudo apt-get install docker.io docker-compose git && \
sudo systemctl enable --now docker.service
If you use UFW to manage your firewall:
- ufw-docker or any other way to fix the Docker and UFW security flaw.
- Introduction: Getting started with Docker and Overview of Docker Compose
- Command-line: docker CLI reference and docker-compose CLI reference
- Configuration: Compose file version 3 reference
- Current MB Branch: v-2024-04-09
- Current DB_SCHEMA_SEQUENCE: 28
- Postgres Version: 12 (can be changed by setting the environment variable POSTGRES_VERSION)
- MB Solr search server: 3.4.2 (can be changed by setting the environment variable MB_SOLR_VERSION)
- Search Index Rebuilder: 3.0.1
This section is about installing a MusicBrainz mirror server with locally indexed search and automatically replicated data.
Download this repository and change current working directory with:
git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
If you want to mirror the Postgres database only (neither the website nor the web API), change the base configuration with the following command. Run it as the first step; otherwise it will blank out any configuration changes made before it:
admin/configure with alt-db-only-mirror
Docker images for composed services should be built once using:
sudo docker-compose build
⚙️ Postgres shared buffers are set to 2GB by default. Before running this step, you should consider modifying your memory settings in order to give your database a sufficient amount of RAM, otherwise your database could run very slowly.
Download latest full data dumps and create the database with:
sudo docker-compose run --rm musicbrainz createdb.sh -fetch
Building materialized tables is an optional step.
MusicBrainz Server makes use of materialized (or denormalized) tables in production to improve the performance of certain pages and features. These tables duplicate primary table data and can take up several additional gigabytes of space, so they're optional but recommended. If you don't populate these tables, the server will generally fall back to slower queries in their place.
If you wish to configure the materialized tables, you can run:
sudo docker-compose exec musicbrainz bash -c './admin/BuildMaterializedTables --database=MAINTENANCE all'
Make the local website available at http://localhost:5000 with:
sudo docker-compose up -d
At this point the local website will show data loaded from the dumps only. For indexed search and replication, keep going!
Depending on your available resources in CPU/RAM vs. bandwidth:
-
Either build search indexes manually from the installed database:
sudo docker-compose exec indexer python -m sir reindex
⚙️ Java heap for Solr is set to 2GB by default. Before running this step, you should consider modifying your memory settings in order to give your search server a sufficient amount of RAM, otherwise your search server could run very slowly.
(This option is known to take 4½ hours with 16 CPU threads and 16 GB RAM.)
To index cores individually, rather than all at once, add --entity-type CORE (any number of times) to the command above. For example:
sudo docker-compose exec indexer python -m sir reindex --entity-type artist --entity-type release
-
Or download pre-built search indexes based on the latest data dump:
sudo docker-compose run --rm musicbrainz fetch-dump.sh search
sudo docker-compose run --rm search load-search-indexes.sh
(This option downloads 30GB of Zstandard-compressed archives from FTP.)
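For the manual reindexing option above, the command with repeated --entity-type flags can be assembled from a list of cores. This is only an illustrative sketch: `reindex_command` is a hypothetical helper that prints the command rather than running it, so it can be reviewed first.

```shell
# Hypothetical helper: print the reindex command for the given cores.
# With no arguments, it prints the command that reindexes all cores.
reindex_command() {
    cmd='sudo docker-compose exec indexer python -m sir reindex'
    for core in "$@"; do
        cmd="$cmd --entity-type $core"
    done
    printf '%s\n' "$cmd"
}
```

For example, `reindex_command artist release` prints the same command as the example given for indexing cores individually.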
For example, to rebuild search indexes every Sunday at 1 am, add the following line to the etc/crontab file from your server's root:
0 1 * * 7 YOUR_USER_NAME cd ~/musicbrainz-docker && /usr/bin/docker-compose exec -T indexer python -m sir reindex
At this point indexed search works on the local website/webservice. For replication, keep going!
First, copy your MetaBrainz access token (see instructions for generating a token) and paste it when prompted by the following command:
admin/set-replication-token
The token will be written to the file local/secrets/metabrainz_access_token.
Then, grant access to the token for replication with:
admin/configure add replication-token
sudo docker-compose up -d
Run the replication script once to catch up with the latest database updates:
sudo bash -c 'docker-compose exec musicbrainz replication.sh &' && \
sudo docker-compose exec musicbrainz /usr/bin/tail -f mirror.log
Enable replication as a cron job of the root user in the musicbrainz service container with:
admin/configure add replication-cron
sudo docker-compose up -d
By default, it replicates data every day at 3 am UTC. To change that, see advanced configuration.
You can view the replication log file while it is running with:
sudo docker-compose exec musicbrainz tail --follow mirror.log
You can view the replication log file after it is done with:
sudo docker-compose exec musicbrainz tail mirror.log.1
- Disable the replication cron job if you enabled it:
admin/configure rm replication-cron
sudo docker-compose up -d
- Make the indexer go through AMQP setup with:
sudo docker-compose exec indexer python -m sir amqp_setup
admin/create-amqp-extension
admin/setup-amqp-triggers install
- Build search indexes if they either have not been built or are outdated.
- Make the indexer watch reindex messages with:
admin/configure add live-indexing-search
sudo docker-compose up -d
- Re-enable the replication cron job if you disabled it in the first step:
admin/configure add replication-cron
sudo docker-compose up -d
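The steps above can be chained in a small script. The following is a sketch under assumptions, not a script provided by this repository: `run` and `enable_live_indexing` are hypothetical names, the DRY_RUN variable is introduced here only so the sequence can be previewed, and building search indexes is left as a manual step because it depends on the state of your indexes.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: enable_live_indexing yes|no
# Pass "yes" if the replication cron job is currently enabled.
# Must be called from the musicbrainz-docker directory.
enable_live_indexing() {
    replication_cron=${1:-no}
    if [ "$replication_cron" = yes ]; then
        run admin/configure rm replication-cron || return
        run sudo docker-compose up -d || return
    fi
    run sudo docker-compose exec indexer python -m sir amqp_setup || return
    run admin/create-amqp-extension || return
    run admin/setup-amqp-triggers install || return
    # Not automated: build search indexes here if they are missing or outdated.
    run admin/configure add live-indexing-search || return
    run sudo docker-compose up -d || return
    if [ "$replication_cron" = yes ]; then
        run admin/configure add replication-cron || return
        run sudo docker-compose up -d || return
    fi
}
```

Setting `DRY_RUN=1` before calling `enable_live_indexing yes` lists the commands in order without executing them.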
Preferably, do not locally change any file tracked by Git. Check that your working tree is clean with:
git status
Git is set to ignore the following, which you are encouraged to write to:
- the .env file,
- any new file under the local directory.
There are many ways to set environment variables in Docker Compose, the most convenient here is probably to edit the hidden file .env.
You can then check values to be passed to containers using:
sudo docker-compose config
Finally, make Compose pick up configuration changes with:
sudo docker-compose up -d
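Editing .env by hand works well; when scripting, an idempotent update avoids duplicate entries. The helper below is a minimal sketch, with `set_env_var` being a hypothetical name that is not part of this repository. It assumes one NAME=VALUE assignment per line and rewrites the file through a temporary copy.

```shell
# Hypothetical helper: set or replace NAME=VALUE in an env file.
# Usage: set_env_var NAME VALUE [FILE]   (FILE defaults to .env)
set_env_var() {
    name=$1 value=$2 file=${3:-.env}
    touch "$file"
    # Keep every line except a previous assignment of NAME.
    # grep exits non-zero when nothing is kept, hence "|| true".
    grep -v "^${name}=" "$file" > "$file.tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}
```

For example, `set_env_var MUSICBRAINZ_SERVER_PROCESSES 20` followed by `sudo docker-compose config` lets you check the value before applying it with `sudo docker-compose up -d`.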
By default, the web server listens at http://localhost:5000
This can be changed using the two Docker environment variables MUSICBRAINZ_WEB_SERVER_HOST and MUSICBRAINZ_WEB_SERVER_PORT.
If MUSICBRAINZ_WEB_SERVER_PORT is set to 80 (http), then the port number will not appear in the base URL of the web server.
If set to 443 (https), then the port number will not appear either, but a separate reverse proxy is required to handle https correctly.
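The port rule described above can be illustrated as follows. This is only a sketch of the documented behavior, not the server's actual code: `base_url` is a hypothetical function that takes the values of MUSICBRAINZ_WEB_SERVER_HOST and MUSICBRAINZ_WEB_SERVER_PORT as arguments.

```shell
# Hypothetical illustration of how the base URL depends on host and port.
# Usage: base_url [HOST] [PORT]   (defaults: localhost, 5000)
base_url() {
    host=${1:-localhost} port=${2:-5000}
    case $port in
        80)  printf 'http://%s\n' "$host" ;;             # default http port is omitted
        443) printf 'https://%s\n' "$host" ;;            # default https port is omitted
        *)   printf 'http://%s:%s\n' "$host" "$port" ;;
    esac
}
```

Note that for port 443 the sketch only shows the resulting URL; https itself still has to be handled by a separate reverse proxy.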
By default, MusicBrainz Server uses 10 plackup processes at once.
This number can be changed using the Docker environment variable MUSICBRAINZ_SERVER_PROCESSES.
By default, data dumps and pre-built search indexes are downloaded from https://data.metabrainz.org/pub/musicbrainz.
The download server can be changed using the Docker environment variable MUSICBRAINZ_BASE_DOWNLOAD_URL.
For backwards compatibility reasons, an FTP server can be specified using the MUSICBRAINZ_BASE_FTP_URL Docker environment variable. Note that support for this variable is deprecated and will be removed in a future release.
See the list of download servers for alternative download sources.
By default, there is no crontab file in the musicbrainz service container.
If you followed the steps to schedule replication, then the crontab file used by the musicbrainz service is bound to default/replication.cron.
This can be changed by creating a custom crontab file under the local/ directory, and finally setting the Docker environment variable MUSICBRAINZ_CRONTAB_PATH to its path.
By default, the configuration file used by the indexer service is bound to default/indexer.ini.
This can be changed by creating a custom configuration file under the local/ directory, and finally setting the Docker environment variable SIR_CONFIG_PATH to its path.
By default, the services indexer and musicbrainz are trying to connect to the host db (for both read-only and write host) but the hosts can be customized using the MUSICBRAINZ_POSTGRES_SERVER and MUSICBRAINZ_POSTGRES_READONLY_SERVER environment variables.
Notes:
- After switching to another Postgres server:
  - If not transferring data, the database has to be created again.
  - For live indexing, the RabbitMQ server still has to be reachable from the Postgres server.
- The helper scripts check-search-indexes and create-amqp-extension won’t work anymore.
- The service db will still be up even if unused.
By default, the services db, indexer and musicbrainz are trying to connect to the host mq but the host can be customized using the MUSICBRAINZ_RABBITMQ_SERVER environment variable.
Notes:
- After switching to another RabbitMQ server:
  - Live indexing requires going through AMQP Setup again.
  - If not transferring data, it might be necessary to build search indexes again.
- The helper script purge-message-queues won’t work anymore.
- The service mq will still be up even if unused.
By default, the service musicbrainz is trying to connect to the host redis but the host can be customized using the MUSICBRAINZ_REDIS_SERVER environment variable.
Notes:
- After switching to another Redis server:
  - If not transferring data, MusicBrainz user sessions will be reset.
- The service redis will still be running even if unused.
In Docker Compose, it is possible to override the base configuration using multiple Compose files.
Some overrides are available under the compose directory.
Feel free to write your own overrides under the local directory.
The helper script admin/configure is able to:
- list available compose files, with a descriptive summary
- show the value of the COMPOSE_FILE variable in Docker environment
- set/update COMPOSE_FILE in the .env file with a list of compose files
- set/update COMPOSE_FILE in the .env file with added or removed compose files
Try admin/configure help for more information.
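On Linux and macOS, Docker Compose reads COMPOSE_FILE as a list of file paths separated by colons. The sketch below shows how a file can be added to such a list without duplicating it. It is an illustration only and not how admin/configure is implemented; `compose_file_add` and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: add a Compose file to a colon-separated list once.
# Usage: compose_file_add CURRENT_LIST NEW_FILE
compose_file_add() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;       # already listed: unchanged
        "::")     printf '%s\n' "$2" ;;       # empty list: just the new file
        *)        printf '%s\n' "$1:$2" ;;    # otherwise: append
    esac
}
```

For example, `compose_file_add a.yml:b.yml c.yml` prints `a.yml:b.yml:c.yml`, while adding `b.yml` again leaves the list unchanged.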
To publish ports of services db, mq, redis and search (in addition to musicbrainz) on the host, simply run:
admin/configure add publishing-all-ports
sudo docker-compose up -d
If you are running a database-only mirror, run this instead:
admin/configure add publishing-db-port
sudo docker-compose up -d
By default, the db and search services are each given about 2GB of RAM.
You may want to set more or less memory for any of these services, depending on your available resources or on your priorities.
For example, to give 4GB to each of the db and search services, create a file local/compose/memory-settings.yml as follows:
version: '3.1'

# Description: Customize memory settings

services:
  db:
    command: postgres -c "shared_buffers=4GB" -c "shared_preload_libraries=pg_amqp.so"
  search:
    environment:
      - SOLR_HEAP=4g
See postgres for more configuration parameters and options to pass to the db service, and solr.in.sh for more environment variables to pass to the search service.
Then enable it by running:
admin/configure add local/compose/memory-settings.yml
sudo docker-compose up -d
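If you adjust these values often, the override file can be generated from two numbers. This is a minimal sketch reusing the file content shown above; `write_memory_settings` is a hypothetical helper, and both sizes are expected as whole numbers of gigabytes.

```shell
# Hypothetical helper: write the memory override file.
# Usage: write_memory_settings DB_GB SOLR_GB [FILE]
write_memory_settings() {
    db_gb=$1 solr_gb=$2 file=${3:-local/compose/memory-settings.yml}
    mkdir -p "$(dirname "$file")"
    cat > "$file" <<EOF
version: '3.1'

# Description: Customize memory settings

services:
  db:
    command: postgres -c "shared_buffers=${db_gb}GB" -c "shared_preload_libraries=pg_amqp.so"
  search:
    environment:
      - SOLR_HEAP=${solr_gb}g
EOF
}
```

For example, `write_memory_settings 8 4` gives 8GB of shared buffers to db and a 4GB heap to search; the file still has to be enabled with admin/configure as described above.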
If you just need a small server with sample data to test your own SQL queries and/or MusicBrainz Web Service calls, you can run the below commands instead of following the above installation:
git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
admin/configure add musicbrainz-standalone
sudo docker-compose build
sudo docker-compose run --rm musicbrainz createdb.sh -sample -fetch
sudo docker-compose up -d
The two differences are:
- Sample data dump is downloaded instead of full data dumps,
- MusicBrainz Server runs in standalone mode instead of mirror mode.
Build search indexes and Enable live indexing are the same.
Replication is not applicable to test setup.
Required disk space is much less than for the normal setup: 15GB to be safe.
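Free disk space can be checked before downloading any dump. The following sketch relies on the POSIX output format of `df -P`; `free_gb` and `has_free_space` are hypothetical helper names. Keep in mind that Docker may store its volumes on a different filesystem than the one holding this repository, so the directory to check is an assumption you need to adapt.

```shell
# Hypothetical helper: print the free space of the filesystem holding DIR,
# in whole gigabytes (rounded down). DIR defaults to the current directory.
free_gb() {
    # With -P, the second output line holds the figures; column 4 is free KiB.
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed when DIR has at least GB gigabytes free.
# Usage: has_free_space DIR GB
has_free_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}
```

For example, `has_free_space . 15 || echo "less than 15GB free"` warns before starting the test setup.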
The below sections are optional depending on which service(s) you are coding.
For local development of MusicBrainz Server, you can run the below commands instead of following the above installation:
git clone https://github.com/metabrainz/musicbrainz-server.git
MUSICBRAINZ_SERVER_LOCAL_ROOT=$PWD/musicbrainz-server
git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
echo MUSICBRAINZ_DOCKER_HOST_IPADDRCOL=127.0.0.1: >> .env
echo MUSICBRAINZ_SERVER_LOCAL_ROOT="$MUSICBRAINZ_SERVER_LOCAL_ROOT" >> .env
admin/configure add musicbrainz-dev
sudo docker-compose build
sudo docker-compose run --rm musicbrainz createdb.sh -sample -fetch
sudo docker-compose up -d
The main differences are:
- Sample data dump is downloaded instead of full data dumps,
- MusicBrainz Server runs in standalone mode instead of mirror mode,
- Development mode is enabled (except Catalyst debug),
- JavaScript and resources are automatically recompiled on file changes,
- MusicBrainz Server is automatically restarted on Perl file changes,
- MusicBrainz Server code is in the musicbrainz-server/ directory,
- Ports are published to the host only (through MUSICBRAINZ_DOCKER_HOST_IPADDRCOL).
After changing code in musicbrainz-server/, it can be run as follows:
sudo docker-compose restart musicbrainz
Build search indexes and Enable live indexing are the same.
Replication is not applicable to development setup.
Simply restart the container when checking out a new branch.
This is very similar to the above but for Search Index Rebuilder (SIR):
- Set the variable SIR_LOCAL_ROOT in the .env file
- Run admin/configure add sir-dev
- Run sudo docker-compose up -d
Notes:
- It will override any config.ini file in your local working copy of SIR.
- Requirements are being cached and will be updated on container’s startup.
- See how to configure SIR in musicbrainz-docker.
The situation is quite different for the search service (mb-solr), as it doesn’t depend on any other service. Its development relies on the schema instead. See mb-solr and mmd-schema.
However, other services depend on it, so it is useful to run a local version of mb-solr in the search service for integration tests:
- Run build.sh from your mb-solr local working copy, which will build an image of metabrainz/mb-solr with a local tag reflecting the working tree status of your local clone of mb-solr.
- Set MB_SOLR_VERSION in .env to this local tag.
- Run sudo docker-compose up -d
There are two directories with helper scripts:
- admin/ contains helper scripts to be run from the host. For more information, use the --help option:
admin/check-search-indexes --help
admin/delete-search-indexes --help
See also:
  - Docker Compose overrides for more information about admin/configure.
  - Enable live indexing for more information about admin/create-amqp-extension and admin/setup-amqp-triggers.
  - Enable replication for more information about admin/set-replication-token.
- build/musicbrainz/scripts/ contains helper scripts to be run from the container attached to the service musicbrainz. Most of these scripts are not for direct use, except createdb.sh and recreatedb.sh, which is documented below.
If you need to recreate the database, you will need to enter the postgres password set in postgres.env:
sudo docker-compose run --rm musicbrainz recreatedb.sh
or to fetch new data dumps before recreating the database:
sudo docker-compose run --rm musicbrainz recreatedb.sh -fetch
If you need to recreate the database with indexed search, run the following commands. You will need to enter the postgres password set in postgres.env when recreatedb.sh runs:
admin/configure rm replication-cron # if replication is enabled
sudo docker-compose stop
sudo docker-compose run --rm musicbrainz fetch-dump.sh both
admin/purge-message-queues
sudo docker-compose run --rm search load-search-indexes.sh --force
sudo docker-compose run --rm musicbrainz recreatedb.sh
sudo docker-compose up -d
admin/setup-amqp-triggers install
admin/configure add replication-cron
sudo docker-compose up -d
Check your working tree is clean with:
git status
Check your currently checked out version:
git describe --dirty
Check releases for update instructions.
Each time you rebuild an image, whether to update to a new release or to apply configuration changes, the previous image is not removed. On the one hand, this is convenient, as it lets you quickly restore the previous image if the new one has critical issues. On the other hand, old images fill your disk with several GB over time. It is therefore recommended to clean up regularly as follows:
sudo docker system prune --all
Removing the directory isn’t enough: the Docker objects (images, containers, volumes) have to be removed too for a complete removal.
Before removing the directory where you cloned this repository, run the following command from that directory.
sudo docker-compose down --remove-orphans --rmi all --volumes
It will output what has been removed so that you can check it. Only once it has finished can you remove the directory.
If anything doesn't work, check the troubleshooting page.
If you still don’t have a solution, please create an issue with versions info:
echo MusicBrainz Docker: `git describe --always --broken --dirty --tags` && \
echo Docker Compose: `docker-compose version --short` && \
sudo docker version -f 'Docker Client/Server: {{.Client.Version}}/{{.Server.Version}}'