Only run one data source when the database is not created by Mordred #556

Open
zhquan opened this issue Jun 24, 2022 · 7 comments
@zhquan
Member

zhquan commented Jun 24, 2022

When the database already exists before Mordred runs, either because MariaDB created it by default or because it was created manually with create database test_sh;, Mordred runs only one data source, chosen at random.

Example:

  • MariaDB with MARIADB_DATABASE=test_sh
version: '3'
services:
  mariadb:
    image: mariadb:10
    container_name: identitiesdb
    expose:
      - "3306"
    ports:
      - "3306:3306"
    environment:
      - MARIADB_ROOT_PASSWORD=changeme
      - MARIADB_DATABASE=test_sh
  • setup.cfg
[sortinghat]
database = test_sh

[git]
...
[github]
...

In this case, Mordred will run git or github randomly.

@ajaragz

ajaragz commented Jun 24, 2022

I've been able to reproduce the bug in this docker compose deployment. I use the env variable so that the identitiesdb container creates the database automatically:

services:
   identitiesdb:
      environment:
         - MARIADB_DATABASE=test_sh

It happens whether the identities phase is enabled or disabled in the setup.cfg file. Multiple data sources are defined:

[phases]
identities = true # or false

[git]

[github]

[github:pull]

[github2:pull]
...

However, only one of these data sources is collected, and which one it is changes randomly. In this run it is git:

2022-06-24 14:29:57,728 - sirmordred.sirmordred - INFO - ----------------------------
2022-06-24 14:29:57,728 - sirmordred.sirmordred - INFO - Starting SirMordred engine ...
2022-06-24 14:29:57,728 - sirmordred.sirmordred - INFO - - - - - - - - - - - - - - - 
2022-06-24 14:29:58,810 - sirmordred.sirmordred - INFO - Loading projects
2022-06-24 14:29:59,812 - sirmordred.task_projects - INFO - Reading projects data from  /home/bitergia/conf/projects.json 
2022-06-24 14:29:59,813 - sirmordred.sirmordred - INFO - Projects loaded
2022-06-24 14:29:59,848 - sirmordred.sirmordred - INFO - TaskProjects TaskIdentitiesLoad TaskIdentitiesMerge TaskIdentitiesExport will be executed on Fri, 24 Jun 2022 14:31:39 
2022-06-24 14:30:01,377 - sirmordred.task_collection - INFO - [git] collection phase starts
2022-06-24 14:30:01,377 - sirmordred.task_collection - INFO - [git] collection starts for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:02,039 - grimoire_elk.elastic - INFO - Created index https://opensearch-node1:9200/git_raw
2022-06-24 14:30:03,154 - grimoire_elk.elastic - INFO - Alias {'alias': 'git-raw', 'index': 'git_raw'} created on https://opensearch-node1:9200/git_raw.
2022-06-24 14:30:03,156 - grimoire_elk.raw.elastic - INFO - [git] Incremental from: 2021-01-01 00:00:00+00:00 for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:14,665 - grimoire_elk.elk - INFO - [git] Done collection for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:14,667 - sirmordred.task_collection - INFO - [git] collection finished for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:14,667 - sirmordred.task_collection - INFO - [git] collection phase finished in 00:00:13
2022-06-24 14:30:24,871 - sirmordred.task_enrich - INFO - [git] enrichment phase starts
2022-06-24 14:30:25,298 - grimoire_elk.elastic - INFO - Created index https://opensearch-node1:9200/git_enriched
2022-06-24 14:30:25,433 - sirmordred.task_enrich - INFO - [git] enrichment starts for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:25,950 - grimoire_elk.elastic - INFO - Alias {'alias': 'git', 'index': 'git_enriched'} created on https://opensearch-node1:9200/git_enriched.
2022-06-24 14:30:26,298 - grimoire_elk.elastic - INFO - Alias {'alias': 'git_author', 'index': 'git_enriched'} created on https://opensearch-node1:9200/git_enriched.
2022-06-24 14:30:26,596 - grimoire_elk.elastic - INFO - Alias {'alias': 'git_enrich', 'index': 'git_enriched'} created on https://opensearch-node1:9200/git_enriched.
2022-06-24 14:30:26,844 - grimoire_elk.elastic - INFO - Alias {'alias': 'affiliations', 'index': 'git_enriched'} created on https://opensearch-node1:9200/git_enriched.
2022-06-24 14:30:27,105 - grimoire_elk.elastic - INFO - Alias {'alias': 'all_enriched', 'index': 'git_enriched'} created on https://opensearch-node1:9200/git_enriched.
2022-06-24 14:30:41,219 - grimoire_elk.elk - INFO - [git] Done enrichment for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:41,219 - sirmordred.task_enrich - INFO - [git] enrichment finished for https://github.com/chaoss/grimoirelab-perceval.git
2022-06-24 14:30:41,220 - sirmordred.task_enrich - INFO - [git] enrichment phase finished in 0:00:16
2022-06-24 14:30:41,220 - sirmordred.task_enrich - INFO - [git] data retention start
2022-06-24 14:30:41,378 - sirmordred.task_enrich - INFO - [git] data retention end
2022-06-24 14:30:41,379 - sirmordred.task_enrich - INFO - [git] identities retention end
2022-06-24 14:30:41,379 - sirmordred.task_enrich - INFO - [git] autorefresh start
2022-06-24 14:30:41,769 - sirmordred.task_enrich - INFO - [git] Refreshing identities
2022-06-24 14:30:59,219 - sirmordred.task_enrich - INFO - [git] autorefresh end
2022-06-24 14:30:59,220 - sirmordred.task_enrich - INFO - [git] studies phase start
2022-06-24 14:31:01,998 - sirmordred.task_enrich - INFO - [git] Executing studies ['enrich_demography', 'enrich_areas_of_code', 'enrich_onion']

This behavior only occurs in the first Mordred run. If the container is restarted, all data sources are collected.

Maybe unrelated, but when this behavior occurs, the container log shows some pymysql errors:

Exception in thread Global tasks:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
Exception in thread github:pull:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
Exception in thread github:repo:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
Exception in thread github:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
Exception in thread github2:pull:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
Exception in thread github2:issue:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    result = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in query
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in query
    conn.query(q)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in query
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    result = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    result = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 732, in _read_query_result
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 732, in _read_query_result
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 732, in _read_query_result
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 732, in _read_query_result
    conn.query(q)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in query
    result.read()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 1075, in read
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 1075, in read
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    result.read()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 684, in _read_packet
    result = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/usr/local/lib/python3.8/site-packages/pymysql/protocol.py", line 220, in check_error
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 684, in _read_packet
    result.read()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 1075, in read
    packet.check_error()
  File "/usr/local/lib/python3.8/site-packages/pymysql/protocol.py", line 220, in check_error
    packet.check_error()
  File "/usr/local/lib/python3.8/site-packages/pymysql/protocol.py", line 220, in check_error
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 684, in _read_packet
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.8/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    conn.query(q)
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in query
    packet.check_error()
  File "/usr/local/lib/python3.8/site-packages/pymysql/protocol.py", line 220, in check_error
    result.read()
  File "/usr/local/lib/python3.8/site-packages/pymysql/connections.py", line 1075, in read
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.8/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.8/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1050, "Table 'organizations' already exists")
...
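
MySQL/MariaDB error 1050 is the "table already exists" error, raised when a plain CREATE TABLE targets a table that is already there. Below is a minimal sketch of that class of failure. It uses Python's built-in sqlite3 in place of MariaDB (an assumption, so that it runs without a server) and only borrows the table name from the log above; the column layout is hypothetical. It illustrates the error itself and is not a diagnosis of what Mordred does internally.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB here (assumption for illustration only).
conn = sqlite3.connect(":memory:")

# Hypothetical schema; only the table name comes from the log.
create_stmt = "CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT)"
conn.execute(create_stmt)

# A second plain CREATE TABLE fails because the table is already there.
try:
    conn.execute(create_stmt)
    second_create_failed = False
except sqlite3.OperationalError as exc:
    second_create_failed = True
    print("second create failed:", exc)

# CREATE TABLE IF NOT EXISTS is the usual way to make the statement idempotent.
conn.execute(
    "CREATE TABLE IF NOT EXISTS organizations (id INTEGER PRIMARY KEY, name TEXT)"
)
conn.close()
```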

@guoqiangqi

I have the same problem now. Any ideas on how to avoid it?

@zhquan
Member Author

zhquan commented Aug 22, 2022

Hi @guoqiangqi

Sorry for the late reply. You can restart the Mordred container (you will only see this error on the very first execution) or let Mordred create the database (by changing the database name).

These names must be different:

  • setup.cfg
[sortinghat]
database = test_sh
  • MariaDB
      - MARIADB_DATABASE=test_sh
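
The name check above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `sortinghat_db_conflicts` is hypothetical, and the compose `environment` entries are passed in as a plain list of `KEY=value` strings instead of being parsed from the YAML file.

```python
import configparser


def sortinghat_db_conflicts(setup_cfg_text, compose_environment):
    """Return True when setup.cfg and MariaDB point at the same database name.

    Hypothetical helper, not part of Mordred or SortingHat.
    """
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    sh_database = parser.get("sortinghat", "database", fallback=None)

    # Look for MARIADB_DATABASE among the compose environment entries.
    mariadb_database = None
    for entry in compose_environment:
        key, _, value = entry.partition("=")
        if key.strip() == "MARIADB_DATABASE":
            mariadb_database = value.strip()

    return sh_database is not None and sh_database == mariadb_database


setup_cfg = """
[sortinghat]
database = test_sh
"""
environment = ["MARIADB_ROOT_PASSWORD=changeme", "MARIADB_DATABASE=test_sh"]

# True means MariaDB will pre-create the database that Mordred expects to create.
print(sortinghat_db_conflicts(setup_cfg, environment))
```

If the function reports a conflict, either rename the database in setup.cfg or drop MARIADB_DATABASE from the compose file.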

I hope it helps you.

@guoqiangqi

Hi @zhquan, I really appreciate your advice. I have changed my configuration so that the database names in setup.cfg and MariaDB are different, but now I get a new "Unknown database xx" error. My grimoirelab version is 0.4.0.

@guoqiangqi

guoqiangqi commented Aug 22, 2022

I was trying to attach my log files but failed because the *.out file type is not supported by GitHub, so I put the error messages here.

@zhquan
Member Author

zhquan commented Aug 22, 2022

@guoqiangqi are you running Mordred using docker or micro.py? (The Mordred container should create the database if it does not exist.)

Anyway, try creating the database with sortinghat, then restart the Mordred container or run micro.py again.

sortinghat -u <USER> -p <PASS> --host <HOST> init demo_sh
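
If you want to script that step, a rough sketch follows. It only assembles the same command line shown above; the helper name `sortinghat_init_command` is hypothetical, and the user, password and host values are placeholders you need to replace with your own.

```python
import shlex


def sortinghat_init_command(user, password, host, database):
    """Build the argument list for the sortinghat init command shown above.

    Hypothetical helper; it does not run anything by itself.
    """
    return ["sortinghat", "-u", user, "-p", password, "--host", host, "init", database]


# Placeholder credentials and host (assumptions); demo_sh is the name used above.
cmd = sortinghat_init_command("root", "changeme", "identitiesdb", "demo_sh")

# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

The list can then be passed to subprocess.run(cmd, check=True) on a machine where sortinghat is installed and the database host is reachable.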

@guoqiangqi

@zhquan I run Mordred using docker-compose. Creating the database with sortinghat may be useful, thanks!
