Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bookie query kills mysql server #540

Open
anarcat opened this issue Dec 5, 2014 · 1 comment
Open

bookie query kills mysql server #540

anarcat opened this issue Dec 5, 2014 · 1 comment

Comments

@anarcat
Copy link
Contributor

anarcat commented Dec 5, 2014

we sometimes see the mysql server freak out because of bookie mysql users:

+-------+--------+-----------+--------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| Id    | User   | Host      | db     | Command | Time | State          | Info                                                                                                 |
+-------+--------+-----------+--------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| 43855 | bookie | localhost | bookie | Sleep   |    4 |                | NULL                                                                                                 |
| 86402 | bookie | localhost | bookie | Query   |  119 | Sorting result | SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('r', '%')) AND (EXISTS (S |
| 86403 | bookie | localhost | bookie | Sleep   |   54 |                | NULL                                                                                                 |
| 87131 | bookie | localhost | bookie | Query   |  103 | Sorting result | SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('f', '%')) AND (EXISTS (S |
| 87132 | bookie | localhost | bookie | Query   |  291 | Sorting result | SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('f', '%')) AND (EXISTS (S |
| 87217 | bookie | localhost | bookie | Query   |    0 | NULL           | show processlist                                                                                     |
+-------+--------+-----------+--------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

The MySQL server would completely hog a CPU per such query, during a few minutes.

The complete query is:

SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('r', '%')) AND (EXISTS (SELECT 1
FROM bmark_tags, bmarks
WHERE tags.tid = bmark_tags.tag_id AND bmarks.bid = bmark_tags.bmark_id AND bmarks.bid IN (SELECT bmarks.bid AS bmarks_bid
FROM bmarks
WHERE bmarks.username = 'anarcat' AND (EXISTS (SELECT 1
FROM bmark_tags, tags
WHERE bmarks.bid = bmark_tags.bmark_id AND tags.tid = bmark_tags.tag_id AND tags.tid IN (SELECT tags.tid AS tags_tid
FROM tags
WHERE tags.name IN ('debian', 'o') GROUP BY tags.tid))) GROUP BY bmarks.bid)))

EXPLAIN doesn't outline anything very obvious, other than filesort issues, which maybe means we need to ramp up the sort_buffer variable...

mysql> explain SELECT DISTINCT tags.name AS name FROM tags WHERE (tags.name LIKE concat('r', '%')) AND (EXISTS (SELECT 1 FROM bmark_tags, bmarks WHERE tags.tid = bmark_tags.tag_id AND bmarks.bid = bmark_tags.bmark_id AND bmarks.bid IN (SELECT bmarks.bid AS bmarks_bid FROM bmarks WHERE bmarks.username = 'anarcat' AND (EXISTS (SELECT 1 FROM bmark_tags, tags WHERE bmarks.bid = bmark_tags.bmark_id AND tags.tid = bmark_tags.tag_id AND tags.tid IN (SELECT tags.tid AS tags_tid FROM tags WHERE tags.name IN ('debian', 'o') GROUP BY tags.tid))) GROUP BY bmarks.bid)));
+----+--------------------+------------+--------+----------------+----------+---------+----------------------------+------+------------------------------------------+
| id | select_type        | table      | type   | possible_keys  | key      | key_len | ref                        | rows | Extra                                    |
+----+--------------------+------------+--------+----------------+----------+---------+----------------------------+------+------------------------------------------+
|  1 | PRIMARY            | tags       | range  | name           | name     | 258     | NULL                       |   77 | Using where; Using index                 |
|  2 | DEPENDENT SUBQUERY | bmark_tags | ref    | PRIMARY,tag_id | tag_id   | 4       | bookie.tags.tid            |    9 | Using where; Using index                 |
|  2 | DEPENDENT SUBQUERY | bmarks     | eq_ref | PRIMARY        | PRIMARY  | 4       | bookie.bmark_tags.bmark_id |    1 | Using index                              |
|  3 | DEPENDENT SUBQUERY | bmarks     | ref    | username       | username | 257     | const                      | 2811 | Using where; Using index; Using filesort |
|  4 | DEPENDENT SUBQUERY | bmark_tags | ref    | PRIMARY,tag_id | PRIMARY  | 4       | bookie.bmarks.bid          |    1 | Using where; Using index                 |
|  4 | DEPENDENT SUBQUERY | tags       | eq_ref | PRIMARY        | PRIMARY  | 4       | bookie.bmark_tags.tag_id   |    1 | Using index                              |
|  5 | DEPENDENT SUBQUERY | tags       | range  | name           | name     | 258     | NULL                       |    2 | Using where; Using index; Using filesort |
+----+--------------------+------------+--------+----------------+----------+---------+----------------------------+------+------------------------------------------+
7 rows in set (0.00 sec)

I was wondering if this query shouldn't be optimised in some way... That sure looks like a lot of subqueries...

@anarcat
Copy link
Contributor Author

anarcat commented Dec 5, 2014

i am not sure, but this certainly looks like a degenerate case of the completion code in bookie.models.TagMgr.complete(). i wonder if we shouldn't put a minimum on the size of completion requests. for example, it looks like in the above query, it looks at all bookmarks that match o or debian. I believe this happened because i typed o rganisation and bookie parsed that as two tags, then submitted that to the completion engine trying to complete rganisation.

there are multiple ways of doing this. one is in the UI:

--- a/bookie/static/js/bookie/tagcontrol.js
+++ b/bookie/static/js/bookie/tagcontrol.js
@@ -417,6 +417,10 @@ YUI.add('bookie-tagcontrol', function (Y) {
          */
         _fetch_suggestions: function (qry, callback) {
             var tags;
+            // don't autcomplete short terms which may overload the database
+            if (qry.length <= 3) {
+                return '';
+            }
             this.ac.api = new Y.bookie.Api.route.TagComplete(
                 this.get('api_cfg')
             );

... but i don't think that's the right approach, personnally, because a malicious API user could still blow up the database. besides, the above patch only limits the size of rganisation and not o in the above example.

an alternative would be in the API itself:

diff --git a/bookie/views/api.py b/bookie/views/api.py
index 17c1172..54aebf7 100644
--- a/bookie/views/api.py
+++ b/bookie/views/api.py
@@ -624,11 +624,11 @@ def tag_complete(request):
         return _api_response(request, {})

     if 'current' in params and params['current'] != "":
-        current_tags = params['current'].split()
+        current_tags = [ x for x in params['current'].split() if len(x) > 3 ]
     else:
         current_tags = None

-    if 'tag' in params and params['tag']:
+    if 'tag' in params and params['tag'] and len(params['tag']) > 3:
         tag = params['tag']

         tags = TagMgr.complete(tag,

I would suggest working on the latter, the number 3 being arbitrary. It should obviously be a config variable of some sort, but before digging any deeper, i'd like to hear some feedback.

We'll try using the latter in production now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant