Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MariaDBclient + Mysql8 support #30

Open
wants to merge 6 commits into
base: 0.2.x_force_latin1_to_utf8
Choose a base branch
from

Conversation

intrip
Copy link
Member

@intrip intrip commented Aug 2, 2023

Based on brianmario#976

  • Add a check to avoid segfault: Triggered by out-of-bound array access when retrieving the Mysql encoding.
  • Update support scripts:
    • Port new mappings from latest master
    • Add utf8mb3 -> utf_8 missing mapping (thanks @jeremy 🙏)
    • rebuild ext/mysql2/mysql_enc_to_ruby.h and ext/mysql2/mysql_enc_name_to_ruby.h from Mysql 8.0.32

This PR encompasses also @lewispb work done in #28

jconroy77 and others added 5 commits March 29, 2023 09:09
Unix systems using libtool do not need to do a version check against the
client version string as the libraries themselves are versioned.
libmariadb-client-lgpl-dev in newly released Debian stable (jessie)
ships `/usr/bin/mariadb_config`.
Triggered by out of bound array access when retreiving the mysql encoding.
@intrip intrip requested review from jeremy and lewispb August 2, 2023 08:47
@intrip intrip changed the title Mariadbclient mysql8 Mariadbclient + Mysql8 Aug 2, 2023
@intrip intrip changed the title Mariadbclient + Mysql8 Mariadbclient + Mysql8 support Aug 2, 2023
@intrip intrip changed the title Mariadbclient + Mysql8 support MariaDBclient + Mysql8 support Aug 2, 2023
@jeremy
Copy link
Member

jeremy commented Aug 2, 2023

Mysql8 has a different mapping from Mysql5.7 so we can't merge it directly in 0.2.x_force_latin1_to_utf8.

How does the mapping differ? How does mainline mysql2 handle the incompatibility?

@jeremy
Copy link
Member

jeremy commented Aug 2, 2023

I pulled collations from MySQL 5.7 and 8 locally, diffed them, and they appear to be fully compatible EXCEPT for utf8/utf8mb3 naming changes. The collation IDs are stable across MySQL versions and the mapping scripts are sorted by ID, so collations don't collide.

The utf8 naming was used in MySQL <8 and MariaDB <10.6, but it was always an alias for the underlying utf8mb3. Both dbs switched to utf8mb3 naming in anticipation of switching the utf8 alias to utf8mb4 in the future.

So we'll need mysql2 support for utf8mb3 charset and collation names. Open issue here: brianmario#1315

We can do that by updating the charset map generation scripts to account for the alias and map both utf8 and utf8mb3 names to Ruby utf-8.

Here's a rough diff:

--- mysql57_coll	2023-08-02 11:41:50
+++ mysql8_coll	2023-08-02 11:42:01
@@ -29,7 +29,7 @@
  [30, "latin5_turkish_ci"],
  [31, "latin1_german2_ci"],
  [32, "armscii8_general_ci"],
- [33, "utf8_general_ci"],
+ [33, "utf8mb3_general_ci"],
  [34, "cp1250_czech_cs"],
  [35, "ucs2_general_ci"],
  [36, "cp866_general_ci"],
@@ -72,13 +72,14 @@
  [73, "keybcs2_bin"],
  [74, "koi8r_bin"],
  [75, "koi8u_bin"],
+ [76, "utf8mb3_tolower_ci"],
  [77, "latin2_bin"],
  [78, "latin5_bin"],
  [79, "latin7_bin"],
  [80, "cp850_bin"],
  [81, "cp852_bin"],
  [82, "swe7_bin"],
- [83, "utf8_bin"],
+ [83, "utf8mb3_bin"],
  [84, "big5_bin"],
  [85, "euckr_bin"],
  [86, "gb2312_bin"],
@@ -168,31 +169,31 @@
  [181, "utf32_croatian_ci"],
  [182, "utf32_unicode_520_ci"],
  [183, "utf32_vietnamese_ci"],
- [192, "utf8_unicode_ci"],
- [193, "utf8_icelandic_ci"],
- [194, "utf8_latvian_ci"],
- [195, "utf8_romanian_ci"],
- [196, "utf8_slovenian_ci"],
- [197, "utf8_polish_ci"],
- [198, "utf8_estonian_ci"],
- [199, "utf8_spanish_ci"],
- [200, "utf8_swedish_ci"],
- [201, "utf8_turkish_ci"],
- [202, "utf8_czech_ci"],
- [203, "utf8_danish_ci"],
- [204, "utf8_lithuanian_ci"],
- [205, "utf8_slovak_ci"],
- [206, "utf8_spanish2_ci"],
- [207, "utf8_roman_ci"],
- [208, "utf8_persian_ci"],
- [209, "utf8_esperanto_ci"],
- [210, "utf8_hungarian_ci"],
- [211, "utf8_sinhala_ci"],
- [212, "utf8_german2_ci"],
- [213, "utf8_croatian_ci"],
- [214, "utf8_unicode_520_ci"],
- [215, "utf8_vietnamese_ci"],
- [223, "utf8_general_mysql500_ci"],
+ [192, "utf8mb3_unicode_ci"],
+ [193, "utf8mb3_icelandic_ci"],
+ [194, "utf8mb3_latvian_ci"],
+ [195, "utf8mb3_romanian_ci"],
+ [196, "utf8mb3_slovenian_ci"],
+ [197, "utf8mb3_polish_ci"],
+ [198, "utf8mb3_estonian_ci"],
+ [199, "utf8mb3_spanish_ci"],
+ [200, "utf8mb3_swedish_ci"],
+ [201, "utf8mb3_turkish_ci"],
+ [202, "utf8mb3_czech_ci"],
+ [203, "utf8mb3_danish_ci"],
+ [204, "utf8mb3_lithuanian_ci"],
+ [205, "utf8mb3_slovak_ci"],
+ [206, "utf8mb3_spanish2_ci"],
+ [207, "utf8mb3_roman_ci"],
+ [208, "utf8mb3_persian_ci"],
+ [209, "utf8mb3_esperanto_ci"],
+ [210, "utf8mb3_hungarian_ci"],
+ [211, "utf8mb3_sinhala_ci"],
+ [212, "utf8mb3_german2_ci"],
+ [213, "utf8mb3_croatian_ci"],
+ [214, "utf8mb3_unicode_520_ci"],
+ [215, "utf8mb3_vietnamese_ci"],
+ [223, "utf8mb3_general_mysql500_ci"],
  [224, "utf8mb4_unicode_ci"],
  [225, "utf8mb4_icelandic_ci"],
  [226, "utf8mb4_latvian_ci"],
@@ -219,4 +220,67 @@
  [247, "utf8mb4_vietnamese_ci"],
  [248, "gb18030_chinese_ci"],
  [249, "gb18030_bin"],
- [250, "gb18030_unicode_520_ci"]]
+ [250, "gb18030_unicode_520_ci"],
+ [255, "utf8mb4_0900_ai_ci"],
+ [256, "utf8mb4_de_pb_0900_ai_ci"],
+ [257, "utf8mb4_is_0900_ai_ci"],
+ [258, "utf8mb4_lv_0900_ai_ci"],
+ [259, "utf8mb4_ro_0900_ai_ci"],
+ [260, "utf8mb4_sl_0900_ai_ci"],
+ [261, "utf8mb4_pl_0900_ai_ci"],
+ [262, "utf8mb4_et_0900_ai_ci"],
+ [263, "utf8mb4_es_0900_ai_ci"],
+ [264, "utf8mb4_sv_0900_ai_ci"],
+ [265, "utf8mb4_tr_0900_ai_ci"],
+ [266, "utf8mb4_cs_0900_ai_ci"],
+ [267, "utf8mb4_da_0900_ai_ci"],
+ [268, "utf8mb4_lt_0900_ai_ci"],
+ [269, "utf8mb4_sk_0900_ai_ci"],
+ [270, "utf8mb4_es_trad_0900_ai_ci"],
+ [271, "utf8mb4_la_0900_ai_ci"],
+ [273, "utf8mb4_eo_0900_ai_ci"],
+ [274, "utf8mb4_hu_0900_ai_ci"],
+ [275, "utf8mb4_hr_0900_ai_ci"],
+ [277, "utf8mb4_vi_0900_ai_ci"],
+ [278, "utf8mb4_0900_as_cs"],
+ [279, "utf8mb4_de_pb_0900_as_cs"],
+ [280, "utf8mb4_is_0900_as_cs"],
+ [281, "utf8mb4_lv_0900_as_cs"],
+ [282, "utf8mb4_ro_0900_as_cs"],
+ [283, "utf8mb4_sl_0900_as_cs"],
+ [284, "utf8mb4_pl_0900_as_cs"],
+ [285, "utf8mb4_et_0900_as_cs"],
+ [286, "utf8mb4_es_0900_as_cs"],
+ [287, "utf8mb4_sv_0900_as_cs"],
+ [288, "utf8mb4_tr_0900_as_cs"],
+ [289, "utf8mb4_cs_0900_as_cs"],
+ [290, "utf8mb4_da_0900_as_cs"],
+ [291, "utf8mb4_lt_0900_as_cs"],
+ [292, "utf8mb4_sk_0900_as_cs"],
+ [293, "utf8mb4_es_trad_0900_as_cs"],
+ [294, "utf8mb4_la_0900_as_cs"],
+ [296, "utf8mb4_eo_0900_as_cs"],
+ [297, "utf8mb4_hu_0900_as_cs"],
+ [298, "utf8mb4_hr_0900_as_cs"],
+ [300, "utf8mb4_vi_0900_as_cs"],
+ [303, "utf8mb4_ja_0900_as_cs"],
+ [304, "utf8mb4_ja_0900_as_cs_ks"],
+ [305, "utf8mb4_0900_as_ci"],
+ [306, "utf8mb4_ru_0900_ai_ci"],
+ [307, "utf8mb4_ru_0900_as_cs"],
+ [308, "utf8mb4_zh_0900_as_cs"],
+ [309, "utf8mb4_0900_bin"],
+ [310, "utf8mb4_nb_0900_ai_ci"],
+ [311, "utf8mb4_nb_0900_as_cs"],
+ [312, "utf8mb4_nn_0900_ai_ci"],
+ [313, "utf8mb4_nn_0900_as_cs"],
+ [314, "utf8mb4_sr_latn_0900_ai_ci"],
+ [315, "utf8mb4_sr_latn_0900_as_cs"],
+ [316, "utf8mb4_bs_0900_ai_ci"],
+ [317, "utf8mb4_bs_0900_as_cs"],
+ [318, "utf8mb4_bg_0900_ai_ci"],
+ [319, "utf8mb4_bg_0900_as_cs"],
+ [320, "utf8mb4_gl_0900_ai_ci"],
+ [321, "utf8mb4_gl_0900_as_cs"],
+ [322, "utf8mb4_mn_cyrl_0900_ai_ci"],
+ [323, "utf8mb4_mn_cyrl_0900_as_cs"]]

@jeremy
Copy link
Member

jeremy commented Aug 2, 2023

Ah: mysql2 looks up charsets by ID, not name, so the alias naming shouldn't even matter! Adding a utf8mb3 → utf-8 mapping may be sufficient.

@jeremy
Copy link
Member

jeremy commented Aug 2, 2023

Done upstream: brianmario#1323

@intrip intrip force-pushed the mariadbclient_mysql8 branch from 290c9f4 to 12cea12 Compare August 3, 2023 07:49
@intrip intrip changed the base branch from 0.2.x_force_latin1_to_utf8_mysql8 to 0.2.x_force_latin1_to_utf8 August 3, 2023 07:52
@intrip intrip force-pushed the mariadbclient_mysql8 branch 2 times, most recently from 6a8a4b1 to fffaa42 Compare August 3, 2023 08:06
- Port new mappings from latest master
- Add utf8mb3 -> utf_8 missing mapping
- rebuild ext/mysql2/mysql_enc_to_ruby.h and ext/mysql2/mysql_enc_name_to_ruby.h from Mysql 8.0.32
@intrip intrip force-pushed the mariadbclient_mysql8 branch from fffaa42 to b8ab80d Compare August 3, 2023 08:43
@intrip
Copy link
Member Author

intrip commented Aug 3, 2023

Thank you @jeremy for the in-depth review and upstream fix 🙇‍♂️.
I've added the map from utf8mb3 => UTF-8 in this PR and regenerated ext/mysql2/mysql_enc_to_ruby.h and ext/mysql2/mysql_enc_name_to_ruby.h from Mysql 8.0.32.

I've tested setting the encoding of utf8mb3 in the connection but unfortunately, it breaks with this error:

>> client = Mysql2::Client.new( {:adapter=>"mysql2", :encoding=>"utf8mb3", :database=>"highrise_development", :host=>"db", :username=>"root"})
Traceback (most recent call last):
        5: from /usr/local/bin/irb:11:in `<main>'
        4: from (irb):7
        3: from (irb):7:in `new'
        2: from /usr/local/bundle/ruby/2.5.0/bundler/gems/mysql2-b8ab80dc57e8/lib/mysql2/client.rb:70:in `initialize'
        1: from /usr/local/bundle/ruby/2.5.0/bundler/gems/mysql2-b8ab80dc57e8/lib/mysql2/client.rb:70:in `connect'
Mysql2::Error (Can't initialize character set utf8mb3 (path: compiled_in))

So something goes wrong initializing the connection, as the next step I'd enable verbose logging in MYSQL to see what's going on but I wonder if I'm doing something dumb and you already know it 😅.
This is not a blocker so I'm pausing for a bit to move forward with the other Ruby upgrade work but I'll come back here later on.

@jeremy
Copy link
Member

jeremy commented Aug 3, 2023

Looks good @intrip! No need to switch to utf8mb3 naming, fwiw. The mysql 5.7 client library doesn't know about that charset name.

@jeremy
Copy link
Member

jeremy commented Aug 3, 2023

Would you backport to the 0.3.x_force_latin1_to_utf8 branch as well?

@intrip
Copy link
Member Author

intrip commented Aug 21, 2023

Looks good @intrip! No need to switch to utf8mb3 naming, fwiw. The mysql 5.7 client library doesn't know about that charset name.

Ahh, right! 🤦‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants