Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #9 seems unable to properly decode utf8mb4 #47

Open
matt-matt1 opened this issue Dec 13, 2024 · 9 comments
Open

Issue #9 seems unable to properly decode utf8mb4 #47

matt-matt1 opened this issue Dec 13, 2024 · 9 comments

Comments

@matt-matt1
Copy link

Hi
First attempt:

python3 main.py datatypes.ibd --sql --ddl > datatypes.sql
ibd2sql/innodb_type.py:62: SyntaxWarning: invalid escape sequence '('
size = int(re.compile('binary((.+))').findall(col['column_type_utf8'],)[0])
ibd2sql/innodb_type.py:70: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('float((.+))').findall(col['column_type_utf8'],)[0])
ibd2sql/innodb_type.py:89: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('bit((.+))').findall(col['column_type_utf8'],)[0])
ibd2sql/innodb_type.py:95: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('timestamp((.+))').findall(col['column_type_utf8'],)[0])
ibd2sql/innodb_type.py:101: SyntaxWarning: invalid escape sequence '('
ext = ext = int(re.compile('datetime((.+))').findall(col['column_type_utf8'],)[0])
ibd2sql/innodb_type.py:107: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('time((.+))').findall(col['column_type_utf8'],)[0])
ibd2sql/innodb_type.py:113: SyntaxWarning: invalid escape sequence '('
total_digits, decimal_digits = re.compile('decimal((.+))').findall(col['column_type_utf8'],)[0].split(',')
ibd2sql/innodb_type.py:149: SyntaxWarning: invalid escape sequence '('
size = int(re.compile('char((.+))').findall(col['column_type_utf8'],)[0]) # issue 9

cat datatypes.sql
CREATE TABLE IF NOT EXISTS workout.datatypes(
id int NOT NULL AUTO_INCREMENT,
type varchar(50) NOT NULL,
fallback varchar(50) NOT NULL,
min int NULL,
max int NULL,
min_length int NULL,
max_length int NULL,
PRIMARY KEY (id )
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ;
INSERT INTO workout.datatypes VALUES (1, 'tex', '0x74800000', -2122317824, 2131035660, -2109734816, -2144436224);
INSERT INTO workout.datatypes VALUES (2, 'switchcheckb', '0x6f7880000000', 1, -2046424062, -1073723392, -1367343104);
INSERT INTO workout.datatypes VALUES (3, 'minutenumber', '0x800000008000', -2143615482, -1946008576, -805284736, -2147482624);
INSERT INTO workout.datatypes VALUES (4, 'secondnumber', '0x800000008000', -2143617020, -2097004544, -654347136, -2147483392);
INSERT INTO workout.datatypes VALUES (5, '0x736c6964657272616e676580', '0x000000800000', -2063268349, -1946009600, -385941888, -2147482112);
INSERT INTO workout.datatypes VALUES (6, '0x696e746e756d626572800000', '0x008000', -2082013184, -2147483648, -2147483648, -2147483648);
INSERT INTO workout.datatypes VALUES (7, '0x756e7573656400ff08060c02400040', '0x003080000002', -2147483648, 250806528, -2147419644, 393443177);
INSERT INTO workout.datatypes VALUES (8, 'colornumber\x00\x06\x0f\x02', '0x400038ffdb', 7, -2147483648, 1515618816, -2147446783);

Everything is fine unto 'x' in line INSERT INTO workout.datatypes VALUES (1, 'tex
data is wrong for this column and all following columns in every insert row. (First column seems OK)

Then I tried with --debug , some output: (I think the error is in here)
...
[2024-12-13 13:12:48] [DEBUG] ROW VERSION : 2
[2024-12-13 13:12:48] [DEBUG] INSTANT FLAG : False
[2024-12-13 13:12:48] [DEBUG] ROW VERSION FLAG : True
[2024-12-13 13:12:48] [DEBUG] NAME: type VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:2
[2024-12-13 13:12:48] [DEBUG] 489 ----> 492 data:tex bdata:b'tex'
[2024-12-13 13:12:48] [DEBUG] ######## DDCW FLAG 2 ########
[2024-12-13 13:12:48] [DEBUG] NAME: fallback VERSION_ADDED:1 VERSION_DROPED:0 COL_INSTANT:True ROW VERSION:2
[2024-12-13 13:12:48] [DEBUG] BLOB ERROR 'utf-8' codec can't decode byte 0x80 in position 1: invalid start byte
[2024-12-13 13:12:48] [DEBUG] 492 ----> 496 data:0x74800000 bdata:b't\x80\x00\x00'
[2024-12-13 13:12:48] [DEBUG] ######## DDCW FLAG 2 ########

...

Contact me for more information.

@ddcw
Copy link
Owner

ddcw commented Dec 14, 2024

Thank you for your feedback,

From the logs, it seems that fallback was caused by ONLINE DDL,

Can you provide the MySQL version? It would be even better if you could provide the specific IBD file

Edit the ibd2sql/innodb_page_index.py file and add _ = self._read_innodb_varsize()at line 439 may be able to solve this problem.

@matt-matt1
Copy link
Author

matt-matt1 commented Dec 14, 2024 via email

@ddcw
Copy link
Owner

ddcw commented Dec 16, 2024

I tried to simulate it with mysql-9.0.1, but couldn't reproduce it.
Please provide the complete DDL addition record, or provide information of ibd2sdi workout.datatypes.ibd.

Based on the log above, it is speculated that the first data is (1, text, ?, 1, 255, -2063201025 or null, -2063201278 or null)

@matt-matt1
Copy link
Author

Hi
I not sure what you'd like me provide, a file? Did you get the datatypes_ibd.zip file attached to my last message?
From what I remember the correct row should be something like (1, text, 1, 255, null, null)

@ddcw
Copy link
Owner

ddcw commented Dec 17, 2024

Thank you for providing the verification data.

It has been fixed. Please download the latest version for verification

https://github.com/ddcw/ibd2sql/archive/refs/heads/main.zip

@matt-matt1
Copy link
Author

Good work. I clicked on the link and then did:

python3 main.py ../workout/datatypes.ibd --sql --ddl > ../workout/datatypes3.sql
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:62: SyntaxWarning: invalid escape sequence '('
size = int(re.compile('binary((.+))').findall(col['column_type_utf8'],)[0])
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:70: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('float((.+))').findall(col['column_type_utf8'],)[0])
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:89: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('bit((.+))').findall(col['column_type_utf8'],)[0])
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:95: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('timestamp((.+))').findall(col['column_type_utf8'],)[0])
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:101: SyntaxWarning: invalid escape sequence '('
ext = ext = int(re.compile('datetime((.+))').findall(col['column_type_utf8'],)[0])
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:107: SyntaxWarning: invalid escape sequence '('
ext = int(re.compile('time((.+))').findall(col['column_type_utf8'],)[0])
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:113: SyntaxWarning: invalid escape sequence '('
total_digits, decimal_digits = re.compile('decimal((.+))').findall(col['column_type_utf8'],)[0].split(',')
/usr/local/mysql.bak/data/ibd2sql-main/ibd2sql/innodb_type.py:149: SyntaxWarning: invalid escape sequence '('
size = int(re.compile('char((.+))').findall(col['column_type_utf8'],)[0]) # issue 9

Same error it seems. So it must of downloaded the same version again! How can I download it properly?

@ddcw
Copy link
Owner

ddcw commented Dec 17, 2024

This is warning, not error. It should be an alarm in the Python of macOS. which does not affect data parsing.

  1. You can add 2>/dev/null to ignore it.

  2. or python3 -Wall main.py workout/datatypes.ibd --sql to ignore it.

  3. or add r to a string in a regular expression. like 'char\((.+)\)' --> r'char\((.+)\)' , it's also possible to solve it.

No similar warning were found in the Linux or Windows environment.

@ddcw
Copy link
Owner

ddcw commented Dec 17, 2024

I fixed the WARNING using the third method, and you can download the latest version again to try it

@matt-matt1
Copy link
Author

git clone https://github.com/ddcw/ibd2sql/archive/refs/heads/main.zip
Cloning into 'main.zip'...
remote: Not Found
fatal: repository 'https://github.com/ddcw/ibd2sql/archive/refs/heads/main.zip/' not found

Anyway that translated well and I could import into Mysql OK! Thanks.

I tried to import the settings table (also convert from .ibd -> sql - using your script):

CREATE TABLE `settings` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `value` varchar(255) NOT NULL,
  `datatype` int DEFAULT '0',
  `tab_id` int NOT NULL DEFAULT '0',
  `override` varchar(1000) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `tab_id` (`tab_id`),
  KEY `FK_datatype` (`datatype`),
  CONSTRAINT `FK_datatype` FOREIGN KEY (`id`) REFERENCES `datatypes` (`id`),
  CONSTRAINT `FK_tab` FOREIGN KEY (`id`) REFERENCES `settings_tabs` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

(some INSERT rows gave errors - only the first 4 worked)
I think the constraints are incorrect. I believe that settings.datatype ==> datatypes.id, so shouldn't it be CONSTRAINT FK_datatype FOREIGN KEY (**datatype**) REFERENCES datatypes (id), ? (I'm not certain with mysql foreign keys..., so perhaps it's right??)
Ask if you want the associated .ibd file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants