Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NUL character handling #102

Open
Iced-Sun opened this issue Jul 18, 2019 · 1 comment
Open

NUL character handling #102

Iced-Sun opened this issue Jul 18, 2019 · 1 comment
Assignees

Comments

@Iced-Sun
Copy link

Iced-Sun commented Jul 18, 2019

The current approach to handle ASCII NUL chararacters is to use insert_table_data() instead of copy_data().

I can live with that insert_table_data takes a very long time to complete, as long as there are no other consequences. But apparently, there is an FTWRL (flush table with read lock), which causes the source database blocks all the write operations on the table, effectively makes the db unusable.

I think a possible workaround is to offer an option to let the user choose whether to use insert_table_data. If the user choose to say no, an in-place pre-processing (just like zero timestamps) in generate_select_statements()
could be issued:

			SELECT 
				CASE
					WHEN 
						data_type IN ('"""+"','".join(self.hexify)+"""')
					THEN
						concat('hex(',column_name,')')
					WHEN 
						data_type IN ('bit')
					THEN
						concat('cast(`',column_name,'` AS unsigned)')
					WHEN 
						data_type IN ('datetime','timestamp','date')
					THEN
						concat('nullif(`',column_name,'`,"0000-00-00 00:00:00")')
				ELSE
					concat('cast(replace(`',column_name,'`, char(0), '''') AS char CHARACTER SET """+ self.charset +""")')
				END
				AS select_csv,
				CASE
					WHEN 
						data_type IN ('"""+"','".join(self.hexify)+"""')
					THEN
						concat('hex(',column_name,') AS','`',column_name,'`')
					WHEN 
						data_type IN ('bit')
					THEN
						concat('cast(`',column_name,'` AS unsigned) AS','`',column_name,'`')
					WHEN 
						data_type IN ('datetime','timestamp','date')
					THEN
						concat('nullif(`',column_name,'`,"0000-00-00 00:00:00") AS `',column_name,'`')
					
				ELSE
					concat('cast(replace(`',column_name,'`, char(0), '''') AS char CHARACTER SET """+ self.charset +""") AS','`',column_name,'`')
					
				END
				AS select_stat,
				column_name
			FROM 
				information_schema.COLUMNS 
			WHERE 
				table_schema=%s
				AND 	table_name=%s
			ORDER BY 
				ordinal_position
			;

Regards.

@the4thdoctor
Copy link
Owner

Thanks for pointing out. I'll try to look at this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants