Describe the defect
Drupal allows editors to enter multibyte data in text fields, but some database tables cannot store 4-byte UTF-8 characters. So far we are aware of a single affected table, search_api_db_content, but an audit is needed to find every table and column still using the utf8 character set.
Both Drupal and MySQL (and, by extension, MariaDB) recommend using the utf8mb4 character set with the utf8mb4_unicode_ci collation across the database. MySQL has deprecated utf8 (an alias for utf8mb3) in favor of utf8mb4.
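The audit mentioned above could start from a query like the following. This is a minimal sketch, assuming the schema is named `db` (as in the queries later in this issue) and that any text column not already on utf8mb4 is worth reviewing. Adjust the schema name for your environment.

```sql
# List every text column in the schema that is not yet using utf8mb4.
# Assumption: the schema is named 'db'; replace it with the real database name.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'db'
  AND CHARACTER_SET_NAME IS NOT NULL   # skip non-text columns
  AND CHARACTER_SET_NAME <> 'utf8mb4'
ORDER BY TABLE_NAME, COLUMN_NAME;
```

Comparing against utf8mb4, rather than matching utf8 directly, also catches columns reported as utf8mb3, which is how some newer MySQL and MariaDB versions label the legacy utf8 character set.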
To Reproduce
Steps to reproduce the behavior:
As an administrator
Go to /media/add/image
In the description, enter: 𝐂𝐚𝐫𝐞𝐠𝐢𝐯𝐞𝐫 𝐒𝐮𝐦𝐦𝐢𝐭 (copy this text, do not type it!)
Save the image after filling required fields.
View the error log at admin/reports/dblog?type%5B%5D=search_api_db (filtered by search_api_db errors)
Note there is an error for the media entity that looks like the below screenshot.
AC / Expected behavior
No errors exist in watchdog when saving entities with 4 byte characters, and the data persists.
Engineering Notes
When troubleshooting, we noted that the search_api_db_content table was using the utf8 character set and the utf8_general_ci collation. This character set stores at most 3 bytes per character, so 4-byte characters such as those in the reproduction text cannot be saved.
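To illustrate why the text from the reproduction steps is rejected, the query below compares the character count with the byte count for its first character, 𝐂 (U+1D402). It is a sketch that uses a hex literal so the result does not depend on the connection character set.

```sql
# F0 9D 90 82 is the UTF-8 encoding of U+1D402 (MATHEMATICAL BOLD CAPITAL C).
SELECT
  CHAR_LENGTH(_utf8mb4 X'F09D9082') AS characters,  # length in characters
  LENGTH(_utf8mb4 X'F09D9082')      AS bytes;       # length in bytes
```

This should report 1 character and 4 bytes, which is one byte more than the legacy utf8 character set can store per character.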
Some helpful sql queries
# Get the current character set for the search_api_db_content table.
SELECT CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.`COLUMNS`
WHERE table_schema = "db"
  AND table_name = "search_api_db_content"
  AND column_name = "field_description";
# Change the character set (mutates existing data).
ALTER TABLE search_api_db_content CONVERT TO CHARACTER SET utf8mb4;
# View server default charset and collation.
SELECT @@character_set_database, @@collation_database;
# Show the available character sets.
SHOW CHARACTER SET;
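Building on the queries above, a conversion that also sets the recommended collation might look like the sketch below. It assumes the schema is named `db`. Like the `ALTER TABLE` above, it rewrites existing data, so it should be tried against a backup or non-production copy first.

```sql
# Convert the affected table, setting character set and collation together.
ALTER TABLE search_api_db_content
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

# Change the schema default so newly created tables use utf8mb4.
# Assumption: the schema is named 'db'. This does not convert existing tables.
ALTER DATABASE db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Re-running the character set query for the table afterwards is a quick way to confirm the change took effect.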
dsasser changed the title from "Stub: search_api_db warning due to character set mismatch" to "Drupal database does not support multi-byte strings" on May 16, 2024
dsasser changed the title from "Drupal database does not support multi-byte strings" to "Drupal database does not support multibyte characters" on May 16, 2024
dsasser changed the title from "Drupal database does not support multibyte characters" to "Some Drupal database tables do not support 4 byte utf8 characters" on May 16, 2024
Screenshots
Additional context
This issue impacts following pages:
It's also producing a number of warning log entries every time cron runs:
Conversation around this on Slack.
ACs