MySQL - Prepping MySQL Indexes for a Character Set Change
When changing a MySQL table or column to a new character set that uses more bytes per character than the old one, you first need to check whether any schema changes are needed to accommodate the change. For example, going from latin1 to utf8 increases the maximum size of a character from 1 byte to 3 bytes, and going from utf8 to utf8mb4 increases it from 3 bytes to 4 bytes. The MySQL reference manual has a helpful page with some details on this, but I want to add some examples to show how this schema prep can be accomplished.
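If you want to confirm these byte widths on your own server, the maxlen column of information_schema.character_sets shows the maximum bytes per character for each character set:

-- Maximum bytes per character for a few common character sets
select character_set_name, maxlen
from information_schema.character_sets
where character_set_name in ('latin1', 'utf8', 'utf8mb4');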
There are three different types of length limits to take into consideration:
- Index
- Column
- Row
In this post I will focus on index length limits, and I’ll save columns and rows for future posts. Read on for details.
Index Length Limits
The specific limits depend on which MySQL version and storage engine is used, and whether innodb_large_prefix is enabled (assuming MySQL 5.5 or higher). If you want to learn more about innodb_large_prefix, you may want to read this post. A common example here is an index on a varchar(255) column in the utf8 character set without innodb_large_prefix. That index uses 765+2=767 bytes (255 characters at 3 bytes each, plus a 2-byte length prefix), so it conforms to the 767-byte per-column limit of an InnoDB index without innodb_large_prefix. If I change that column to utf8mb4, the index requires 1022 bytes, which causes the DDL to fail.
Here’s an example:
mysql> create table if not exists varchar_test (
    -> id int auto_increment primary key,
    -> str varchar(255)
    -> ) engine = InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.08 sec)

mysql> alter table varchar_test add index str_index (str);
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table varchar_test DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table varchar_test modify column str varchar(255) CHARACTER SET utf8mb4;
ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.
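One way to get past this error (a sketch, assuming the application can live with indexing only a 191-character prefix) is to shorten the indexed prefix before converting the column, since 191*4 = 764 bytes fits under the 767-byte limit. On MySQL 5.6 another option is to enable innodb_large_prefix and switch the table to ROW_FORMAT=DYNAMIC instead.

-- Possible workaround (sketch): index only the first 191 characters of str
alter table varchar_test drop index str_index;
alter table varchar_test add index str_index (str(191));
-- The conversion now succeeds because the index prefix is 191*4 = 764 bytes
alter table varchar_test modify column str varchar(255) CHARACTER SET utf8mb4;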
For the record, here are the index length limits I need to worry about in a character set change:
- InnoDB indexes are limited to 3072 bytes
- MyISAM indexes are limited to 1000 bytes
- Without innodb_large_prefix, InnoDB index columns are limited to 767 bytes (a quick way to check this is shown below)
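Whether the 767-byte or the 3072-byte limit applies to your InnoDB tables can be checked directly. Keep in mind that on MySQL 5.5 and 5.6 the larger prefix also requires innodb_file_format=Barracuda and a DYNAMIC or COMPRESSED row format:

-- Check whether large index prefixes are available
show global variables like 'innodb_large_prefix';
show global variables like 'innodb_file_format';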
In order to identify all of the indexes that I need to change, I run the following two queries against information_schema. (I tried to include the correct size for every relevant data type, but in case I forgot one, or a new data type is added in the future, the queries fall back to a large default value so that they return false positives I can notice and fix.) These queries can easily be modified for a change to another character set such as utf8, utf16, utf32, etc.
Query #1:
-- Show me all index columns that will exceed the length limit if I change to utf8mb4
set @new_character_set = 'utf8mb4';

select table_schema, table_name, engine, index_name, column_name, column_type, sub_part, index_column_length_in_bytes
from (
    select t.table_schema, t.table_name, t.engine, s.index_name, c.column_name,
        c.column_type, c.data_type, c.character_maximum_length, c.character_octet_length,
        s.sub_part, c.character_set_name, cs.maxlen,
        coalesce(
            (
                case
                    when s.sub_part is null then (cs.maxlen * c.character_maximum_length)
                    else (cs.maxlen * s.sub_part)
                end
            ),
            (
                case (c.data_type)
                    when 'tinyint' then 1
                    when 'smallint' then 2
                    when 'mediumint' then 3
                    when 'int' then 4
                    when 'bigint' then 8
                    when 'decimal' then 4
                    when 'float' then 4
                    when 'year' then 1
                    when 'date' then 3
                    when 'datetime' then 8
                    when 'time' then 3
                    when 'timestamp' then 4
                    else 1000000
                end
            )
        ) as index_column_length_in_bytes
    from information_schema.statistics s
    inner join information_schema.columns c
        on c.table_schema = s.table_schema
        and c.table_name = s.table_name
        and c.column_name = s.column_name
    inner join information_schema.tables t
        on t.table_schema = c.table_schema
        and t.table_name = c.table_name
    inner join information_schema.character_sets cs
        on cs.character_set_name = @new_character_set
    where s.index_type != 'FULLTEXT'
) sub_query
where index_column_length_in_bytes > (
    select case
        when engine = 'MyISAM' then 1000
        when engine = 'InnoDB'
            and (
                select max(innodb_large_prefix)
                from (
                    select variable_value='ON' as innodb_large_prefix
                    from information_schema.global_variables
                    where variable_name = 'innodb_large_prefix'
                    union
                    select 0 as innodb_large_prefix
                ) sub_query
            ) = 1 then 3072
        else 767
    end
)
order by table_schema, table_name, index_name;
Query #2:
-- Show me all indexes that will exceed the length limit if I change to utf8mb4
set @new_character_set = 'utf8mb4';

select table_schema, table_name, engine, index_name, columns_in_index, index_length_in_bytes
from (
    select t.table_schema, t.table_name, t.engine, s.index_name,
        group_concat(
            concat(
                s.column_name,
                case
                    when s.sub_part is null then ''
                    else concat('(', s.sub_part, ')')
                end
            ) order by s.seq_in_index
        ) as columns_in_index,
        sum(coalesce(
            (
                case
                    when s.sub_part is null then (cs.maxlen * c.character_maximum_length)
                    else (cs.maxlen * s.sub_part)
                end
            ),
            (
                case (c.data_type)
                    when 'tinyint' then 1
                    when 'smallint' then 2
                    when 'mediumint' then 3
                    when 'int' then 4
                    when 'bigint' then 8
                    when 'decimal' then 4
                    when 'float' then 4
                    when 'year' then 1
                    when 'date' then 3
                    when 'datetime' then 8
                    when 'time' then 3
                    when 'timestamp' then 4
                    else 1000000
                end
            )
        )) as index_length_in_bytes
    from information_schema.statistics s
    inner join information_schema.columns c
        on c.table_schema = s.table_schema
        and c.table_name = s.table_name
        and c.column_name = s.column_name
    inner join information_schema.tables t
        on t.table_schema = c.table_schema
        and t.table_name = c.table_name
    inner join information_schema.character_sets cs
        on cs.character_set_name = @new_character_set
    where s.index_type != 'FULLTEXT'
    group by t.table_schema, t.table_name, t.engine, s.index_name
) sub_query
where index_length_in_bytes > (
    select case
        when engine = 'MyISAM' then 1000
        else 3072
    end
)
order by table_schema, table_name, index_name;
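As a quick sanity check of the per-column arithmetic, the same calculation can be run for a single column. This is just a sketch; my_database, my_table, and my_column are placeholder names for whatever column you are curious about:

-- Sketch: worst-case index bytes for one column after the character set change
-- (my_database/my_table/my_column are placeholder names)
set @new_character_set = 'utf8mb4';
select c.table_schema, c.table_name, c.column_name,
       cs.maxlen * c.character_maximum_length as worst_case_index_bytes
from information_schema.columns c
inner join information_schema.character_sets cs
    on cs.character_set_name = @new_character_set
where c.table_schema = 'my_database'
  and c.table_name = 'my_table'
  and c.column_name = 'my_column';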
Once I identify the affected indexes, I need to resolve each one, whether by shortening the sub-part of one or more columns in the index, removing columns from the index, decreasing the length of the column, or deciding not to change the character set of the relevant tables/columns.
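Here are a couple of sketches of what those fixes might look like. my_table, my_index, and my_column are placeholder names, and 191 assumes a 4-byte character set under the 767-byte limit:

-- Sketch 1: shorten the indexed prefix of a column
alter table my_table drop index my_index, add index my_index (my_column(191));

-- Sketch 2: shrink the column itself (only safe if no existing data would be truncated)
alter table my_table modify column my_column varchar(191);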
NOTE: In MySQL 5.7, innodb_large_prefix is enabled by default and the default row format is DYNAMIC, so this should hopefully become a thing of the past.
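On MySQL 5.7 (5.7.9 and later) you can confirm those defaults with a quick check:

-- Expecting innodb_large_prefix = ON and innodb_default_row_format = dynamic on 5.7
select @@global.innodb_large_prefix, @@global.innodb_default_row_format;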