原创,如果有误请指出
今天同事讨论关于主键使用varchar和int的区别。
我现在总结的3个问题:
1、tablespace中空间浪费
当然我们知道使用varchar可能会导致辅助索引比较大,因为用到varchar可能存储的字符较多,同时
在行头也存在一个可变字段字符区域(1-2)字节
而辅助索引叶子结点毕竟都存储了主键值,这样至少会多varchar数据字节数量+1(或者2) 字节- 4(int)字节空间。
如果辅助索引比较多空间浪费是可想而知的。
2、辅助索引B+树扫描性能
由于辅助索引B+树的空间要求更大,虽然在B+树层次一般都是3层-4层,索引单值定位I/O消耗并不明显,如果涉及到
范围查询(比如PAGE_CUR_G),需要访问的块就更多,同时比如例如辅助索引的using index,需要访问的块自然
更多
3、比较更加复杂
innodb 在进行元组比较的时候,不管是DML,select都会涉及到元组的比较,同时回表的时候也涉及
到比较操作。而varchar类型的比较比int类型更为复杂一些。
那么我们就来分析第三个问题,第一个问题和第二个问题是显而易见的。
我这里数据库字符集为latin1/latin1_swedish_ci
其实在innodb底层进行比较的时候都调用cmp_data这个函数
在innodb中有自己的定义的数据类型如下:
-
/*-------------------------------------------*/
-
/* The 'MAIN TYPE' of a column */
-
#define DATA_MISSING 0 /* missing column */
-
#define DATA_VARCHAR 1 /* character varying of the
-
latin1_swedish_ci charset-collation; note
-
that the MySQL format for this, DATA_BINARY,
-
DATA_VARMYSQL, is also affected by whether the
-
'precise type' contains
-
DATA_MYSQL_TRUE_VARCHAR */
-
#define DATA_CHAR 2 /* fixed length character of the
-
latin1_swedish_ci charset-collation */
-
#define DATA_FIXBINARY 3 /* binary string of fixed length */
-
#define DATA_BINARY 4 /* binary string */
-
#define DATA_BLOB 5 /* binary large object, or a TEXT type;
-
if prtype & DATA_BINARY_TYPE == 0, then this is
-
actually a TEXT column (or a BLOB created
-
with < 4.0.14; since column prefix indexes
-
came only in 4.0.14, the missing flag in BLOBs
-
created before that does not cause any harm) */
-
#define DATA_INT 6 /* integer: can be any size 1 - 8 bytes */
-
#define DATA_SYS_CHILD 7 /* address of the child page in node pointer */
-
#define DATA_SYS 8 /* system column */
我们熟悉的int类型属于DATA_INT而varchar属于DATA_VARCHAR,rowid属于DATA_SYS
在函数cmp_data根据各种类型的不同进行了不同比较的方式,这里就将int和varchar
判断的方式进行说明:
1、innodb int类型比较
实际上是在cmp_data中进行了大概的方式如下
-
if (len) {
-
#if defined __i386__ || defined __x86_64__ || defined _M_IX86 || defined _M_X64
-
/* Compare the first bytes with a loop to avoid the call
-
overhead of memcmp(). On x86 and x86-64, the GCC built-in
-
(repz cmpsb) seems to be very slow, so we will be calling the
-
libc version. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
-
tracks the slowness of the GCC built-in memcmp().
-
-
-
We compare up to the first 4..7 bytes with the loop.
-
The (len & 3) is used for "normalizing" or
-
"quantizing" the len parameter for the memcmp() call,
-
in case the whole prefix is equal. On x86 and x86-64,
-
the GNU libc memcmp() of equal strings is faster with
-
len=4 than with len=3.
-
-
-
On other architectures than the IA32 or AMD64, there could
-
be a built-in memcmp() that is faster than the loop.
-
We only use the loop where we know that it can improve
-
the performance. */
-
for (ulint i = 4 + (len & 3); i > 0; i--) {
-
cmp = int(*data1++) - int(*data2++);
-
if (cmp) {
-
return(cmp);
-
}
-
-
-
if (!--len) {
-
break;
-
}
-
}
-
my_strnncollsp_simple
-
-
-
if (len) {
-
#endif /* IA32 or AMD64 */
-
cmp = memcmp(data1, data2, len);
-
-
-
if (cmp) {
-
return(cmp);
-
}
-
-
-
data1 += len;
-
data2 += len;
-
#if defined __i386__ || defined __x86_64__ || defined _M_IX86 || defined _M_X64
-
}
-
#endif /* IA32 or AMD64 */
-
}
-
-
-
cmp = (int) (len1 - len2);
-
-
-
if (!cmp || pad == ULINT_UNDEFINED) {
-
return(cmp);
-
}
可以看到整个方式比较简洁,对于我们常用的x86_64模型并没有直接使用memcpy进行而是
进行了优化在注释中也有说明,才出现了for (ulint i = 4 + (len & 3); i > 0; i--)
部分,如果是IA32 or AMD64则直接使用memcpy进行比较。感兴趣的可以仔细阅读一下
2、innodb varchar类型比较
实际上这个比较会通过cmp_data->cmp_whole_field->my_strnncollsp_simple调用最终调用
my_strnncollsp_simple完成,而比如order by 会调用my_strnxfrm_simple他们都在一个
文件中。
下面是整个my_strnncollsp_simple函数
-
/*
-
Compare strings, discarding end space
-
-
-
SYNOPSIS
-
my_strnncollsp_simple()
-
cs character set handler
-
a First string to compare
-
a_length Length of 'a'
-
b Second string to compare
-
b_length Length of 'b'
-
diff_if_only_endspace_difference
-
Set to 1 if the strings should be regarded as different
-
if they only difference in end space
-
-
-
IMPLEMENTATION
-
If one string is shorter as the other, then we space extend the other
-
so that the strings have equal length.
-
-
-
This will ensure that the following things hold:
-
-
-
"a" == "a "
-
"a/0" < "a"
-
"a/0" < "a "
-
-
-
RETURN
-
< 0 a < b
-
= 0 a == b
-
> 0 a > b
-
*/
-
-
-
int my_strnncollsp_simple(const CHARSET_INFO *cs, const uchar *a,
-
size_t a_length, const uchar *b, size_t b_length,
-
my_bool diff_if_only_endspace_difference)
-
{
-
const uchar *map= cs->sort_order, *end;
-
size_t length;
-
int res;
-
-
-
#ifndef VARCHAR_WITH_DIFF_ENDSPACE_ARE_DIFFERENT_FOR_UNIQUE
-
diff_if_only_endspace_difference= 0;
-
#endif
-
-
-
end= a + (length= MY_MIN(a_length, b_length));
-
while (a < end)
-
{
-
if (map[*a++] != map[*b++])
-
return ((int) map[a[-1]] - (int) map[b[-1]]);
-
}
-
res= 0;
-
if (a_length != b_length)
-
{
-
int swap= 1;
-
if (diff_if_only_endspace_difference)
-
res= 1; /* Assume 'a' is bigger */
-
/*
-
Check the next not space character of the longer key. If it's < ' ',
-
then it's smaller than the other key.
-
*/
-
if (a_length < b_length)
-
{
-
/* put shorter key in s */
-
a_length= b_length;
-
a= b;
-
swap= -1; /* swap sign of result */
-
res= -res;
-
}
-
for (end= a + a_length-length; a < end ; a++)
-
{
-
if (map[*a] != map[' '])
-
return (map[*a] < map[' ']) ? -swap : swap;
-
}
-
}
-
return res;
-
}
其中*map= cs->sort_order比较关键这是内存中已经存储好的字符集的顺序,
循环进行
map[*a++] != map[*b++]
*a++和*b++ 会得到的字符集编码,然后在整个排序好的字符数组中找,
则得到了实际字符集编码进行比较,不管是比较的复杂度还是需要比较的
长度 varchar很可能都远远大于int类型,下面是打印cs->sort_order这片
内存区域前128字节得到的结果,
(gdb) x/128bx 0x258b000
0x258b000 : 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
0x258b008 : 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x258b010 : 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
0x258b018 : 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
0x258b020 : 0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27
0x258b028 : 0x28 0x29 0x2a 0x2b 0x2c 0x2d 0x2e 0x2f
0x258b030 : 0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37
0x258b038 : 0x38 0x39 0x3a 0x3b 0x3c 0x3d 0x3e 0x3f
0x258b040 : 0x40 0x41 0x42 0x43 0x44 0x45 0x46 0x47
0x258b048 : 0x48 0x49 0x4a 0x4b 0x4c 0x4d 0x4e 0x4f
0x258b050 : 0x50 0x51 0x52 0x53 0x54 0x55 0x56 0x57
0x258b058 : 0x58 0x59 0x5a 0x5b 0x5c 0x5d 0x5e 0x5f
0x258b060 : 0x60 0x41 0x42 0x43 0x44 0x45 0x46 0x47
0x258b068 : 0x48 0x49 0x4a 0x4b 0x4c 0x4d 0x4e 0x4f
0x258b070 : 0x50 0x51 0x52 0x53 0x54 0x55 0x56 0x57
0x258b078 : 0x58 0x59 0x5a 0x7b 0x7c 0x7d 0x7e 0x7f
而从内存的地址0x258b000我们也能看到他确实是存在于堆内存空间中,它是一片堆内存区域。
下面是varchar比较的调用栈帧以备后用
-
#0 my_strnncollsp_simple (cs=0x2d4b9c0, a=0x7fff57a71f93 "gaopeng", a_length=7, b=0x7fffbd7e807f "gaopeng", b_length=7, diff_if_only_endspace_difference=0 '/000')
-
at /root/mysql5.7.14/percona-server-5.7.14-7/strings/ctype-simple.c:165
-
#1 0x0000000001ab8ec2 in cmp_whole_field (mtype=1, prtype=524303, a=0x7fff57a71f93 "gaopeng", a_length=7, b=0x7fffbd7e807f "gaopeng", b_length=7)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/rem/rem0cmp.cc:374
-
#2 0x0000000001aba827 in cmp_data (mtype=1, prtype=524303, data1=0x7fff57a71f93 "gaopeng", len1=7, data2=0x7fffbd7e807f "gaopeng", len2=7)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/rem/rem0cmp.cc:468
-
#3 0x0000000001ab9a05 in cmp_dtuple_rec_with_match_bytes (dtuple=0x7fff48ed3280, rec=0x7fffbd7e807f "gaopeng", index=0x7fff48ec78a0, offsets=0x7fff57a6bc50,
-
matched_fields=0x7fff57a6bf80, matched_bytes=0x7fff57a6bf78) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/rem/rem0cmp.cc:880
-
#4 0x0000000001a87fe2 in page_cur_search_with_match_bytes (block=0x7fffbcceafc0, index=0x7fff48ec78a0, tuple=0x7fff48ed3280, mode=PAGE_CUR_GE,
-
iup_matched_fields=0x7fff57a6cdf8, iup_matched_bytes=0x7fff57a6cdf0, ilow_matched_fields=0x7fff57a6cde8, ilow_matched_bytes=0x7fff57a6cde0, cursor=0x7fff57a713f8)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/page/page0cur.cc:850
-
#5 0x0000000001c17a3e in btr_cur_search_to_nth_level (index=0x7fff48ec78a0, level=0, tuple=0x7fff48ed3280, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7fff57a713f0,
-
has_search_latch=0, file=0x2336938 "/root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc", line=5744, mtr=0x7fff57a70ee0)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc:1478
-
#6 0x0000000001c222bf in btr_estimate_n_rows_in_range_low (index=0x7fff48ec78a0, tuple1=0x7fff48ed3280, mode1=PAGE_CUR_GE, tuple2=0x7fff48ed32e0, mode2=PAGE_CUR_G,
-
nth_attempt=1) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc:5744
-
#7 0x0000000001c22a09 in btr_estimate_n_rows_in_range (index=0x7fff48ec78a0, tuple1=0x7fff48ed3280, mode1=PAGE_CUR_GE, tuple2=0x7fff48ed32e0, mode2=PAGE_CUR_G)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc:6044
-
#8 0x00000000019b3e0e in ha_innobase::records_in_range (this=0x7fff48e7e3b0, keynr=1, min_key=0x7fff57a71680, max_key=0x7fff57a716a0)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/handler/ha_innodb.cc:13938
-
#9 0x0000000000f6ed5b in handler::multi_range_read_info_const (this=0x7fff48e7e3b0, keyno=1, seq=0x7fff57a71b90, seq_init_param=0x7fff57a71850, n_ranges_arg=0,
-
bufsz=0x7fff57a71780, flags=0x7fff57a71784, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:6440
-
#10 0x0000000000f70662 in DsMrr_impl::dsmrr_info_const (this=0x7fff48e7e820, keyno=1, seq=0x7fff57a71b90, seq_init_param=0x7fff57a71850, n_ranges=0,
-
bufsz=0x7fff57a71d70, flags=0x7fff57a71d74, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:7112
-
#11 0x00000000019be22f in ha_innobase::multi_range_read_info_const (this=0x7fff48e7e3b0, keyno=1, seq=0x7fff57a71b90, seq_init_param=0x7fff57a71850, n_ranges=0,
-
bufsz=0x7fff57a71d70, flags=0x7fff57a71d74, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/handler/ha_innodb.cc:21351
-
#12 0x000000000178c9e4 in check_quick_select (param=0x7fff57a71e30, idx=0, index_only=false, tree=0x7fff48e700e0, update_tbl_stats=true, mrr_flags=0x7fff57a71d74,
-
bufsize=0x7fff57a71d70, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/opt_range.cc:10030
-
#13 0x0000000001783305 in get_key_scans_params (param=0x7fff57a71e30, tree=0x7fff48e70058, index_read_must_be_used=false, update_tbl_stats=true,
-
cost_est=0x7fff57a74190) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/opt_range.cc:5812
-
#14 0x000000000177ce43 in test_quick_select (thd=0x7fff4801f4d0, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false,
-
interesting_order=st_order::ORDER_NOT_RELEVANT, tab=0x7fff48eacf20, cond=0x7fff48eacd50, needed_reg=0x7fff48eacf60, quick=0x7fff57a744c8)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/sql/opt_range.cc:3066
-
#15 0x000000000158b9bc in get_quick_record_count (thd=0x7fff4801f4d0, tab=0x7fff48eacf20, limit=18446744073709551615)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:5942
-
#16 0x000000000158b073 in JOIN::estimate_rowcount (this=0x7fff48eac980) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:5689
-
#17 0x00000000015893b5 in JOIN::make_join_plan (this=0x7fff48eac980) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:5046
-
#18 0x000000000157d9b7 in JOIN::optimize (this=0x7fff48eac980) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:387
-
#19 0x00000000015fab71 in st_select_lex::optimize (this=0x7fff48aa45c0, thd=0x7fff4801f4d0) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:1009
-
#20 0x00000000015f9284 in handle_query (thd=0x7fff4801f4d0, lex=0x7fff48021ab0, result=0x7fff48aa5dc8, added_options=0, removed_options=0)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:164
-
#21 0x00000000015ac159 in execute_sqlcom_select (thd=0x7fff4801f4d0, all_tables=0x7fff48aa54b8) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5391
-
#22 0x00000000015a4774 in mysql_execute_command (thd=0x7fff4801f4d0, first_level=true) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:2889
-
#23 0x00000000015ad12a in mysql_parse (thd=0x7fff4801f4d0, parser_state=0x7fff57a76600) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5836
-
#24 0x00000000015a0fe9 in dispatch_command (thd=0x7fff4801f4d0, com_data=0x7fff57a76d70, command=COM_QUERY)
-
at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:1447
-
#25 0x000000000159fe1a in do_command (thd=0x7fff4801f4d0) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:1010
-
#26 0x00000000016e1d6c in handle_connection (arg=0x6320740) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/conn_handler/connection_handler_per_thread.cc:312
-
---Type <return> to continue, or q <return> to quit---
-
#27 0x0000000001d723f4 in pfs_spawn_thread (arg=0x6320530) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/perfschema/pfs.cc:2188
-
#28 0x0000003ca62079d1 in start_thread () from /lib64/libpthread.so.0
-
#29 0x0000003ca5ee8b6d in clone () from /lib64/libc.so.6
作者微信: