为什么我在MySQL中遇到死锁

我在MySQL表中遇到死锁 . 只涉及一个表,我可以一致地重现它 . 它只发生在我运行代码的多个线程时 .

这是表格:

CREATE TABLE `users_roles` (
  `role_id` bigint(20) NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`user_id`,`role_id`),
  KEY `created` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

然后,我在每个线程中运行这两个查询,每个线程对user_id具有不同的值 .

BEGIN;
DELETE FROM `users_roles` WHERE user_id = X;
INSERT INTO `users_roles` VALUES (7, X, NOW()); -- DEADLOCK ON THIS QUERY
COMMIT;

应该注意,当调用DELETE语句时,user_id X never 存在于数据库中 . 运行这些查询的代码位用于创建新用户 . 但是,该功能允许我修改用户所在的帐户,并因此从旧用户的团队中删除现有角色 .

因此,当足够的这些查询并行运行时,我开始遇到死锁 . InnoDB状态的死锁部分在每次死锁后显示此信息 .

------------------------
LATEST DETECTED DEADLOCK
------------------------
2014-05-09 16:02:20 7fbc99e5f700

*** (1) TRANSACTION:
TRANSACTION 6241424274, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 6
MySQL thread id 3772090, OS thread handle 0x7fbc1f451700, query id 4010665755 10.0.141.36 1403_users update

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1331 page no 10665 n bits 192 index `PRIMARY` of table `users_data`.`users_roles` trx id 6241424274 lock_mode X insert intention waiting

*** (2) TRANSACTION:
TRANSACTION 6241424275, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 6
MySQL thread id 3770297, OS thread handle 0x7fbc99e5f700, query id 4010665767 10.0.137.28 1403_users update
INSERT INTO users_roles(role_id, user_id, created) values(5, 102228093, NOW()) ON DUPLICATE KEY UPDATE user_id=user_id

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1331 page no 10665 n bits 192 index `PRIMARY` of table `users_data`.`users_roles` trx id 6241424275 lock_mode X

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1331 page no 10665 n bits 192 index `PRIMARY` of table `users_data`.`users_roles` trx id 6241424275 lock_mode X insert intention waiting

*** WE ROLL BACK TRANSACTION (2)

至于调试或尝试找到问题的实际位置,我已经能够通过从代码中删除DELETE语句来消除所有死锁 . 虽然这确实解决了这个问题,但我想了解它 .

我所理解的是MySQL处理间隙锁的方式 . 我知道他们在这个问题上正在工作,因为行不知道为什么innodb状态中的两个事务都是从相同的代码生成的,但只有其中一个,事务(2)具有独占锁 . 它甚至试图获得独占锁(没有插入意图) .

假设锁是正确的,我可以理解为什么会发生死锁:事务(2)获得独占锁,事务(1)请求插入意图,然后事务(2)请求插入意图 . 这是有道理的 . 没有意义的是在事务(1)中没有排他锁(没有插入意图) .


编辑:

我能够通过特定的命令顺序重现这一点 .

这是表格:

CREATE TABLE `a` (
  `id` tinyint(3) unsigned NOT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

这是查询 . 打开4个终端到mysql并按此顺序执行查询 .

session 1: BEGIN;
session 2: BEGIN;
session 3: BEGIN;
session 4: BEGIN;
session 1: DELETE FROM `a` WHERE `id` = 5;
session 2: DELETE FROM `a` WHERE `id` = 10;
session 3: DELETE FROM `a` WHERE `id` = 7;
session 4: DELETE FROM `a` WHERE `id` = 12;
session 1: INSERT INTO `a` VALUES (5, 1);
session 2: INSERT INTO `a` VALUES (10, 1); -- deadlock here
session 3: INSERT INTO `a` VALUES (7, 1); -- deadlock here
session 4: INSERT INTO `a` VALUES (12, 1); -- deadlock here

这是在任何插入之前的InnoDB状态 .

------------
TRANSACTIONS
------------
Trx id counter 11396965
Purge done for trx's n:o < 11396913 undo n:o < 0 state: running but idle
History list length 1248
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 11396962, ACTIVE 9 sec
2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 3425, OS thread handle 0x7fcd14197700, query id 29686 localhost dev cleaning up
TABLE LOCK table `matthew`.`a` trx id 11396962 lock mode IX
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396962 lock_mode X
---TRANSACTION 11396961, ACTIVE 10 sec
2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 3426, OS thread handle 0x7fccda225700, query id 29673 localhost dev cleaning up
TABLE LOCK table `matthew`.`a` trx id 11396961 lock mode IX
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396961 lock_mode X
---TRANSACTION 11396960, ACTIVE 11 sec
2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 3391, OS thread handle 0x7fccd4d7f700, query id 29672 localhost dev cleaning up
TABLE LOCK table `matthew`.`a` trx id 11396960 lock mode IX
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396960 lock_mode X
---TRANSACTION 11396959, ACTIVE 13 sec
2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 3392, OS thread handle 0x7fccd4bf9700, query id 29671 localhost dev cleaning up
TABLE LOCK table `matthew`.`a` trx id 11396959 lock mode IX
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396959 lock_mode X

在调用第一个插入后,会话1的终端挂起锁 . InnoDB状态显示:

---TRANSACTION 11396959, ACTIVE 841 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s)
MySQL thread id 3392, OS thread handle 0x7fccd4bf9700, query id 30234 localhost dev update
------- TRX HAS BEEN WAITING 32 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396959 lock_mode X insert intention waiting
------------------
TABLE LOCK table `matthew`.`a` trx id 11396959 lock mode IX
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396959 lock_mode X
RECORD LOCKS space id 7291 page no 3 n bits 80 index `PRIMARY` of table `matthew`.`a` trx id 11396959 lock_mode X insert intention waiting

回答(1)

2 years ago

我相信我找到了这个问题 .

http://dev.mysql.com/doc/refman/5.6/en/innodb-locks-set.html

在插入行之前,设置一种称为插入意图间隙锁定的间隙锁定 . 该锁定表示以这样的方式插入的意图:如果插入到相同索引间隙中的多个事务不插入间隙内的相同位置,则不需要等待彼此 . 假设存在值为4和7的索引记录 . 尝试插入值5和6的单独事务在获取插入行上的排它锁之前使用插入意图锁定锁定4和7之间的间隙,但不因为行是非冲突的,所以互相阻塞 . 如果发生重复键错误,则设置重复索引记录上的共享锁 . 如果有多个会话尝试插入同一行,如果另一个会话已经具有独占锁,则使用共享锁可能导致死锁 . 如果另一个会话删除该行,则会发生这种情况 .

这就是我的意思 .

删除,因为它们不影响任何行,所有删除都在表末端间隙上获得共享锁(模式IX) . 执行插入后,共享锁仍由所有线程保持,插入意图等待释放此共享锁 .

解决方案是 not 并行执行以下操作:

  • 当行不存在时,删除要插入的行 .

  • 插入行

所以,InnoDB引擎状态是错误的 . 它未能显示每个事务都持有相同的锁 . 它未能显示每个锁是lock_mode IX,而不是X.它未能显示每个线程还有一个等待授予的插入意图锁 . 总而言之,这是 SHOW ENGINE INNODB STATUS; 的一次非常壮观的失败 .