On Thu, Sep 14 2017, Xiao Ni wrote:

>>
>> What do
>>   cat /proc/8987/stack
>>   cat /proc/8983/stack
>>   cat /proc/8966/stack
>>   cat /proc/8381/stack
>>
>> show??
>
> ...
>
> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add lockdep_assert_held(&mddev->reconfig_mutex)?
> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
> [] mddev_suspend+0x12c/0x160 [md_mod]
> [] suspend_lo_store+0x7c/0xe0 [md_mod]
> [] md_attr_store+0x80/0xc0 [md_mod]
> [] sysfs_kf_write+0x3a/0x50
> [] kernfs_fop_write+0xff/0x180
> [] __vfs_write+0x37/0x170
> [] vfs_write+0xb2/0x1b0
> [] SyS_write+0x55/0xc0
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0xffffffffffffffff
>
> [jbd2/md0-8]
> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
> [] md_write_start+0xf0/0x220 [md_mod]
> [] raid5_make_request+0x89/0x8b0 [raid456]
> [] md_make_request+0xf5/0x260 [md_mod]
> [] generic_make_request+0x117/0x2f0
> [] submit_bio+0x75/0x150
> [] submit_bh_wbc+0x140/0x170
> [] submit_bh+0x13/0x20
> [] jbd2_write_superblock+0x109/0x230 [jbd2]
> [] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
> [] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
> [] kjournald2+0xd2/0x260 [jbd2]
> [] kthread+0x109/0x140
> [] ret_from_fork+0x25/0x30
> [] 0xffffffffffffffff

Thanks for this (and sorry it took so long to get to it).
It looks like

Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and md_write_start()")

is badly broken. I wonder how it ever passed testing.

In md_write_start() it changes the wait_event() call to

	wait_event(mddev->sb_wait,
		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && !mddev->suspended);

That should be

	wait_event(mddev->sb_wait,
		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || mddev->suspended);

i.e. it was (!A && !B), it should be (!A || B) !!!!!

Could you please make that change and try again.

Thanks,
NeilBrown