Regression caused by loopback related patches in 4.14.95

From: Thomas Lindroth @ 2019-02-26 23:35 UTC
  To: jack; +Cc: linux-block, stable

When I run "losetup --verbose --partscan --read-only --find /mnt/gemini.61rn.3T/Backups/debian.raw"
on a 4.14.103 system, losetup hangs for exactly 3 minutes. After the hang, the loop device works
as it should. The /mnt/gemini.61rn.3T mount is an ext4 fs on dm-crypt on a spinning SATA disk.
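For reference, that command boils down to a short sequence of loop ioctls. The sketch below is an approximation, not the util-linux source: it assumes losetup of this era uses LOOP_CTL_GET_FREE, LOOP_SET_FD and then LOOP_SET_STATUS64 with LO_FLAGS_PARTSCAN, which is consistent with the lo_ioctl -> loop_set_status64 -> loop_reread_partitions frames and with RSI = 0x4c04 (LOOP_SET_STATUS64) in the trace below. The helper name attach_loop_partscan_ro is hypothetical, and it needs root to do anything beyond the error path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: roughly what
 * "losetup --partscan --read-only --find IMAGE" asks the kernel to do.
 * Returns the loop device number on success, -1 on any error.
 */
int attach_loop_partscan_ro(const char *image)
{
    int nr = -1;
    int img_fd = open(image, O_RDONLY | O_CLOEXEC);
    if (img_fd < 0)
        return -1;

    /* --find: ask the loop driver for a free device number. */
    int ctl_fd = open("/dev/loop-control", O_RDWR | O_CLOEXEC);
    if (ctl_fd >= 0) {
        nr = ioctl(ctl_fd, LOOP_CTL_GET_FREE);
        close(ctl_fd);
    }
    if (nr < 0) {
        close(img_fd);
        return -1;
    }

    char dev[32];
    snprintf(dev, sizeof(dev), "/dev/loop%d", nr);
    /* --read-only: opening both fds O_RDONLY makes the device read-only. */
    int loop_fd = open(dev, O_RDONLY | O_CLOEXEC);
    if (loop_fd < 0) {
        close(img_fd);
        return -1;
    }

    int ok = 0;
    if (ioctl(loop_fd, LOOP_SET_FD, img_fd) == 0) {
        /* --partscan: LO_FLAGS_PARTSCAN makes loop_set_status() rescan
         * partitions; this is the call that stalls in the trace below. */
        struct loop_info64 info;
        memset(&info, 0, sizeof(info));
        info.lo_flags = LO_FLAGS_PARTSCAN;
        strncpy((char *)info.lo_file_name, image, LO_NAME_SIZE - 1);
        if (ioctl(loop_fd, LOOP_SET_STATUS64, &info) == 0)
            ok = 1;
        else
            ioctl(loop_fd, LOOP_CLR_FD, 0); /* undo the binding */
    }

    close(loop_fd);
    close(img_fd);
    return ok ? nr : -1;
}
```

Run as root against the image above, one would expect an affected kernel to sit inside the LOOP_SET_STATUS64 call for the duration of the hang.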

The hang was introduced in 4.14.95, which contains several loop-related patches.
I bisected it down to commit c1e63df4f30c3918476ac9bc594355b0e9629893
"loop: Get rid of loop_index_mutex". Reverting that commit from 4.14.103 also fixes the problem.

The problem may be limited to the 4.14 stable series; I haven't tested any other series.

Here is the output of dmesg when losetup hangs:
[Feb26 21:59] systemd-udevd[996]: seq 2791 '/devices/virtual/block/loop0' is taking a long time
[Feb26 22:01] INFO: task losetup:7694 blocked for more than 120 seconds.
[  +0.000009]       Not tainted 4.14.103 #25
[  +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000001] losetup         D
[  +0.000002]     0  7694   7687 0x00080000
[  +0.000002] Call Trace:
[  +0.000005]  ? __schedule+0x273/0x870
[  +0.000003]  schedule+0x2f/0x90
[  +0.000002]  schedule_preempt_disabled+0x11/0x20
[  +0.000002]  __mutex_lock.isra.2+0x32c/0x540
[  +0.000003]  ? __wake_up_common_lock+0x8a/0xc0
[  +0.000004]  blkdev_reread_part+0x16/0x30
[  +0.000100]  loop_reread_partitions+0x27/0x30
[  +0.000004]  loop_set_status+0x335/0x410
[  +0.000003]  loop_set_status64+0x4b/0x80
[  +0.000003]  lo_ioctl+0x1e7/0x7d0
[  +0.000003]  blkdev_ioctl+0x446/0x9d0
[  +0.000004]  block_ioctl+0x39/0x40
[  +0.000004]  do_vfs_ioctl+0xa4/0x650
[  +0.000002]  SyS_ioctl+0x74/0x80
[  +0.000004]  do_syscall_64+0x6e/0x170
[  +0.000003]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  +0.000003] RIP: 0033:0x7f455c7912a7
[  +0.000002] RSP: 002b:00007ffe5a9d6908 EFLAGS: 00000246
[  +0.000001]  ORIG_RAX: 0000000000000010
[  +0.000003] RAX: ffffffffffffffda RBX: 00007ffe5a9d6ab0 RCX: 00007f455c7912a7
[  +0.000002] RDX: 00007ffe5a9d6b50 RSI: 0000000000004c04 RDI: 0000000000000004
[  +0.000002] RBP: 0000000000000004 R08: 0000000000000008 R09: 696265642f737075
[  +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f455ce816b8
[  +0.000002] R13: 0000000000000003 R14: 00007ffe5a9d6b50 R15: 00007ffe5a9d6930
[  +6.555728] systemd-udevd[996]: seq 2791 '/devices/virtual/block/loop0' killed
[  +0.000247] systemd-udevd[996]: worker [7697] terminated by signal 9 (KILL)
[  +0.000005] systemd-udevd[996]: worker [7697] failed while handling '/devices/virtual/block/loop0'
[  +0.052335]  loop0: p1 p2 < p5 >

From that backtrace it looks like the problem is related to --partscan. Among the
loop-related patches in 4.14.95, commit 57da9a9742200f391d1cf93fea389f7ddc25ec9a says:

   Note that syzbot is also reporting circular locking dependency between
   bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling
   blkdev_reread_part() with lock held. This patch does not address it.

   [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

That looks to me like the cause of the hang. The syzkaller report says the fix is
"loop: Fix deadlock when calling blkdev_reread_part()", but I can't find that commit in the
4.14 series.
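The quoted "circular locking dependency" is the classic ABBA pattern: one path holds lo->lo_ctl_mutex and then wants bdev->bd_mutex (blkdev_reread_part() called with the lock held), while another path presumably takes the same two locks in the opposite order. The following is only a user-space analogy with pthread mutexes, not kernel code; the names ctl_mutex, bd_mutex, take_in_order and abba_demo are hypothetical. Timed locks replace indefinite blocking so the demo terminates; in the log above, the real cycle only seems to clear once systemd-udevd kills its worker.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical user-space stand-ins for the two kernel locks. */
static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER; /* ~ lo->lo_ctl_mutex */
static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;  /* ~ bdev->bd_mutex */
static pthread_barrier_t both_hold_first, both_tried_second;

struct lock_path {
    pthread_mutex_t *first, *second;
    int result; /* return value of the timed attempt on "second" */
};

static void *take_in_order(void *arg)
{
    struct lock_path *p = arg;

    pthread_mutex_lock(p->first);
    /* Force the bad interleaving: both threads now hold their first lock. */
    pthread_barrier_wait(&both_hold_first);

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000 * 1000; /* 100 ms instead of forever */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    p->result = pthread_mutex_timedlock(p->second, &deadline);
    if (p->result == 0)
        pthread_mutex_unlock(p->second);

    /* Keep holding "first" until the other thread has also given up. */
    pthread_barrier_wait(&both_tried_second);
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns how many of the two threads timed out on their second lock. */
int abba_demo(void)
{
    /* losetup-like path: ctl lock, then bd lock (partition reread). */
    struct lock_path a = { &ctl_mutex, &bd_mutex, 0 };
    /* Opposite order, e.g. an opener holding bd lock, then wanting ctl lock. */
    struct lock_path b = { &bd_mutex, &ctl_mutex, 0 };
    pthread_t ta, tb;

    pthread_barrier_init(&both_hold_first, NULL, 2);
    pthread_barrier_init(&both_tried_second, NULL, 2);
    pthread_create(&ta, NULL, take_in_order, &a);
    pthread_create(&tb, NULL, take_in_order, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);

    return (a.result == ETIMEDOUT) + (b.result == ETIMEDOUT);
}
```

Because each thread keeps its first lock until both have given up on their second, abba_demo() deterministically reports both threads stuck; taking the locks in one consistent order (or dropping the first lock before the reread, as the syzkaller fix title suggests) would remove the cycle.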

This is an x86_64 Gentoo system. Here is the .config I use http://sprunge.us/u7YNBt
