linux-block.vger.kernel.org archive mirror
* Regression caused by loopback related patches in 4.14.95
From: Thomas Lindroth @ 2019-02-26 23:35 UTC
  To: jack; +Cc: linux-block, stable

When I run "losetup --verbose --partscan --read-only --find /mnt/gemini.61rn.3T/Backups/debian.raw"
on a 4.14.103 system losetup hangs for exactly 3 minutes. After the hang the loopback device works
like it should. The /mnt/gemini.61rn.3T mount is an ext4 fs on dm-crypt on a spinning sata disk.

The hang was introduced in 4.14.95 and there are several loop related patches in 4.14.95.
I bisected it down to commit c1e63df4f30c3918476ac9bc594355b0e9629893
"loop: Get rid of loop_index_mutex". Reverting that commit from 4.14.103 also fixes the problem.

This could be a problem in just the 4.14 stable series. I haven't tested any other series.

Here is the output of dmesg when losetup hangs:
[Feb26 21:59] systemd-udevd[996]: seq 2791 '/devices/virtual/block/loop0' is taking a long time
[Feb26 22:01] INFO: task losetup:7694 blocked for more than 120 seconds.
[  +0.000009]       Not tainted 4.14.103 #25
[  +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000001] losetup         D
[  +0.000002]     0  7694   7687 0x00080000
[  +0.000002] Call Trace:
[  +0.000005]  ? __schedule+0x273/0x870
[  +0.000003]  schedule+0x2f/0x90
[  +0.000002]  schedule_preempt_disabled+0x11/0x20
[  +0.000002]  __mutex_lock.isra.2+0x32c/0x540
[  +0.000003]  ? __wake_up_common_lock+0x8a/0xc0
[  +0.000004]  blkdev_reread_part+0x16/0x30
[  +0.000100]  loop_reread_partitions+0x27/0x30
[  +0.000004]  loop_set_status+0x335/0x410
[  +0.000003]  loop_set_status64+0x4b/0x80
[  +0.000003]  lo_ioctl+0x1e7/0x7d0
[  +0.000003]  blkdev_ioctl+0x446/0x9d0
[  +0.000004]  block_ioctl+0x39/0x40
[  +0.000004]  do_vfs_ioctl+0xa4/0x650
[  +0.000002]  SyS_ioctl+0x74/0x80
[  +0.000004]  do_syscall_64+0x6e/0x170
[  +0.000003]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  +0.000003] RIP: 0033:0x7f455c7912a7
[  +0.000002] RSP: 002b:00007ffe5a9d6908 EFLAGS: 00000246
[  +0.000001]  ORIG_RAX: 0000000000000010
[  +0.000003] RAX: ffffffffffffffda RBX: 00007ffe5a9d6ab0 RCX: 00007f455c7912a7
[  +0.000002] RDX: 00007ffe5a9d6b50 RSI: 0000000000004c04 RDI: 0000000000000004
[  +0.000002] RBP: 0000000000000004 R08: 0000000000000008 R09: 696265642f737075
[  +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f455ce816b8
[  +0.000002] R13: 0000000000000003 R14: 00007ffe5a9d6b50 R15: 00007ffe5a9d6930
[  +6.555728] systemd-udevd[996]: seq 2791 '/devices/virtual/block/loop0' killed
[  +0.000247] systemd-udevd[996]: worker [7697] terminated by signal 9 (KILL)
[  +0.000005] systemd-udevd[996]: worker [7697] failed while handling '/devices/virtual/block/loop0'
[  +0.052335]  loop0: p1 p2 < p5 >

From that backtrace it looks like the problem is related to --partscan. Among the
loop related patches in 4.14.95 commit 57da9a9742200f391d1cf93fea389f7ddc25ec9a says:

   Note that syzbot is also reporting circular locking dependency between
   bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling
   blkdev_reread_part() with lock held. This patch does not address it.

   [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

To me it looks like that is what is causing the hang. The syzkaller report says that the fix is
"loop: Fix deadlock when calling blkdev_reread_part()" but I don't see that commit in the
4.14 series.

This is an x86_64 Gentoo system. Here is the .config I use http://sprunge.us/u7YNBt


* Re: Regression caused by loopback related patches in 4.14.95
From: Jan Kara @ 2019-02-27 10:30 UTC
  To: Thomas Lindroth; +Cc: jack, linux-block, stable, Greg Kroah-Hartman

Hello!

Thanks for the detailed report and bisection!

On Wed 27-02-19 00:35:26, Thomas Lindroth wrote:
> When I run "losetup --verbose --partscan --read-only --find
> /mnt/gemini.61rn.3T/Backups/debian.raw" on a 4.14.103 system losetup
> hangs for exactly 3 minutes. After the hang the loopback device works
> like it should. The /mnt/gemini.61rn.3T mount is an ext4 fs on dm-crypt
> on a spinning sata disk.
> 
> The hang was introduced in 4.14.95 and there are several loop related
> patches in 4.14.95.  I bisected it down to commit
> c1e63df4f30c3918476ac9bc594355b0e9629893 "loop: Get rid of
> loop_index_mutex". Reverting that commit from 4.14.103 also fixes the
> problem.

As you mention below, not all of the loop device deadlocks got fixed in
stable kernels because some changes were too intrusive for the stable tree.
Unfortunately, now that I'm looking into it, the commit 0a42e99b58a "loop:
Get rid of loop_index_mutex" that did get backported makes some deadlocks
much easier to hit. For example, when partitions are reread in
loop_set_status(), it takes just one process trying to open the loop device
to deadlock the kernel.
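
A minimal userspace sketch of that lock-order inversion, in Python rather
than kernel C. The two locks are hypothetical stand-ins for the loop control
mutex and bdev->bd_mutex, and the path names "set_status" and "open" are
illustrative only. It assumes the opener holds bd_mutex while waiting for
the loop control mutex, and that the ioctl path holds the loop control mutex
while blkdev_reread_part() waits for bd_mutex. The timeout only keeps the
sketch from hanging; it is not how the kernel behaves.

```python
import threading

# Hypothetical stand-ins for the two kernel mutexes named in this thread.
loop_ctl_mutex = threading.Lock()
bd_mutex = threading.Lock()


def run_inversion(timeout=0.2):
    """Run both paths once; report whether each one got its second lock."""
    barrier = threading.Barrier(2)
    got_second = {}

    def path(name, first, second):
        with first:
            barrier.wait()  # both paths now hold their first lock
            ok = second.acquire(timeout=timeout)
            if ok:
                second.release()
            got_second[name] = ok
            barrier.wait()  # keep the first lock until both attempts finish

    threads = [
        # ioctl path: control mutex first, then bd_mutex for the reread
        threading.Thread(target=path,
                         args=("set_status", loop_ctl_mutex, bd_mutex)),
        # open path: bd_mutex first, then the control mutex
        threading.Thread(target=path,
                         args=("open", bd_mutex, loop_ctl_mutex)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(run_inversion())
```

Neither path can take its second lock while the other still holds it, so
without the timeout both would wait forever. Taking the two locks in the
same order on both paths, or dropping the first lock before the reread,
removes the cycle.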

Actually, that commit was already reverted in 4.4 stable because I pointed
out to Greg earlier that it has doubtful benefit without the followup
fixes. But sadly it remained in other stable branches. Going through the
active branches, the summary seems to be:

3.18 and older: never applied
4.4:  already reverted
4.9:  needs revert
4.14: needs revert
4.19 and newer: followup fixes applied

So Greg, can you please revert the same three commits that you've reverted
in 4.4 also in the 4.9 and 4.14 stable trees? These are:

0a42e99b58a "loop: Get rid of loop_index_mutex"
967d1dc144b "loop: Fold __loop_release into loop_release"
628bd859470 "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"

Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Regression caused by loopback related patches in 4.14.95
From: Greg Kroah-Hartman @ 2019-02-27 14:11 UTC
  To: Jan Kara; +Cc: Thomas Lindroth, linux-block, stable

On Wed, Feb 27, 2019 at 11:30:22AM +0100, Jan Kara wrote:
> Hello!
> 
> Thanks for the detailed report and bisection!
> 
> On Wed 27-02-19 00:35:26, Thomas Lindroth wrote:
> > When I run "losetup --verbose --partscan --read-only --find
> > /mnt/gemini.61rn.3T/Backups/debian.raw" on a 4.14.103 system losetup
> > hangs for exactly 3 minutes. After the hang the loopback device works
> > like it should. The /mnt/gemini.61rn.3T mount is an ext4 fs on dm-crypt
> > on a spinning sata disk.
> > 
> > The hang was introduced in 4.14.95 and there are several loop related
> > patches in 4.14.95.  I bisected it down to commit
> > c1e63df4f30c3918476ac9bc594355b0e9629893 "loop: Get rid of
> > loop_index_mutex". Reverting that commit from 4.14.103 also fixes the
> > problem.
> 
> As you mention below, not all of the loop device deadlocks got fixed in
> stable kernels because some changes were too intrusive for the stable tree.
> Unfortunately, now that I'm looking into it, the commit 0a42e99b58a "loop:
> Get rid of loop_index_mutex" that did get backported makes some deadlocks
> much easier to hit. For example, when partitions are reread in
> loop_set_status(), it takes just one process trying to open the loop device
> to deadlock the kernel.
> 
> Actually, that commit was already reverted in 4.4 stable because I pointed
> out to Greg earlier that it has doubtful benefit without the followup
> fixes. But sadly it remained in other stable branches. Going through the
> active branches, the summary seems to be:
> 
> 3.18 and older: never applied
> 4.4:  already reverted
> 4.9:  needs revert
> 4.14: needs revert
> 4.19 and newer: followup fixes applied
> 
> So Greg, can you please revert the same three commits that you've reverted
> in 4.4 also in the 4.9 and 4.14 stable trees? These are:
> 
> 0a42e99b58a "loop: Get rid of loop_index_mutex"
> 967d1dc144b "loop: Fold __loop_release into loop_release"
> 628bd859470 "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"

Now all reverted, sorry about this.

greg k-h
