linux-kernel.vger.kernel.org archive mirror
* regression caused by block: freeze the queue earlier in del_gendisk
@ 2022-08-26 16:15 Dusty Mabe
  2022-08-28 10:24 ` Thorsten Leemhuis
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Dusty Mabe @ 2022-08-26 16:15 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel; +Cc: hch

Hey All,

I think I've found a regression introduced by:

a09b314 ("block: freeze the queue earlier in del_gendisk")

In Fedora CoreOS we have tests that set up RAID1 on the /boot/ and /root/ partitions
and then remove one of the disks to simulate a failure. Recently this test started
timing out occasionally. Looking a bit closer, it appears instances are getting stuck
during reboot with a stream of repeating messages:

```
[   17.978854] block device autoloading is deprecated and will be removed.
[   17.982555] block device autoloading is deprecated and will be removed.
[   17.985537] block device autoloading is deprecated and will be removed.
[   17.987546] block device autoloading is deprecated and will be removed.
[   17.989540] block device autoloading is deprecated and will be removed.
[   17.991547] block device autoloading is deprecated and will be removed.
[   17.993555] block device autoloading is deprecated and will be removed.
[   17.995539] block device autoloading is deprecated and will be removed.
[   17.997577] block device autoloading is deprecated and will be removed.
[   17.999544] block device autoloading is deprecated and will be removed.
[   22.979465] blkdev_get_no_open: 1666 callbacks suppressed
...
...
...
[  618.221270] blkdev_get_no_open: 1664 callbacks suppressed
[  618.221273] block device autoloading is deprecated and will be removed.
[  618.224274] block device autoloading is deprecated and will be removed.
[  618.227267] block device autoloading is deprecated and will be removed.
[  618.229274] block device autoloading is deprecated and will be removed.
[  618.231277] block device autoloading is deprecated and will be removed.
[  618.233277] block device autoloading is deprecated and will be removed.
[  618.235282] block device autoloading is deprecated and will be removed.
[  618.237370] block device autoloading is deprecated and will be removed.
[  618.239356] block device autoloading is deprecated and will be removed.
[  618.241290] block device autoloading is deprecated and will be removed.
```
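
For anyone scanning console logs for the same symptom, a minimal Python sketch follows.
It is not part of the Fedora CoreOS test suite; the function names and the 60-second
threshold are assumptions chosen for illustration. It counts the printed and
rate-limit-suppressed autoloading warnings and reports the time span they cover:

```python
import re

# Matches kernel console lines such as:
# [   17.978854] block device autoloading is deprecated and will be removed.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
AUTOLOAD_MSG = "block device autoloading is deprecated and will be removed."
SUPPRESSED_RE = re.compile(r"^blkdev_get_no_open: (\d+) callbacks suppressed$")


def summarize_autoload_spam(log_text):
    """Return (printed, suppressed, first_ts, last_ts) for the autoload warning."""
    printed = 0
    suppressed = 0
    first_ts = None
    last_ts = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a timestamped kernel line (e.g. the elided "...")
        timestamp = float(match.group(1))
        message = match.group(2).strip()
        suppressed_match = SUPPRESSED_RE.match(message)
        if message == AUTOLOAD_MSG:
            printed += 1
        elif suppressed_match:
            suppressed += int(suppressed_match.group(1))
        else:
            continue  # unrelated kernel message
        if first_ts is None:
            first_ts = timestamp
        last_ts = timestamp
    return printed, suppressed, first_ts, last_ts


def looks_stuck(log_text, min_span_seconds=60.0):
    """Heuristic (assumed threshold): warnings repeating for a long time."""
    printed, suppressed, first_ts, last_ts = summarize_autoload_spam(log_text)
    if first_ts is None:
        return False
    return (last_ts - first_ts) >= min_span_seconds
```

Fed the log above, the span runs from roughly 18 to 618 seconds of uptime, which is
far beyond what a normal reboot should take.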

Using the Fedora kernels I narrowed it down to being introduced between 
`kernel-5.19.0-0.rc3.27.fc37` (good) and `kernel-5.19.0-0.rc4.33.fc37` (bad).

I then did a bisect and found:

```
$ git bisect bad
a09b314005f3a0956ebf56e01b3b80339df577cc is the first bad commit
commit a09b314005f3a0956ebf56e01b3b80339df577cc
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Jun 14 09:48:27 2022 +0200

    block: freeze the queue earlier in del_gendisk
    
    Freeze the queue earlier in del_gendisk so that the state does not
    change while we remove debugfs and sysfs files.
    
    Ming mentioned that being able to observer request in debugfs might
    be useful while the queue is being frozen in del_gendisk, which is
    made possible by this change.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220614074827.458955-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

 block/genhd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
```
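
Because the hang is intermittent, a single clean reboot does not prove a commit is
good, so each bisect step needs several attempts. The sketch below is hypothetical and
does not describe how the result above was produced: `run_once` stands in for whatever
builds, boots, and reboots a test instance, and the attempt count is arbitrary. The
only fixed parts are the exit codes that `git bisect run` interprets (0 good, 1 bad,
125 skip):

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_verdict(run_once, attempts=10):
    """Run a flaky test up to `attempts` times; any hang marks the commit bad.

    run_once is a hypothetical callable returning True (rebooted cleanly),
    False (hung), or None (could not build or boot, so no verdict).
    """
    completed = 0
    for _ in range(attempts):
        outcome = run_once()
        if outcome is None:
            continue  # inconclusive attempt, try again
        if outcome is False:
            return BAD  # one reproduced hang is enough
        completed += 1
    # Only call the commit good if at least one attempt actually completed.
    return GOOD if completed else SKIP
```

A wrapper script would pass this value to `sys.exit()` so that `git bisect run` can
pick the next commit automatically.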

Reverting this commit on top of latest git master (4c612826b) gave me successful results.

Any ideas on what could be amiss here? Luckily the patch is tiny, so hopefully the
cause is obvious.

More details (including logs) in the following locations:

- https://bugzilla.redhat.com/show_bug.cgi?id=2121791
- https://github.com/coreos/fedora-coreos-tracker/issues/1282


Thanks!
Dusty



end of thread, other threads:[~2022-09-26  7:10 UTC | newest]

Thread overview: 25+ messages
2022-08-26 16:15 regression caused by block: freeze the queue earlier in del_gendisk Dusty Mabe
2022-08-28 10:24 ` Thorsten Leemhuis
2022-08-31 12:36   ` Thorsten Leemhuis
2022-09-01  7:06 ` Ming Lei
2022-09-03 13:47   ` Dusty Mabe
2022-09-07  7:33   ` Christoph Hellwig
2022-09-07  8:38     ` Ming Lei
2022-09-07 14:40       ` Chaitanya Kulkarni
2022-09-07 14:58         ` Ming Lei
2022-09-07 15:53           ` Chaitanya Kulkarni
2022-09-09  8:24     ` Ming Lei
2022-09-12  7:16       ` Christoph Hellwig
2022-09-13  1:55         ` Ming Lei
2022-09-13  2:36           ` Dusty Mabe
2022-09-20  9:11             ` Thorsten Leemhuis
2022-09-20 14:05               ` Jens Axboe
2022-09-20 14:12                 ` Christoph Hellwig
2022-09-20 14:14                   ` Jens Axboe
2022-09-21  9:25                     ` Thorsten Leemhuis
2022-09-21 14:34                       ` Jens Axboe
2022-09-21 14:47                         ` Greg KH
2022-09-21 14:56                           ` Jens Axboe
2022-09-26  7:09                             ` Greg KH
2022-09-07  7:22 ` Christoph Hellwig
2022-09-07 14:56   ` Dusty Mabe
