linux-kernel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: "Holger Hoffstätte" <holger@applied-asynchrony.com>
Cc: Chris Murphy <lists@colorremedies.com>,
	Nikolay Borisov <nborisov@suse.com>, Jens Axboe <axboe@kernel.dk>,
	Jan Kara <jack@suse.cz>, Paolo Valente <paolo.valente@linaro.org>,
	Linux-RAID <linux-raid@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>
Subject: Re: stalling IO regression since linux 5.12, through 5.18
Date: Wed, 17 Aug 2022 13:49:33 +0200
Message-ID: <20220817114933.66c4g4xjsi4df2tg@quack3>
In-Reply-To: <7c830487-95a6-b008-920b-8bc4a318f10a@applied-asynchrony.com>

On Wed 17-08-22 11:52:54, Holger Hoffstätte wrote:
> On 2022-08-16 17:34, Chris Murphy wrote:
> > 
> > On Tue, Aug 16, 2022, at 11:25 AM, Nikolay Borisov wrote:
> > > How about changing the scheduler either mq-deadline or noop, just
> > > to see if this is also reproducible with a different scheduler. I
> > > guess noop would imply the blk cgroup controller is going to be
> > > disabled
> > 
> > I already reported on that: always happens with bfq within an hour or
> > less. Doesn't happen with mq-deadline for ~25+ hours. Does happen
> > with bfq with the above patches removed. Does happen with
> > cgroup.disabled=io set.
> > 
> > Sounds to me like it's something bfq depends on and is somehow
> > becoming perturbed in a way that mq-deadline does not, and has
> > changed between 5.11 and 5.12. I have no idea what's under bfq that
> > matches this description.
> 
> Chris, just a shot in the dark but can you try the patch from
> 
> https://lore.kernel.org/linux-block/20220803121504.212071-1-yukuai1@huaweicloud.com/
> 
> on top of something more recent than 5.12? Ideally 5.19 where it applies
> cleanly.
> 
> No guarantees, I just remembered this patch and your problem sounds like
> a lost wakeup. Maybe BFQ just drives the sbitmap in a way that triggers the
> symptom.

Yes, the symptoms look similar, and it happens for devices with shared
tagsets (which megaraid sas uses), but that problem usually appeared when
there were lots of LUNs sharing the tagset, so that the number of tags
available per LUN was rather low. I'm not sure whether that is the case
here, but that patch is probably worth a try.

Another thing worth trying is to compile the kernel without
CONFIG_BFQ_GROUP_IOSCHED. That will essentially disable cgroup support in
BFQ so we will see whether the problem may be cgroup related or not.
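
A minimal sketch of how one might confirm that a rebuilt kernel really
lacks the option, assuming the kernel config is available as text. Where
that text comes from (for example /boot/config-<release> or /proc/config.gz)
depends on the distribution and is an assumption here; the helper name
config_option_state is hypothetical.

```python
import re


def config_option_state(config_text, option):
    """Return the value of `option` in kernel config text, or None if unset.

    Kernel configs record enabled options as `CONFIG_FOO=y` (or `=m`) and
    disabled ones as the comment `# CONFIG_FOO is not set`, so only an
    uncommented assignment counts as set.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None
```

A return value of None for "CONFIG_BFQ_GROUP_IOSCHED" would indicate that
BFQ was built without cgroup support.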

Another interesting thing might be to dump
/sys/kernel/debug/block/<device>/hctx*/{sched_tags,sched_tags_bitmap,tags,tags_bitmap}
while the system is hanging. That should tell us whether tags are in fact
in use or not when processes are blocked waiting for tags.
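
The dump could be collected with something like the sketch below, which
reads the four files named above from every hctx* directory of one device.
It assumes debugfs is mounted at /sys/kernel/debug and that it runs as root;
the function name dump_tag_state and the fallback device name "sda" are
hypothetical.

```python
import glob
import os
import sys

# The per-hctx debugfs files named in the message above.
TAG_FILES = ("sched_tags", "sched_tags_bitmap", "tags", "tags_bitmap")


def dump_tag_state(device, debugfs_root="/sys/kernel/debug/block"):
    """Return {path: contents} for the tag files of every hctx of `device`."""
    state = {}
    pattern = os.path.join(debugfs_root, device, "hctx*")
    for hctx_dir in sorted(glob.glob(pattern)):
        for name in TAG_FILES:
            path = os.path.join(hctx_dir, name)
            try:
                with open(path, "r") as f:
                    state[path] = f.read()
            except OSError as exc:
                # Record the failure instead of aborting, so one missing
                # file does not hide the rest of the dump.
                state[path] = f"<unreadable: {exc}>"
    return state


if __name__ == "__main__":
    # "sda" is only a placeholder; pass the real device name as an argument.
    device = sys.argv[1] if len(sys.argv) > 1 else "sda"
    for path, text in dump_tag_state(device).items():
        print(f"==> {path} <==")
        print(text)
```

Running it a few times while the hang is in progress and saving each output
makes it possible to compare whether the set of busy tags changes at all.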

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
