From: Logan Gunthorpe <logang@deltatee.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
linux-raid <linux-raid@vger.kernel.org>,
Song Liu <song@kernel.org>
Subject: Deadlock Issue with blk-wbt and raid5+journal
Date: Thu, 25 Aug 2022 16:19:35 -0600 [thread overview]
Message-ID: <7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com> (raw)
Hi Jens,
While testing md/raid5 with the journal option using loop devices, I've
found an easily reproducible hang on my system. Simply running an fio
write job with the md threadcnt set to 4 can hit it. Curiously, however,
it is not hit unless the journal is in use.
I'm running on the current md/md-next branch; however, I've seen this bug
for a couple of months now on recent kernels and have no idea how long
it's been in the kernel.
I end up seeing multiple hung tasks with the following stack trace:
schedule+0x9e/0x140
io_schedule+0x70/0xb0
rq_qos_wait+0x153/0x210
wbt_wait+0x127/0x1f0
__rq_qos_throttle+0x38/0x60
blk_mq_submit_bio+0x589/0xcd0
__submit_bio+0xe6/0x100
submit_bio_noacct_nocheck+0x42e/0x470
submit_bio_noacct+0x4c2/0xbb0
ops_run_io+0x46b/0x1a30
handle_stripe+0xcd3/0x36c0
handle_active_stripes.constprop.0+0x6f6/0xa60
raid5_do_work+0x177/0x330
process_one_work+0x609/0xb00
worker_thread+0x2d4/0x710
kthread+0x18c/0x1c0
ret_from_fork+0x1f/0x30
When this happens, I find roughly 1 to 10 inflight IOs on the WBT of the
underlying loop devices, as seen in
'/sys/kernel/debug/block/loop[0-3]/rqos/wbt/inflight'.
I've done some debugging in this area and this is what I'm seeing:
There are a few IOs in the WBT that go to sleep when the inflight
counter reaches the limit (96 in my case). Then, a number of IO tasks
are put to sleep after the limit gets exceeded. So far that makes sense. I
put some tracing in wbt_rqw_done() and can see that the inflight count
goes back down to a low number as other IOs complete, but then it hangs
before reaching zero. However, wbt_rqw_done() never wakes up any other
threads because, for some reason, wb_recent_wait(rwb) always returns
false and thus the limit is always zero, so the early return:
if (inflight && inflight >= limit)
return;
is always taken, because inflight is always greater than the zero limit
(some inflight IOs are sleeping, waiting to be woken up). Thus the
sleeping tasks remain sleeping forever. I've also verified that
rwb_wake_all() never gets called in this scenario.
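To make the failure mode concrete, here is a toy C model of that check.
The helper names get_limit() and rqw_done_wakes() are mine, not the
kernel's; I'm only modelling the limit == 0 case described above:

```c
#include <assert.h>
#include <stdbool.h>

static bool recent_wait; /* stands in for wb_recent_wait(rwb) */

static unsigned int get_limit(void)
{
	/* In the hang, wb_recent_wait() keeps returning false, so the
	 * computed limit ends up as 0 (96 is my configured depth). */
	return recent_wait ? 96 : 0;
}

/* Returns true iff the completion path would wake sleeping throttled tasks. */
static bool rqw_done_wakes(unsigned int inflight)
{
	unsigned int limit = get_limit();

	/*
	 * The early return from above: any non-zero inflight count
	 * trivially satisfies inflight >= 0, so with limit == 0 this
	 * branch is always taken and the wake-up is never reached.
	 */
	if (inflight && inflight >= limit)
		return false;

	return true; /* would wake up the rqw wait queue */
}
```

So as long as the sleeping waiters themselves count toward inflight, no
completion can ever get inflight below the zero limit, and nothing wakes them.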
Given the conditions for hitting the bug, I fully expected this to be an
issue in the raid code, but unless I'm missing something, it sure looks
to me like a deadlock in the wbt code, which makes me wonder why
nobody else has hit it. Is there something else I'm missing that is
supposed to be waking up these processes? Or something weird about the
raid5+journal+loop code that causes wb_recent_wait() to always return false?
Any thoughts?
Thanks,
Logan
Thread overview: 2+ messages
2022-08-25 22:19 Logan Gunthorpe [this message]
2022-09-02 23:46 ` Deadlock Issue with blk-wbt and raid5+journal Logan Gunthorpe