From: Hou Tao <houtao1@huawei.com>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
"Alexander Viro" <viro@zeniv.linux.org.uk>
Cc: <linux-block@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
<houtao1@huawei.com>, <yukuai3@huawei.com>
Subject: [PATCH] block: ensure the memory order between bi_private and bi_status
Date: Thu, 1 Jul 2021 19:35:37 +0800
Message-ID: <20210701113537.582120-1-houtao1@huawei.com>
When running a stress test on null_blk under linux-4.19.y, the following
warning is reported:
percpu_ref_switch_to_atomic_rcu: percpu ref (css_release) <= 0 (-3) after switching to atomic
The cause is that css_put() is invoked twice on the same bio as shown below:
CPU 1:                          CPU 2:
// IO completion kworker        // IO submit thread
                                __blkdev_direct_IO_simple
                                  submit_bio
bio_endio
  bio_uninit(bio)
    css_put(bi_css)
    bi_css = NULL
                                set_current_state(TASK_UNINTERRUPTIBLE)
  bio->bi_end_io
    blkdev_bio_end_io_simple
      bio->bi_private = NULL
                                // bi_private is NULL
                                READ_ONCE(bio->bi_private)
      wake_up_process
        smp_mb__after_spinlock
                                bio_uninit(bio)
                                  // read bi_css as non-NULL
                                  // so call css_put() again
                                  css_put(bi_css)
Because there are no memory barriers between the reads and the writes of
bi_private and bi_css, reading bi_private as NULL does not guarantee that
bi_css is also read as NULL on a host with a weak memory model (e.g., ARM64).
In the latest kernel source, css_put() has been removed from bio_uninit(),
but the memory-order problem still exists, because the order between
bio->bi_private and {bi_status|bi_blkg} is also assumed in
__blkdev_direct_IO_simple(). It is reproducible that
__blkdev_direct_IO_simple() may read bi_status as 0 even if
bi_status is set to an errno in req_bio_endio().
In __blkdev_direct_IO(), the memory order between dio->waiter and
dio->bio.bi_status is not guaranteed either. So far it has not been
reproduced, perhaps because dio->waiter and dio->bio.bi_status are
in the same cache line. But it is better to guarantee the memory
order explicitly.
Fix it by using smp_load_acquire() and smp_store_release() to guarantee
the order between {bio->bi_private|dio->waiter} and {bi_status|bi_blkg}.
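For readers less familiar with the pairing, here is a minimal userspace sketch of the same release/acquire handoff. It is not the kernel code: it uses C11 atomics and pthreads as a rough analogue of smp_store_release()/smp_load_acquire(), and struct fake_bio, complete_io() and submit_and_wait() are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; only the two fields that matter. */
struct fake_bio {
	int status;              /* plain field, plays the role of bi_status */
	_Atomic(void *) private; /* non-NULL while in flight, like bi_private */
};

struct completion_arg {
	struct fake_bio *bio;
	int status;
};

/* Completion side: write status first, then clear private with release. */
static void *complete_io(void *data)
{
	struct completion_arg *arg = data;

	arg->bio->status = arg->status;
	/* Release: the status write above cannot be reordered after this. */
	atomic_store_explicit(&arg->bio->private, NULL, memory_order_release);
	return NULL;
}

/* Submit side: wait for private == NULL with acquire, then read status. */
static int submit_and_wait(int status)
{
	struct fake_bio bio = { .status = 0 };
	struct completion_arg arg = { .bio = &bio, .status = status };
	pthread_t worker;
	int rc, seen;

	atomic_init(&bio.private, &bio); /* any non-NULL token will do */
	rc = pthread_create(&worker, NULL, complete_io, &arg);
	assert(rc == 0);

	/* Acquire: the status read below cannot be reordered before this. */
	while (atomic_load_explicit(&bio.private, memory_order_acquire))
		; /* busy-wait here; the kernel path sleeps or polls instead */

	seen = bio.status;
	pthread_join(worker, NULL);
	return seen;
}
```

With the release store and acquire load paired on the same variable, a waiter that observes NULL is guaranteed to also observe the status written before it. Replacing both with relaxed accesses would reintroduce the reordering described above on weakly ordered hardware.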
Fixes: 189ce2b9dcc3 ("block: fast-path for small and simple direct I/O requests")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
fs/block_dev.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index eb34f5c357cf..a602c6315b0b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -224,7 +224,11 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
 {
 	struct task_struct *waiter = bio->bi_private;

-	WRITE_ONCE(bio->bi_private, NULL);
+	/*
+	 * Paired with smp_load_acquire in __blkdev_direct_IO_simple()
+	 * to ensure the order between bi_private and bi_xxx
+	 */
+	smp_store_release(&bio->bi_private, NULL);
 	blk_wake_io_task(waiter);
 }

@@ -283,7 +287,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	qc = submit_bio(&bio);
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(bio.bi_private))
+		/* Refer to comments in blkdev_bio_end_io_simple() */
+		if (!smp_load_acquire(&bio.bi_private))
 			break;
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
 		    !blk_poll(bdev_get_queue(bdev), qc, true))
@@ -353,7 +358,12 @@ static void blkdev_bio_end_io(struct bio *bio)
 		} else {
 			struct task_struct *waiter = dio->waiter;

-			WRITE_ONCE(dio->waiter, NULL);
+			/*
+			 * Paired with smp_load_acquire() in
+			 * __blkdev_direct_IO() to ensure the order between
+			 * dio->waiter and bio->bi_xxx
+			 */
+			smp_store_release(&dio->waiter, NULL);
 			blk_wake_io_task(waiter);
 		}
 	}
@@ -478,7 +488,8 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,

 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(dio->waiter))
+		/* Refer to comments in blkdev_bio_end_io */
+		if (!smp_load_acquire(&dio->waiter))
 			break;

 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
--
2.29.2
Thread overview (9+ messages):
2021-07-01 11:35 Hou Tao [this message]
2021-07-07 6:29 ` [PATCH] block: ensure the memory order between bi_private and bi_status Hou Tao
2021-07-13 1:14 ` Hou Tao
2021-07-15 7:01 ` Christoph Hellwig
2021-07-15 8:13 ` Peter Zijlstra
2021-07-16 9:02 ` Hou Tao
2021-07-16 10:19 ` Peter Zijlstra
2021-07-19 18:09 ` Paul E. McKenney
2021-07-19 18:16 ` Paul E. McKenney