From: Hou Tao <houtao1@huawei.com>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>
Cc: <linux-block@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<houtao1@huawei.com>, <yukuai3@huawei.com>
Subject: [PATCH] block: ensure the memory order between bi_private and bi_status
Date: Thu, 1 Jul 2021 19:35:37 +0800
Message-ID: <20210701113537.582120-1-houtao1@huawei.com>

When running a stress test on null_blk under linux-4.19.y, the following
warning is reported:

  percpu_ref_switch_to_atomic_rcu: percpu ref (css_release) <= 0 (-3) after switching to atomic

The cause is that css_put() is invoked twice on the same bio as shown below:

CPU 1:                         CPU 2:

// IO completion kworker       // IO submit thread
                               __blkdev_direct_IO_simple
                                 submit_bio

bio_endio
  bio_uninit(bio)
    css_put(bi_css)
    bi_css = NULL
                               set_current_state(TASK_UNINTERRUPTIBLE)
  bio->bi_end_io
    blkdev_bio_end_io_simple
      bio->bi_private = NULL
                               // bi_private is NULL
                               READ_ONCE(bio->bi_private)
        wake_up_process
          smp_mb__after_spinlock

                               bio_uninit(bio)
                                 // read bi_css as non-NULL
                                 // so call css_put() again
                                 css_put(bi_css)

Because there are no memory barriers between the reads and the writes of
bi_private and bi_css, reading bi_private as NULL does not guarantee that
bi_css is also read as NULL on a host with a weak memory model (e.g.
ARM64).

In the latest kernel source, css_put() has been removed from bio_uninit(),
but the memory-ordering problem still exists, because
__blkdev_direct_IO_simple() also assumes an order between
bio->bi_private and {bi_status|bi_blkg}. It is reproducible that
__blkdev_direct_IO_simple() may read bi_status as 0 even if
bi_status has been set to an error in req_bio_endio().

In __blkdev_direct_IO(), the memory order between dio->waiter and
dio->bio.bi_status is not guaranteed either. So far we have been unable
to reproduce a failure there, perhaps because dio->waiter and
dio->bio.bi_status are in the same cache line, but it is better to
guarantee the memory order explicitly.

Fix it by using smp_load_acquire() and smp_store_release() to guarantee
the order between {bio->bi_private|dio->waiter} and {bi_status|bi_blkg}.
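
For readers less familiar with the pairing, here is a minimal userspace
sketch of the same release/acquire hand-off, written with C11 atomics
and pthreads rather than the kernel primitives. It is only an analogue:
struct fake_bio, completer() and run_handoff() are hypothetical names
that do not exist in the kernel, memory_order_release and
memory_order_acquire stand in for smp_store_release() and
smp_load_acquire(), and the busy-wait replaces the real sleep/wake-up
logic. Build with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; not a kernel structure. */
struct fake_bio {
	int status;              /* plain field, plays the role of bi_status */
	_Atomic(void *) private; /* non-NULL while the "I/O" is in flight */
};

/* Plays the role of the completion side (blkdev_bio_end_io_simple()). */
static void *completer(void *arg)
{
	struct fake_bio *bio = arg;

	bio->status = -5; /* plain store that must become visible first */
	/* Release: orders the status store before the NULL store. */
	atomic_store_explicit(&bio->private, NULL, memory_order_release);
	return NULL;
}

/* Plays the role of the submitter (__blkdev_direct_IO_simple()). */
static int run_handoff(void)
{
	struct fake_bio bio = { .status = 0 };
	pthread_t thread;
	int status;

	atomic_init(&bio.private, &bio);
	if (pthread_create(&thread, NULL, completer, &bio) != 0)
		return 1;

	/* Acquire: once NULL is observed, the status store is visible too. */
	while (atomic_load_explicit(&bio.private, memory_order_acquire) != NULL)
		;

	status = bio.status;
	pthread_join(thread, NULL);
	return status;
}
```

With relaxed accesses in place of the release/acquire pair, the C11
memory model would no longer order the status store before the NULL
store, which mirrors the READ_ONCE()/WRITE_ONCE() situation described
above.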

Fixes: 189ce2b9dcc3 ("block: fast-path for small and simple direct I/O requests")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/block_dev.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index eb34f5c357cf..a602c6315b0b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -224,7 +224,11 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
 {
 	struct task_struct *waiter = bio->bi_private;
 
-	WRITE_ONCE(bio->bi_private, NULL);
+	/*
+	 * Paired with smp_load_acquire in __blkdev_direct_IO_simple()
+	 * to ensure the order between bi_private and bi_xxx
+	 */
+	smp_store_release(&bio->bi_private, NULL);
 	blk_wake_io_task(waiter);
 }
 
@@ -283,7 +287,8 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	qc = submit_bio(&bio);
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(bio.bi_private))
+		/* Refer to comments in blkdev_bio_end_io_simple() */
+		if (!smp_load_acquire(&bio.bi_private))
 			break;
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
 		    !blk_poll(bdev_get_queue(bdev), qc, true))
@@ -353,7 +358,12 @@ static void blkdev_bio_end_io(struct bio *bio)
 		} else {
 			struct task_struct *waiter = dio->waiter;
 
-			WRITE_ONCE(dio->waiter, NULL);
+			/*
+			 * Paired with smp_load_acquire() in
+			 * __blkdev_direct_IO() to ensure the order between
+			 * dio->waiter and bio->bi_xxx
+			 */
+			smp_store_release(&dio->waiter, NULL);
 			blk_wake_io_task(waiter);
 		}
 	}
@@ -478,7 +488,8 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(dio->waiter))
+		/* Refer to comments in blkdev_bio_end_io */
+		if (!smp_load_acquire(&dio->waiter))
 			break;
 
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-- 
2.29.2


Thread overview: 9+ messages
2021-07-01 11:35 Hou Tao [this message]
2021-07-07  6:29 ` [PATCH] block: ensure the memory order between bi_private and bi_status Hou Tao
2021-07-13  1:14   ` Hou Tao
2021-07-15  7:01 ` Christoph Hellwig
2021-07-15  8:13   ` Peter Zijlstra
2021-07-16  9:02     ` Hou Tao
2021-07-16 10:19       ` Peter Zijlstra
2021-07-19 18:09         ` Paul E. McKenney
2021-07-19 18:16   ` Paul E. McKenney
