From: Junxiao Bi <junxiao.bi@oracle.com>
To: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: axboe@kernel.dk
Subject: [PATCH] block: fix io hung by block throttle
Date: Wed, 14 Apr 2021 14:18:30 -0700
Message-ID: <20210414211830.5720-1-junxiao.bi@oracle.com>

There is a race that can hang io when multiple processes run in
parallel in rq_qos_wait().
Assume there are 4 processes, P1/P2/P3/P4: P1/P2 are at the entry of
rq_qos_wait(), P3/P4 are waiting for io to complete, 2 io are inflight,
and the inflight io limit is 2. The race is annotated below.

void rq_qos_wait()
{
	...
    bool has_sleeper;

	>>>> P3/P4 were in the sleeper list, so has_sleeper was true for both P1 and P2.
    has_sleeper = wq_has_sleeper(&rqw->wait);
    if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
        return;

	>>>> The 2 inflight io completed, P3/P4 were woken up and issued 2 new io.
	>>>> The 2 new io completed, leaving no inflight io.

	>>>> P1/P2 were added to the sleeper list, so the list had 2 entries.
    prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);

	>>>> With 2 entries in the list, wq_has_single_sleeper() was false,
	>>>> so has_sleeper was true for both P1 and P2.
    has_sleeper = !wq_has_single_sleeper(&rqw->wait);
    do {
        /* The memory barrier in set_task_state saves us here. */
        if (data.got_token)
            break;
        if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
            finish_wait(&rqw->wait, &data.wq);

            /*
             * We raced with wbt_wake_function() getting a token,
             * which means we now have two. Put our local token
             * and wake anyone else potentially waiting for one.
             */
            smp_rmb();
            if (data.got_token)
                cleanup_cb(rqw, private_data);
            break;
        }

	>>>> P1/P2 hang here forever: no io is inflight, so nothing wakes them.
	>>>> New io requests will also hang here.
        io_schedule();
        has_sleeper = true;
        set_current_state(TASK_UNINTERRUPTIBLE);
    } while (1);
    finish_wait(&rqw->wait, &data.wq);
}
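
For illustration only, and not part of the patch: a minimal userspace
sketch that replays the interleaving above as a single-threaded model.
struct model, try_acquire() and replay() are hypothetical names made up
for this sketch; try_acquire() merely stands in for acquire_inflight_cb(),
and the model assumes that only an io completion wakes a sleeper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; these names are not kernel identifiers. */
struct model {
	int inflight;	/* io currently in flight */
	int limit;	/* inflight io limit */
	int sleepers;	/* entries on the wait queue */
};

/* Stand-in for acquire_inflight_cb(): take a slot if one is free. */
static bool try_acquire(struct model *m)
{
	if (m->inflight >= m->limit)
		return false;
	m->inflight++;
	return true;
}

/*
 * Replay the interleaving described above and return how many of
 * P1/P2 are left asleep with no inflight io to wake them.
 */
static int replay(bool patched)
{
	/* P3/P4 are asleep and 2 io are inflight, so P1/P2 skip the fast path. */
	struct model m = { .inflight = 2, .limit = 2, .sleepers = 2 };
	int asleep = 0;

	/* Both io complete; P3/P4 wake, issue 2 new io, and those complete too. */
	m.sleepers = 0;
	m.inflight = 0;

	/* P1 and P2 both queue themselves before either inspects the list. */
	m.sleepers += 2;

	for (int i = 0; i < 2; i++) {
		/* Old code: has_sleeper = !wq_has_single_sleeper(). */
		bool has_sleeper = m.sleepers != 1;
		bool got = patched ? try_acquire(&m)
				   : (!has_sleeper && try_acquire(&m));

		if (got)
			m.sleepers--;	/* finish_wait() */
		else
			asleep++;	/* io_schedule() */
	}

	assert(m.inflight <= m.limit);	/* the limit is never exceeded */

	/* Only an io completion wakes a sleeper; with none inflight it hangs. */
	return m.inflight == 0 ? asleep : 0;
}
```

In this model replay(false) leaves both P1 and P2 asleep with nothing
inflight, while replay(true) lets each of them take a slot.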

Cc: stable@vger.kernel.org
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 block/blk-rq-qos.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 656460636ad3..04d888c99bc0 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -260,19 +260,17 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 		.cb = acquire_inflight_cb,
 		.private_data = private_data,
 	};
-	bool has_sleeper;
 
-	has_sleeper = wq_has_sleeper(&rqw->wait);
-	if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
+	if (!wq_has_sleeper(&rqw->wait)
+		&& acquire_inflight_cb(rqw, private_data))
 		return;
 
 	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
-	has_sleeper = !wq_has_single_sleeper(&rqw->wait);
 	do {
 		/* The memory barrier in set_task_state saves us here. */
 		if (data.got_token)
 			break;
-		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
+		if (acquire_inflight_cb(rqw, private_data)) {
 			finish_wait(&rqw->wait, &data.wq);
 
 			/*
@@ -286,7 +284,6 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 			break;
 		}
 		io_schedule();
-		has_sleeper = true;
 		set_current_state(TASK_UNINTERRUPTIBLE);
 	} while (1);
 	finish_wait(&rqw->wait, &data.wq);
-- 
2.24.3 (Apple Git-128)

