From: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
To: linux-kernel@vger.kernel.org
Cc: axboe@kernel.dk, tj@kernel.org,
	Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Subject: [PATCH] blk-throttle: fix race between submitter and throttler thread
Date: Thu, 13 May 2021 08:28:27 +0000
Message-ID: <20210513082827.1818-1-dmtrmonakhov@yandex-team.ru>

Currently we call bio_set_flag(bio, BIO_THROTTLED) unconditionally
at the end of blk_throtl_bio() without holding queue_lock. But once we drop
queue_lock, the bio may already be processed by the throttler thread, so both
threads may update bio->bi_flags concurrently.

Although the race window is tiny, it does happen in real life under heavy load.
It looks as follows:

SUBMITTER_THREAD (CPU1)                  THROTTLER_THREAD (CPU2)
 ->blk_throtl_bio
   ->throtl_add_bio_tg
(1)   bio_set_flag(bio, BIO_THROTTLED);
   spin_unlock_irq(q->queue_lock);
                                         ->blk_throtl_dispatch_work_fn
                                          (2)spin_lock_irq(q->queue_lock);
                                           ->generic_make_request
                                             ->blk_queue_split
                                               (3)bio_set_flag(bio, BIO_CHAIN)

(4) bio_set_flag(bio, BIO_THROTTLED);

Since updates to bio->bi_flags are not atomic, bio_set_flag() is a plain
read-modify-write. If CPU1 reads the flags for step (4) before the change
from step (3) is visible to it, it writes back a stale value, so the
BIO_CHAIN flag is lost, overwritten at step (4).
As a result, ->bi_end_io() is called multiple times: once for each
chained bio and once for the parent bio.
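
For illustration only (not part of the patch): the lost update can be
replayed deterministically in user space by splitting CPU1's non-atomic
flag update into its load and store halves. This is a minimal sketch;
struct demo_bio, the DEMO_* bit values and the demo_* functions are
hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

/* Hypothetical flag bits; the real BIO_* values differ. */
#define DEMO_THROTTLED (1u << 0)
#define DEMO_CHAIN     (1u << 1)

struct demo_bio {
	unsigned int flags;	/* stand-in for bio->bi_flags */
};

/* Replays steps (1), (3), (4) with CPU1's update split into load and store. */
unsigned int demo_racy_replay(void)
{
	struct demo_bio bio = { 0 };
	unsigned int cpu1_copy;

	bio.flags |= DEMO_THROTTLED;	/* (1) CPU1, under the lock */
	cpu1_copy = bio.flags;		/* (4) CPU1 loads the flags ... */
	bio.flags |= DEMO_CHAIN;	/* (3) CPU2 sets the chain flag */
	cpu1_copy |= DEMO_THROTTLED;
	bio.flags = cpu1_copy;		/* (4) ... and stores a stale value */

	return bio.flags;
}

/* Same updates, but each one runs to completion before the next starts. */
unsigned int demo_serialized_replay(void)
{
	struct demo_bio bio = { 0 };

	bio.flags |= DEMO_THROTTLED;	/* (1) */
	bio.flags |= DEMO_CHAIN;	/* (3) */
	bio.flags |= DEMO_THROTTLED;	/* (4) */

	return bio.flags;
}

/* Self-check: the racy order loses DEMO_CHAIN, the serialized one keeps it. */
void demo_check(void)
{
	assert(demo_racy_replay() == DEMO_THROTTLED);
	assert(demo_serialized_replay() == (DEMO_THROTTLED | DEMO_CHAIN));
}
```

Calling demo_check() from a test program exercises both orderings. On
real hardware the interleaving depends on timing, which is why the
sketch fixes the order by hand instead of using threads.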

Bug#2: submit_bio_checks() calls blkcg_bio_issue_init() for a throttled bio,
but at that point the bio may already have been completed and freed by the
throttler thread.

To fix both issues, we should modify a throttled bio only under queue_lock.
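
For illustration only (not part of the patch): the locking rule can be
sketched in user space as an ownership hand-off. The submitter makes
its last change to the object before dropping the lock and never
dereferences it afterwards. This assumes a pthread mutex as a stand-in
for queue_lock; struct sketch_bio, struct sketch_queue and the sketch_*
functions are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_THROTTLED (1u << 0)
#define SKETCH_CHAIN     (1u << 1)

struct sketch_bio {
	unsigned int flags;
};

struct sketch_queue {
	pthread_mutex_t lock;		/* plays the role of queue_lock */
	struct sketch_bio *pending;	/* single-slot throttle queue */
};

/* Submitter: queue and flag the bio under the lock, never touch it afterwards. */
void sketch_submit(struct sketch_queue *q, struct sketch_bio *bio)
{
	pthread_mutex_lock(&q->lock);
	q->pending = bio;
	bio->flags |= SKETCH_THROTTLED;
	pthread_mutex_unlock(&q->lock);
	/* From here on the bio belongs to the dispatcher and may be freed. */
}

/* Dispatcher: take ownership under the lock, then finish and free the bio. */
unsigned int sketch_dispatch(struct sketch_queue *q)
{
	struct sketch_bio *bio;
	unsigned int final_flags;

	pthread_mutex_lock(&q->lock);
	bio = q->pending;
	q->pending = NULL;
	pthread_mutex_unlock(&q->lock);

	if (!bio)
		return 0;

	bio->flags |= SKETCH_CHAIN;	/* sole owner now, no concurrent writer */
	final_flags = bio->flags;
	free(bio);
	return final_flags;
}

/* Single-threaded walk through submit followed by dispatch. */
unsigned int sketch_run(void)
{
	struct sketch_queue q;
	struct sketch_bio *bio = calloc(1, sizeof(*bio));
	unsigned int final_flags;

	assert(bio != NULL);
	pthread_mutex_init(&q.lock, NULL);
	q.pending = NULL;

	sketch_submit(&q, bio);
	final_flags = sketch_dispatch(&q);

	pthread_mutex_destroy(&q.lock);
	return final_flags;
}
```

sketch_run() returns both flag bits set, since no update is made by a
party that does not own the object. The same hand-off also covers
Bug#2: nothing in sketch_submit() reads or writes the bio once the lock
is released, so a concurrent free in the dispatcher cannot be observed.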

Fixes: 111be88398174 ("block-throttle: avoid double charge")
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>

diff --git a/block/bio.c b/block/bio.c
index 50e579088aca..96e6cf7793f2 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -277,6 +277,8 @@ static struct bio *__bio_chain_endio(struct bio *bio)
 {
 	struct bio *parent = bio->bi_private;
 
+	BUG_ON(!bio_flagged(parent, BIO_CHAIN));
+
 	if (bio->bi_status && !parent->bi_status)
 		parent->bi_status = bio->bi_status;
 	bio_put(bio);
diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..edc49e097ba1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -886,7 +886,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		create_task_io_context(current, GFP_ATOMIC, q->node);
 
 	if (blk_throtl_bio(bio)) {
-		blkcg_bio_issue_init(bio);
 		return false;
 	}
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b1b22d863bdf..5808bbd7df26 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2170,10 +2170,19 @@ static void throtl_update_latency_buckets(struct throtl_data *td)
 			td->avg_buckets[WRITE][i].latency,
 			td->avg_buckets[WRITE][i].valid);
 }
+
+static inline void throtl_bio_skip_latency(struct bio *bio)
+{
+	bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY;
+}
 #else
 static inline void throtl_update_latency_buckets(struct throtl_data *td)
 {
 }
+
+static inline void throtl_bio_skip_latency(struct bio *bio)
+{
+}
 #endif
 
 bool blk_throtl_bio(struct bio *bio)
@@ -2187,20 +2196,26 @@ bool blk_throtl_bio(struct bio *bio)
 	bool throttled = false;
 	struct throtl_data *td = tg->td;
 
-	rcu_read_lock();
+	if (!td->track_bio_latency)
+		throtl_bio_skip_latency(bio);
 
 	/* see throtl_charge_bio() */
 	if (bio_flagged(bio, BIO_THROTTLED))
-		goto out;
+		return false;
 
+	rcu_read_lock();
 	if (!cgroup_subsys_on_dfl(io_cgrp_subsys)) {
 		blkg_rwstat_add(&tg->stat_bytes, bio->bi_opf,
 				bio->bi_iter.bi_size);
 		blkg_rwstat_add(&tg->stat_ios, bio->bi_opf, 1);
 	}
 
-	if (!tg->has_rules[rw])
-		goto out;
+
+	if (!tg->has_rules[rw]) {
+		bio_set_flag(bio, BIO_THROTTLED);
+		rcu_read_unlock();
+		return false;
+	}
 
 	spin_lock_irq(&q->queue_lock);
 
@@ -2270,6 +2285,8 @@ bool blk_throtl_bio(struct bio *bio)
 
 	td->nr_queued[rw]++;
 	throtl_add_bio_tg(bio, qn, tg);
+	blkcg_bio_issue_init(bio);
+	throtl_bio_skip_latency(bio);
 	throttled = true;
 
 	/*
@@ -2284,15 +2301,15 @@ bool blk_throtl_bio(struct bio *bio)
 	}
 
 out_unlock:
+	if (!bio_flagged(bio, BIO_THROTTLED))
+		bio_set_flag(bio, BIO_THROTTLED);
+	/*
+	 * Once we drop ->queue_lock it is unsafe to touch current bio,
+	 * because it may be already handled by throttler thread.
+	 */
 	spin_unlock_irq(&q->queue_lock);
-out:
-	bio_set_flag(bio, BIO_THROTTLED);
-
-#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
-	if (throttled || !td->track_bio_latency)
-		bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY;
-#endif
 	rcu_read_unlock();
+
 	return throttled;
 }
 
-- 
2.7.4


Thread overview: 4+ messages
2021-05-13  8:28 Dmitry Monakhov [this message]
2021-05-20 15:00 ` [PATCH] blk-throttle: fix race between submitter and throttler thread Tejun Heo
2021-05-20 17:00   ` Dmitry Monakhov
2021-05-20 17:07     ` Tejun Heo
