linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	Sasha Levin <sashal@kernel.org>,
	linux-block@vger.kernel.org
Subject: [PATCH AUTOSEL 4.19 06/29] blk-iolatency: fix STS_AGAIN handling
Date: Thu, 29 Aug 2019 06:49:46 -0400	[thread overview]
Message-ID: <20190829105009.2265-6-sashal@kernel.org> (raw)
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Dennis Zhou <dennis@kernel.org>

[ Upstream commit c9b3007feca018d3f7061f5d5a14cb00766ffe9b ]

The iolatency controller is based on rq_qos. It increments on
rq_qos_throttle() and decrements on either rq_qos_cleanup() or
rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where
blk_mq_make_request() may call both rq_qos_cleanup() and
rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
double decrement.

The above works upstream as the only way we can get STS_AGAIN is from
blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
problem as bio_endio() skipping only happens on reserved tag allocation
failures which can only be caused by driver bugs and already triggers
WARN.

However, the fix creates a not so great dependency on how STS_AGAIN can
be propagated. Internally, we (Facebook) carry a patch that kills read
ahead if a cgroup is io congested or a fatal signal is pending. This
combined with chained bios progagate their bi_status to the parent is
not already set can can cause the parent bio to not clean up properly
even though it was successful. This consequently leaks the inflight
counter and can hang all IOs under that blkg.

To nip the adverse interaction early, this removes the rq_qos_cleanup()
callback in iolatency in favor of cleaning up always on the
rq_qos_done_bio() path.

Fixes: a3fb01ba5af0 ("blk-iolatency: only account submitted bios")
Debugged-by: Tejun Heo <tj@kernel.org>
Debugged-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/blk-iolatency.c | 51 ++++++++++++-------------------------------
 1 file changed, 14 insertions(+), 37 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 84ecdab41b691..0529e94a20f7f 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -566,10 +566,6 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 	if (!blkg)
 		return;
 
-	/* We didn't actually submit this bio, don't account it. */
-	if (bio->bi_status == BLK_STS_AGAIN)
-		return;
-
 	iolat = blkg_to_lat(bio->bi_blkg);
 	if (!iolat)
 		return;
@@ -588,40 +584,22 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 
 		inflight = atomic_dec_return(&rqw->inflight);
 		WARN_ON_ONCE(inflight < 0);
-		if (iolat->min_lat_nsec == 0)
-			goto next;
-		iolatency_record_time(iolat, &bio->bi_issue, now,
-				      issue_as_root);
-		window_start = atomic64_read(&iolat->window_start);
-		if (now > window_start &&
-		    (now - window_start) >= iolat->cur_win_nsec) {
-			if (atomic64_cmpxchg(&iolat->window_start,
-					window_start, now) == window_start)
-				iolatency_check_latencies(iolat, now);
+		/*
+		 * If bi_status is BLK_STS_AGAIN, the bio wasn't actually
+		 * submitted, so do not account for it.
+		 */
+		if (iolat->min_lat_nsec && bio->bi_status != BLK_STS_AGAIN) {
+			iolatency_record_time(iolat, &bio->bi_issue, now,
+					      issue_as_root);
+			window_start = atomic64_read(&iolat->window_start);
+			if (now > window_start &&
+			    (now - window_start) >= iolat->cur_win_nsec) {
+				if (atomic64_cmpxchg(&iolat->window_start,
+					     window_start, now) == window_start)
+					iolatency_check_latencies(iolat, now);
+			}
 		}
-next:
-		wake_up(&rqw->wait);
-		blkg = blkg->parent;
-	}
-}
-
-static void blkcg_iolatency_cleanup(struct rq_qos *rqos, struct bio *bio)
-{
-	struct blkcg_gq *blkg;
-
-	blkg = bio->bi_blkg;
-	while (blkg && blkg->parent) {
-		struct rq_wait *rqw;
-		struct iolatency_grp *iolat;
-
-		iolat = blkg_to_lat(blkg);
-		if (!iolat)
-			goto next;
-
-		rqw = &iolat->rq_wait;
-		atomic_dec(&rqw->inflight);
 		wake_up(&rqw->wait);
-next:
 		blkg = blkg->parent;
 	}
 }
@@ -637,7 +615,6 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
 
 static struct rq_qos_ops blkcg_iolatency_ops = {
 	.throttle = blkcg_iolatency_throttle,
-	.cleanup = blkcg_iolatency_cleanup,
 	.done_bio = blkcg_iolatency_done_bio,
 	.exit = blkcg_iolatency_exit,
 };
-- 
2.20.1


  parent reply	other threads:[~2019-08-29 10:54 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-29 10:49 [PATCH AUTOSEL 4.19 01/29] hv_sock: Fix hang when a connection is closed Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 02/29] Revert "dm bufio: fix deadlock with loop device" Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 03/29] kprobes: Fix potential deadlock in kprobe_optimizer() Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 04/29] ALSA: line6: Fix memory leak at line6_init_pcm() error path Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 05/29] Blk-iolatency: warn on negative inflight IO counter Sasha Levin
2019-08-29 10:49 ` Sasha Levin [this message]
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 07/29] {nl,mac}80211: fix interface combinations on crypto controlled devices Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 08/29] timekeeping: Use proper ktime_add when adding nsecs in coarse offset Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 09/29] selftests: fib_rule_tests: use pre-defined DEV_ADDR Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 10/29] x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace() Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 11/29] binder: take read mode of mmap_sem in binder_alloc_free_page() Sasha Levin
2019-08-29 15:13   ` Tyler Hicks
2019-08-30  6:29     ` Greg Kroah-Hartman
2019-08-30  7:30       ` Tyler Hicks
2019-09-02 15:54         ` Greg Kroah-Hartman
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 12/29] powerpc/64: mark start_here_multiplatform as __ref Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 13/29] media: stm32-dcmi: fix irq = 0 case Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 14/29] HID: input: fix a4tech horizontal wheel custom usage Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 15/29] netfilter: nf_tables: use-after-free in failing rule with bound set Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 16/29] userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 17/29] arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64 Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 18/29] mac80211: fix possible sta leak Sasha Levin
2019-08-29 10:49 ` [PATCH AUTOSEL 4.19 19/29] scripts/decode_stacktrace: match basepath using shell prefix operator, not regex Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 20/29] KVM: arm/arm64: Only skip MMIO insn once Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 21/29] netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 22/29] netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 23/29] ALSA: usb-audio: Check mixer unit bitmap yet more strictly Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 24/29] riscv: remove unused variable in ftrace Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 25/29] nvme-fc: use separate work queue to avoid warning Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 26/29] clk: s2mps11: Add used attribute to s2mps11_dt_match Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 27/29] remoteproc: qcom: q6v5: shore up resource probe handling Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 28/29] modules: always page-align module section allocations Sasha Levin
2019-08-29 10:50 ` [PATCH AUTOSEL 4.19 29/29] kernel/module: Fix mem leak in module_add_modinfo_attrs Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190829105009.2265-6-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dennis@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).