linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] blk-iolatency: fix STS_AGAIN handling
@ 2019-07-05 20:32 Dennis Zhou
  2019-07-05 20:46 ` Jens Axboe
  0 siblings, 1 reply; 2+ messages in thread
From: Dennis Zhou @ 2019-07-05 20:32 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik
  Cc: kernel-team, linux-block, linux-kernel, Dennis Zhou

The iolatency controller is based on rq_qos. It increments on
rq_qos_throttle() and decrements on either rq_qos_cleanup() or
rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where
blk_mq_make_request() may call both rq_qos_cleanup() and
rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
double decrement.

The above works upstream as the only way we can get STS_AGAIN is from
blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
problem as bio_endio() skipping only happens on reserved tag allocation
failures which can only be caused by driver bugs and already triggers
WARN.

However, the fix creates a not so great dependency on how STS_AGAIN can
be propagated. Internally, we (Facebook) carry a patch that kills read
ahead if a cgroup is io congested or a fatal signal is pending. This
combined with chained bios progagate their bi_status to the parent is
not already set can can cause the parent bio to not clean up properly
even though it was successful. This consequently leaks the inflight
counter and can hang all IOs under that blkg.

To nip the adverse interaction early, this removes the rq_qos_cleanup()
callback in iolatency in favor of cleaning up always on the
rq_qos_done_bio() path.

Fixes: a3fb01ba5af0 ("blk-iolatency: only account submitted bios")
Debugged-by: Tejun Heo <tj@kernel.org>
Debugged-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 block/blk-iolatency.c | 29 +++--------------------------
 1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index e8859350ab6e..c956eebf2d97 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -600,10 +600,6 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 	if (!blkg || !bio_flagged(bio, BIO_TRACKED))
 		return;
 
-	/* We didn't actually submit this bio, don't account it. */
-	if (bio->bi_status == BLK_STS_AGAIN)
-		return;
-
 	iolat = blkg_to_lat(bio->bi_blkg);
 	if (!iolat)
 		return;
@@ -622,6 +618,9 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 
 		inflight = atomic_dec_return(&rqw->inflight);
 		WARN_ON_ONCE(inflight < 0);
+		/* We didn't actually submit this bio, don't account for it. */
+		if (bio->bi_status == BLK_STS_AGAIN)
+			goto next;
 		if (iolat->min_lat_nsec == 0)
 			goto next;
 		iolatency_record_time(iolat, &bio->bi_issue, now,
@@ -639,27 +638,6 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 	}
 }
 
-static void blkcg_iolatency_cleanup(struct rq_qos *rqos, struct bio *bio)
-{
-	struct blkcg_gq *blkg;
-
-	blkg = bio->bi_blkg;
-	while (blkg && blkg->parent) {
-		struct rq_wait *rqw;
-		struct iolatency_grp *iolat;
-
-		iolat = blkg_to_lat(blkg);
-		if (!iolat)
-			goto next;
-
-		rqw = &iolat->rq_wait;
-		atomic_dec(&rqw->inflight);
-		wake_up(&rqw->wait);
-next:
-		blkg = blkg->parent;
-	}
-}
-
 static void blkcg_iolatency_exit(struct rq_qos *rqos)
 {
 	struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
@@ -671,7 +649,6 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
 
 static struct rq_qos_ops blkcg_iolatency_ops = {
 	.throttle = blkcg_iolatency_throttle,
-	.cleanup = blkcg_iolatency_cleanup,
 	.done_bio = blkcg_iolatency_done_bio,
 	.exit = blkcg_iolatency_exit,
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] blk-iolatency: fix STS_AGAIN handling
  2019-07-05 20:32 [PATCH] blk-iolatency: fix STS_AGAIN handling Dennis Zhou
@ 2019-07-05 20:46 ` Jens Axboe
  0 siblings, 0 replies; 2+ messages in thread
From: Jens Axboe @ 2019-07-05 20:46 UTC (permalink / raw)
  To: Dennis Zhou, Josef Bacik; +Cc: kernel-team, linux-block, linux-kernel

On 7/5/19 2:32 PM, Dennis Zhou wrote:
> The iolatency controller is based on rq_qos. It increments on
> rq_qos_throttle() and decrements on either rq_qos_cleanup() or
> rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where
> blk_mq_make_request() may call both rq_qos_cleanup() and
> rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
> double decrement.
> 
> The above works upstream as the only way we can get STS_AGAIN is from
> blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
> problem as bio_endio() skipping only happens on reserved tag allocation
> failures which can only be caused by driver bugs and already triggers
> WARN.
> 
> However, the fix creates a not so great dependency on how STS_AGAIN can
> be propagated. Internally, we (Facebook) carry a patch that kills read
> ahead if a cgroup is io congested or a fatal signal is pending. This
> combined with chained bios progagate their bi_status to the parent is
> not already set can can cause the parent bio to not clean up properly
> even though it was successful. This consequently leaks the inflight
> counter and can hang all IOs under that blkg.
> 
> To nip the adverse interaction early, this removes the rq_qos_cleanup()
> callback in iolatency in favor of cleaning up always on the
> rq_qos_done_bio() path.
> 
> Fixes: a3fb01ba5af0 ("blk-iolatency: only account submitted bios")
> Debugged-by: Tejun Heo <tj@kernel.org>
> Debugged-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
>   block/blk-iolatency.c | 29 +++--------------------------
>   1 file changed, 3 insertions(+), 26 deletions(-)
> 
> diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
> index e8859350ab6e..c956eebf2d97 100644
> --- a/block/blk-iolatency.c
> +++ b/block/blk-iolatency.c
> @@ -600,10 +600,6 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
>   	if (!blkg || !bio_flagged(bio, BIO_TRACKED))
>   		return;
>   
> -	/* We didn't actually submit this bio, don't account it. */
> -	if (bio->bi_status == BLK_STS_AGAIN)
> -		return;
> -
>   	iolat = blkg_to_lat(bio->bi_blkg);
>   	if (!iolat)
>   		return;
> @@ -622,6 +618,9 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
>   
>   		inflight = atomic_dec_return(&rqw->inflight);
>   		WARN_ON_ONCE(inflight < 0);
> +		/* We didn't actually submit this bio, don't account for it. */
> +		if (bio->bi_status == BLK_STS_AGAIN)
> +			goto next;
>   		if (iolat->min_lat_nsec == 0)
>   			goto next;
>   		iolatency_record_time(iolat, &bio->bi_issue, now,

Patch in general looks fine to me, but let's get rid of this next label,
it's pretty silly. Only one use of it, why not just make it a nested if?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-07-05 20:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-05 20:32 [PATCH] blk-iolatency: fix STS_AGAIN handling Dennis Zhou
2019-07-05 20:46 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).