From: Ming Lei <ming.lei@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, Yu Kuai <yukuai3@huawei.com>,
	Saravanan D <saravanand@fb.com>,
	Christopher Obbard <chris.obbard@collabora.com>
Subject: Re: [PATCH block-5.17] fix rq-qos breakage from skipping rq_qos_done_bio()
Date: Mon, 14 Mar 2022 16:11:44 +0800
Message-ID: <Yi74wAHBvU+8QGrP@T590>
In-Reply-To: <Yi7rdrzQEHjJLGKB@slm.duckdns.org>

On Sun, Mar 13, 2022 at 09:15:02PM -1000, Tejun Heo wrote:
> a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't
> tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set.
> While this fixed a potential oops, it also broke blk-iocost by skipping the
> done_bio callback for merged bios.
> 
> Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(),
> rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED
> distinguishing the former from the latter. With that commit,
> rq_qos_done_bio() is no longer called for bios which went through
> rq_qos_merge(). This royally confuses blk-iocost as the merged bios never
> finish and are considered perpetually in-flight.
> 
> One reliably reproducible failure mode is an intermediate cgroup getting
> stuck active, preventing its children from being activated due to the
> leaf-only rule and leading to loss of control. The following is from the
> resctl-bench protection scenario, which emulates isolating a web-server-like
> workload from a memory bomb, run on an iocost configuration which should
> yield a reasonable level of protection.
> 
>   # cat /sys/block/nvme2n1/device/model
>   Samsung SSD 970 PRO 512GB               
>   # cat /sys/fs/cgroup/io.cost.model
>   259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025
>   # cat /sys/fs/cgroup/io.cost.qos
>   259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00
>   # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
>   ...
>   Memory Hog Summary
>   ==================
> 
>   IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m
>               W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m
> 
>   Isolation and Request Latency Impact Distributions:
> 
>                 min   p01   p05   p10   p25   p50   p75   p90   p95   p99   max  mean stdev
>   isol%       15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82 
>   lat-imp%        0     0     0     0     0  4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6 
> 
>   Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96%
> 
> The isolation result of 58.12% is close to what this device would show
> without any IO control.
> 
> Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and
> calling rq_qos_done_bio() on them too. For consistency and clarity, rename
> BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into
> rq_qos_done_bio() so that it's next to the code paths that set the flags.
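
For readers following along, the accounting problem and the fix described
above can be modelled in a few lines of userspace C. This is only a toy
sketch, not the kernel code: the two flag names come from the patch
description, while struct toy_bio, the toy_* helpers and the single
in-flight counter are hypothetical stand-ins for whatever a policy such as
blk-iocost tracks per bio.

```c
#include <stdbool.h>

/*
 * Toy model only. The flag names follow the patch description; everything
 * else here is a hypothetical illustration, not the kernel implementation.
 */
enum {
	BIO_QOS_THROTTLED = 1u << 0,	/* bio went through the throttle path */
	BIO_QOS_MERGED    = 1u << 1,	/* bio was merged into an existing rq */
};

struct toy_bio {
	unsigned int flags;
};

/* Stands in for a policy's notion of "bios still in flight". */
static int toy_inflight;

static void toy_qos_throttle(struct toy_bio *bio)
{
	bio->flags |= BIO_QOS_THROTTLED;
	toy_inflight++;
}

static void toy_qos_merge(struct toy_bio *bio)
{
	bio->flags |= BIO_QOS_MERGED;
	toy_inflight++;
}

/*
 * Completion hook. With check_merged == false only throttled bios are
 * completed, which models the breakage; with check_merged == true merged
 * bios are completed as well, which models the fix.
 */
static void toy_qos_done_bio(struct toy_bio *bio, bool check_merged)
{
	unsigned int mask = BIO_QOS_THROTTLED;

	if (check_merged)
		mask |= BIO_QOS_MERGED;

	if (bio->flags & mask)
		toy_inflight--;
}

/* Issue one throttled and one merged bio, complete both, report leftovers. */
int toy_simulate(bool check_merged)
{
	struct toy_bio throttled = { 0 };
	struct toy_bio merged = { 0 };

	toy_inflight = 0;
	toy_qos_throttle(&throttled);
	toy_qos_merge(&merged);
	toy_qos_done_bio(&throttled, check_merged);
	toy_qos_done_bio(&merged, check_merged);

	return toy_inflight;
}
```

In this model, toy_simulate(false) leaves one bio counted as in flight
forever, while toy_simulate(true) returns to zero. Keeping the flag test
inside the completion hook mirrors the patch's stated intent of having the
checks live next to the code paths that set the flags.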
> 
> With the patch applied, the same benchmark as above shows:
> 
>   # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
>   ...
>   Memory Hog Summary
>   ==================
> 
>   IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m
>               W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m
> 
>   Isolation and Request Latency Impact Distributions:
> 
>                 min   p01   p05   p10   p25   p50   p75   p90   p95   p99   max  mean stdev
>   isol%       84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42  2.81 
>   lat-imp%        0     0     0     0     0  2.81  5.73 11.11 13.92 17.53 22.61  4.10  4.68 
> 
>   Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0%
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked")
> Cc: stable@vger.kernel.org # v5.15+
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Yu Kuai <yukuai3@huawei.com>

Looks fine, since the rq always holds one .q_usage_counter reference in the merge case:

Reviewed-by: Ming Lei <ming.lei@redhat.com>



Thanks,
Ming


Thread overview: 4+ messages
2022-03-14  7:15 [PATCH block-5.17] fix rq-qos breakage from skipping rq_qos_done_bio() Tejun Heo
2022-03-14  7:58 ` Tejun Heo
2022-03-14  8:11 ` Ming Lei [this message]
2022-03-14 20:23 ` Jens Axboe
