linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Valente <paolo.valente@linaro.org>
To: brookxu <brookxu.cn@gmail.com>
Cc: axboe@kernel.dk, tj@kernel.org, linux-block@vger.kernel.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 00/11] bfq: introduce bfq.ioprio for cgroup
Date: Sun, 21 Mar 2021 12:04:53 +0100	[thread overview]
Message-ID: <F64D5CC8-E650-4AE6-8452-7FA0C1976271@linaro.org> (raw)
In-Reply-To: <cover.1615517202.git.brookxu@tencent.com>



> Il giorno 12 mar 2021, alle ore 12:08, brookxu <brookxu.cn@gmail.com> ha scritto:
> 
> From: Chunguang Xu <brookxu@tencent.com>
> 

Hi Chunguang,

> Tasks in the production environment can be roughly divided into
> three categories: emergency tasks, ordinary tasks and offline
> tasks. Emergency tasks need to be scheduled in real time, such
> as system agents. Offline tasks do not need to guarantee QoS,
> but can improve system resource utilization during system idle
> periods, such as background tasks. The above requirements need
> to achieve IO preemption. At present, we can use weights to
> simulate IO preemption, but since weights are more of a shared
> concept, they cannot be simulated well. For example, the weights
> of emergency tasks and ordinary tasks cannot be determined well,
> offline tasks (with the same weight) actually occupy different
> resources on disks with different performance, and the tail
> latency caused by offline tasks cannot be well controlled. Using
> ioprio's concept of preemption, we can solve the above problems
> very well. Since ioprio will eventually be converted to weight,
> using ioprio alone can also achieve weight isolation within the
> same class. But we can still use bfq.weight to control resource,
> achieving better IO Qos control.
> 
> However, currently the class of bfq_group is always be class, and
> the ioprio class of the task can only be reflected in a single
> cgroup. We cannot guarantee that real-time tasks in a cgroup are
> scheduled in time. Therefore, we introduce bfq.ioprio, which
> allows us to configure ioprio class for cgroup. In this way, we
> can ensure that the real-time tasks of a cgroup can be scheduled
> in time. Similarly, the processing of offline task groups can
> also be simpler.
> 

I find this contribution very interesting.  Anyway, given the
relevance of such a contribution, I'd like to hear from relevant
people (Jens, Tejun, ...?), before revising individual patches.

Yet I already have a general question.  How does this mechanism comply
with per-process ioprios and ioprio classes?  For example, what
happens if a process belongs to BE-class group according to your
mechanism, but to a RT class according to its ioprio?  Does the
pre-group class dominate the per-process class?  Is all clean and
predictable?

> The bfq.ioprio interface now is available for cgroup v1 and cgroup
> v2. Users can configure the ioprio for cgroup through this interface,
> as shown below:
> 
> echo "1 2"> blkio.bfq.ioprio

Wouldn't it be nicer to have acronyms for classes (RT, BE, IDLE),
instead of numbers?

Thank you very much for this improvement proposal,
Paolo

> 
> The above two values respectively represent the values of ioprio
> class and ioprio for cgroup. The ioprio of tasks within the cgroup
> is uniformly equal to the ioprio of the cgroup. If the ioprio of
> the cgroup is disabled, the ioprio of the task remains the same,
> usually from io_context.
> 
> When testing, using fio and fio_generate_plots we can clearly see
> that the IO delay of the task satisfies RT> BE> IDLE. When RT is
> running, BE and IDLE are guaranteed minimum bandwidth. When used
> with bfq.weight, we can also isolate the resource within the same
> class.
> 
> The test process is as follows:
> # prepare data disk
> mount /dev/sdb /data1
> 
> # create cgroup v1 hierarchy
> cd /sys/fs/cgroup/blkio
> mkdir rt be idle
> echo "1 0" > rt/blkio.bfq.ioprio
> echo "2 0" > be/blkio.bfq.ioprio
> echo "3 0" > idle/blkio.bfq.ioprio
> 
> # run fio test
> fio fio.ini
> 
> # generate svg graph
> fio_generate_plots res
> 
> The contents of fio.ini are as follows:
> [global]
> ioengine=libaio
> group_reporting=1
> log_avg_msec=500
> direct=1
> time_based=1
> iodepth=16
> size=100M
> rw=write
> bs=1M
> [rt]
> name=rt
> write_bw_log=rt
> write_lat_log=rt
> write_iops_log=rt
> filename=/data1/rt.bin
> cgroup=rt
> runtime=30s
> nice=-10
> [be]
> name=be
> new_group
> write_bw_log=be
> write_lat_log=be
> write_iops_log=be
> filename=/data1/be.bin
> cgroup=be
> runtime=60s
> [idle]
> name=idle
> new_group
> write_bw_log=idle
> write_lat_log=idle
> write_iops_log=idle
> filename=/data1/idle.bin
> cgroup=idle
> runtime=90s
> 
> V2:
> 1. Optmise bfq_select_next_class().
> 2. Introduce bfq_group [] to track the number of groups for each CLASS.
> 3. Optimse IO injection, EMQ and Idle mechanism for CLASS_RT.
> 
> Chunguang Xu (11):
>  bfq: introduce bfq_entity_to_bfqg helper method
>  bfq: limit the IO depth of idle_class to 1
>  bfq: keep the minimun bandwidth for be_class
>  bfq: expire other class if CLASS_RT is waiting
>  bfq: optimse IO injection for CLASS_RT
>  bfq: disallow idle if CLASS_RT waiting for service
>  bfq: disallow merge CLASS_RT with other class
>  bfq: introduce bfq.ioprio for cgroup
>  bfq: convert the type of bfq_group.bfqd to bfq_data*
>  bfq: remove unnecessary initialization logic
>  bfq: optimize the calculation of bfq_weight_to_ioprio()
> 
> block/bfq-cgroup.c  |  99 +++++++++++++++++++++++++++++++----
> block/bfq-iosched.c |  47 ++++++++++++++---
> block/bfq-iosched.h |  28 ++++++++--
> block/bfq-wf2q.c    | 124 +++++++++++++++++++++++++++++++++-----------
> 4 files changed, 244 insertions(+), 54 deletions(-)
> 
> -- 
> 2.30.0
> 


  parent reply	other threads:[~2021-03-21 11:05 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-12 11:08 [RFC PATCH v2 00/11] bfq: introduce bfq.ioprio for cgroup brookxu
     [not found] ` <cover.1615527324.git.brookxu@tencent.com>
2021-03-12 11:08   ` [RFC PATCH v2 01/11] bfq: introduce bfq_entity_to_bfqg helper method brookxu
2021-03-12 11:08   ` [RFC PATCH v2 02/11] bfq: convert the type of bfq_group.bfqd to bfq_data* brookxu
2021-03-12 11:08   ` [RFC PATCH v2 03/11] bfq: introduce bfq.ioprio for cgroup brookxu
2021-03-12 11:08   ` [RFC PATCH v2 04/11] bfq: limit the IO depth of idle_class to 1 brookxu
2021-03-12 11:08   ` [RFC PATCH v2 05/11] bfq: keep the minimun bandwidth for be_class brookxu
2021-03-12 11:08   ` [RFC PATCH v2 06/11] bfq: expire other class if CLASS_RT is waiting brookxu
2021-03-12 11:08   ` [RFC PATCH v2 07/11] bfq: optimse IO injection for CLASS_RT brookxu
2021-03-12 11:08   ` [RFC PATCH v2 08/11] bfq: disallow idle if CLASS_RT waiting for service brookxu
2021-03-12 11:08   ` [RFC PATCH v2 09/11] bfq: disallow merge CLASS_RT with other class brookxu
2021-03-12 11:08   ` [RFC PATCH v2 10/11] bfq: remove unnecessary initialization logic brookxu
2021-03-12 11:08   ` [RFC PATCH v2 11/11] bfq: optimize the calculation of bfq_weight_to_ioprio() brookxu
2021-03-21 11:04 ` Paolo Valente [this message]
2021-03-22  5:44   ` [RFC PATCH v2 00/11] bfq: introduce bfq.ioprio for cgroup brookxu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F64D5CC8-E650-4AE6-8452-7FA0C1976271@linaro.org \
    --to=paolo.valente@linaro.org \
    --cc=axboe@kernel.dk \
    --cc=brookxu.cn@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).