From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Bart Van Assche <bvanassche@acm.org>, Jens Axboe <axboe@kernel.dk>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>, Jaegeuk Kim <jaegeuk@kernel.org>,
Hannes Reinecke <hare@suse.de>, Ming Lei <ming.lei@redhat.com>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
Himanshu Madhani <himanshu.madhani@oracle.com>
Subject: Re: [PATCH 04/14] block: Introduce the ioprio rq-qos policy
Date: Wed, 9 Jun 2021 04:40:51 +0000 [thread overview]
Message-ID: <DM6PR04MB708135D0C71D2042E6674E23E7369@DM6PR04MB7081.namprd04.prod.outlook.com> (raw)
In-Reply-To: 20210608230703.19510-5-bvanassche@acm.org
On 2021/06/09 8:07, Bart Van Assche wrote:
> Introduce an rq-qos policy that assigns an I/O priority to requests based
> on blk-cgroup configuration settings. This policy has the following
> advantages over the ioprio_set() system call:
> - This policy is cgroup based so it has all the advantages of cgroups.
> - While ioprio_set() does not affect page cache writeback I/O, this rq-qos
> controller affects page cache writeback I/O for filesystems that support
> assiociating a cgroup with writeback I/O. See also
> Documentation/admin-guide/cgroup-v2.rst.
>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 26 +++
> block/Kconfig | 9 +
> block/Makefile | 1 +
> block/blk-cgroup.c | 5 +
> block/blk-ioprio.c | 236 ++++++++++++++++++++++++
> block/blk-ioprio.h | 19 ++
> block/blk-rq-qos.c | 2 +
> block/blk-rq-qos.h | 1 +
> 8 files changed, 299 insertions(+)
> create mode 100644 block/blk-ioprio.c
> create mode 100644 block/blk-ioprio.h
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index b1e81aa8598a..cf8c8073b474 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -56,6 +56,7 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
> 5-3-3. IO Latency
> 5-3-3-1. How IO Latency Throttling Works
> 5-3-3-2. IO Latency Interface Files
> + 5-3-4. IO Priority
> 5-4. PID
> 5-4-1. PID Interface Files
> 5-5. Cpuset
> @@ -1866,6 +1867,31 @@ IO Latency Interface Files
> duration of time between evaluation events. Windows only elapse
> with IO activity. Idle periods extend the most recent window.
>
> +IO Priority
> +~~~~~~~~~~~
> +
> +A single attribute controls the behavior of the I/O priority cgroup policy,
> +namely the blkio.prio.class attribute. The following values are accepted for
> +that attribute:
> +
> + 0
> + Do not modify the I/O priority class.
> +
> + 1
> + For requests that do not have an I/O priority class (IOPRIO_CLASS_NONE),
> + change the I/O priority class into 1 (IOPRIO_CLASS_RT). Do not modify
> + the I/O priority class of other requests.
> +
> + 2
> + For requests that do not have an I/O priority class, change the I/O
> + priority class into IOPRIO_CLASS_BE. For requests that have the
> + priority class RT, change it into BE (2). Do not modify
> + the I/O priority class of requests that have priority 3 (IDLE).
> +
> + 3
> + Change the I/O priority class of all requests into 3 (IDLE),
> + the lowest I/O priority class.
> +
> PID
> ---
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 6685578b2a20..319816058298 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -162,6 +162,15 @@ config BLK_CGROUP_IOCOST
> distributes IO capacity between different groups based on
> their share of the overall weight distribution.
>
> +config BLK_CGROUP_IOPRIO
> + bool "Cgroup I/O controller for assigning I/O priority"
> + depends on BLK_CGROUP
> + help
> + Enable the .prio interface for assigning an I/O priority to requests.
> + The I/O priority affects the order in which an I/O scheduler and block
> + devices process requests. Only some I/O schedulers and some block
> + devices support I/O priorities.
> +
> config BLK_DEBUG_FS
> bool "Block layer debugging information in debugfs"
> default y
> diff --git a/block/Makefile b/block/Makefile
> index 8d841f5f986f..af3d044abaf1 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -17,6 +17,7 @@ obj-$(CONFIG_BLK_DEV_BSGLIB) += bsg-lib.o
> obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o
> obj-$(CONFIG_BLK_CGROUP_RWSTAT) += blk-cgroup-rwstat.o
> obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
> +obj-$(CONFIG_BLK_CGROUP_IOPRIO) += blk-ioprio.o
> obj-$(CONFIG_BLK_CGROUP_IOLATENCY) += blk-iolatency.o
> obj-$(CONFIG_BLK_CGROUP_IOCOST) += blk-iocost.o
> obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 3b0f6efaa2b6..7b06a5fa3cac 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -31,6 +31,7 @@
> #include <linux/tracehook.h>
> #include <linux/psi.h>
> #include "blk.h"
> +#include "blk-ioprio.h"
>
> /*
> * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
> @@ -1187,6 +1188,10 @@ int blkcg_init_queue(struct request_queue *q)
> if (ret)
> goto err_destroy_all;
>
> + ret = blk_ioprio_init(q);
> + if (ret)
> + goto err_destroy_all;
> +
> ret = blk_throtl_init(q);
> if (ret)
> goto err_destroy_all;
> diff --git a/block/blk-ioprio.c b/block/blk-ioprio.c
> new file mode 100644
> index 000000000000..d7292cdef67d
> --- /dev/null
> +++ b/block/blk-ioprio.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Block rq-qos policy for assigning an I/O priority to requests.
> + *
> + * Using an rq-qos policy for assigning I/O priority has two advantages over
> + * using the ioprio_set() system call:
> + *
> + * - This policy is cgroup based so it has all the advantages of cgroups.
> + * - While ioprio_set() does not affect page cache writeback I/O, this rq-qos
> + * controller affects page cache writeback I/O for filesystems that support
> + * assiociating a cgroup with writeback I/O. See also
> + * Documentation/admin-guide/cgroup-v2.rst.
> + */
> +
> +#include <linux/blk-cgroup.h>
> +#include <linux/blk-mq.h>
> +#include <linux/blk_types.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include "blk-ioprio.h"
> +#include "blk-rq-qos.h"
> +
> +/*
> + * The accepted I/O priority values are 0..IOPRIO_CLASS_MAX(3). 0 (default)
> + * means do not modify the I/O priority. Values 1..3 are used to restrict the
> + * I/O priority for a cgroup to the specified priority. See also
> + * <linux/ioprio.h>.
> + */
> +#define IOPRIO_CLASS_MAX IOPRIO_CLASS_IDLE
> +
> +static struct blkcg_policy ioprio_policy;
> +
> +/**
> + * struct ioprio_blkg - Per (cgroup, request queue) data.
> + * @pd: blkg_policy_data structure.
> + */
> +struct ioprio_blkg {
> + struct blkg_policy_data pd;
> +};
> +
> +/**
> + * struct ioprio_blkcg - Per cgroup data.
> + * @cpd: blkcg_policy_data structure.
> + * @prio_class: One of the IOPRIO_CLASS_* values. See also <linux/ioprio.h>.
> + */
> +struct ioprio_blkcg {
> + struct blkcg_policy_data cpd;
> + u8 prio_class;
> +};
> +
> +static inline struct ioprio_blkg *pd_to_ioprio(struct blkg_policy_data *pd)
> +{
> + return pd ? container_of(pd, struct ioprio_blkg, pd) : NULL;
> +}
> +
> +static struct ioprio_blkcg *
> +ioprio_blkcg_from_css(struct cgroup_subsys_state *css)
> +{
> + struct blkcg *blkcg = css_to_blkcg(css);
> + struct blkcg_policy_data *cpd = blkcg_to_cpd(blkcg, &ioprio_policy);
> +
> + return container_of(cpd, struct ioprio_blkcg, cpd);
> +}
> +
> +static struct ioprio_blkcg *ioprio_blkcg_from_bio(struct bio *bio)
> +{
> + struct blkg_policy_data *pd;
> +
> + pd = blkg_to_pd(bio->bi_blkg, &ioprio_policy);
> + if (!pd)
> + return NULL;
> +
> + return container_of(blkcg_to_cpd(pd->blkg->blkcg, &ioprio_policy),
> + struct ioprio_blkcg, cpd);
> +}
> +
> +static u64 ioprio_show_prio_class(struct cgroup_subsys_state *css,
> + struct cftype *cft)
> +{
> + struct ioprio_blkcg *blkcg = ioprio_blkcg_from_css(css);
> +
> + return blkcg->prio_class;
> +}
> +
> +static int ioprio_set_prio_class(struct cgroup_subsys_state *css,
> + struct cftype *cft, u64 val)
> +{
> + struct ioprio_blkcg *blkcg = ioprio_blkcg_from_css(css);
> +
> + if (val > IOPRIO_CLASS_MAX)
> + return -EINVAL;
Where is IOPRIO_CLASS_MAX defined ? I do not see it.
Why not use ioprio_valid() ?
> +
> + blkcg->prio_class = val;
> + return 0;
> +}
> +
> +static struct blkg_policy_data *ioprio_alloc_pd(gfp_t gfp,
> + struct request_queue *q,
> + struct blkcg *blkcg)
> +{
> + struct ioprio_blkg *ioprio_blkg;
> +
> + ioprio_blkg = kzalloc(sizeof(*ioprio_blkg), gfp);
> + if (!ioprio_blkg)
> + return NULL;
> +
> + return &ioprio_blkg->pd;
> +}
> +
> +static void ioprio_free_pd(struct blkg_policy_data *pd)
> +{
> + struct ioprio_blkg *ioprio_blkg = pd_to_ioprio(pd);
> +
> + kfree(ioprio_blkg);
> +}
> +
> +static struct blkcg_policy_data *ioprio_alloc_cpd(gfp_t gfp)
> +{
> + struct ioprio_blkcg *blkcg;
> +
> + blkcg = kzalloc(sizeof(*blkcg), gfp);
> + if (!blkcg)
> + return NULL;
> + blkcg->prio_class = IOPRIO_CLASS_NONE;
> + return &blkcg->cpd;
> +}
> +
> +static void ioprio_free_cpd(struct blkcg_policy_data *cpd)
> +{
> + struct ioprio_blkcg *blkcg = container_of(cpd, typeof(*blkcg), cpd);
> +
> + kfree(blkcg);
> +}
> +
> +#define IOPRIO_ATTRS \
> + { \
> + .name = "prio.class", \
> + .read_u64 = ioprio_show_prio_class, \
> + .write_u64 = ioprio_set_prio_class, \
> + }, \
> + { } /* sentinel */
> +
> +/* cgroup v2 attributes */
> +static struct cftype ioprio_files[] = {
> + IOPRIO_ATTRS
> +};
> +
> +/* cgroup v1 attributes */
> +static struct cftype ioprio_legacy_files[] = {
> + IOPRIO_ATTRS
> +};
> +
> +static struct blkcg_policy ioprio_policy = {
> + .dfl_cftypes = ioprio_files,
> + .legacy_cftypes = ioprio_legacy_files,
> +
> + .cpd_alloc_fn = ioprio_alloc_cpd,
> + .cpd_free_fn = ioprio_free_cpd,
> +
> + .pd_alloc_fn = ioprio_alloc_pd,
> + .pd_free_fn = ioprio_free_pd,
> +};
> +
> +struct blk_ioprio {
> + struct rq_qos rqos;
> +};
> +
> +static void blkcg_ioprio_track(struct rq_qos *rqos, struct request *rq,
> + struct bio *bio)
> +{
> + struct ioprio_blkcg *blkcg = ioprio_blkcg_from_bio(bio);
> +
> + /*
> + * Except for IOPRIO_CLASS_NONE, higher I/O priority numbers
> + * correspond to a lower priority. Hence, the max_t() below selects
> + * the lower priority of bi_ioprio and the cgroup I/O priority class.
> + * If the cgroup priority has been set to IOPRIO_CLASS_NONE == 0, the
> + * bio I/O priority is not modified. If the bio I/O priority equals
> + * IOPRIO_CLASS_NONE, the cgroup I/O priority is assigned to the bio.
> + */
> + bio->bi_ioprio = max_t(u16, bio->bi_ioprio,
> + IOPRIO_PRIO_VALUE(blkcg->prio_class, 0));
> +}
> +
> +static void blkcg_ioprio_exit(struct rq_qos *rqos)
> +{
> + struct blk_ioprio *blkioprio_blkg =
> + container_of(rqos, typeof(*blkioprio_blkg), rqos);
> +
> + blkcg_deactivate_policy(rqos->q, &ioprio_policy);
> + kfree(blkioprio_blkg);
> +}
> +
> +static struct rq_qos_ops blkcg_ioprio_ops = {
> + .track = blkcg_ioprio_track,
> + .exit = blkcg_ioprio_exit,
> +};
> +
> +int blk_ioprio_init(struct request_queue *q)
> +{
> + struct blk_ioprio *blkioprio_blkg;
> + struct rq_qos *rqos;
> + int ret;
> +
> + blkioprio_blkg = kzalloc(sizeof(*blkioprio_blkg), GFP_KERNEL);
> + if (!blkioprio_blkg)
> + return -ENOMEM;
> +
> + ret = blkcg_activate_policy(q, &ioprio_policy);
> + if (ret) {
> + kfree(blkioprio_blkg);
> + return ret;
> + }
> +
> + rqos = &blkioprio_blkg->rqos;
> + rqos->id = RQ_QOS_IOPRIO;
> + rqos->ops = &blkcg_ioprio_ops;
> + rqos->q = q;
> +
> + rq_qos_add(q, rqos);
> +
> + return 0;
> +}
> +
> +static int __init ioprio_init(void)
> +{
> + return blkcg_policy_register(&ioprio_policy);
> +}
> +
> +static void __exit ioprio_exit(void)
> +{
> + blkcg_policy_unregister(&ioprio_policy);
> +}
> +
> +module_init(ioprio_init);
> +module_exit(ioprio_exit);
> diff --git a/block/blk-ioprio.h b/block/blk-ioprio.h
> new file mode 100644
> index 000000000000..a7785c2f1aea
> --- /dev/null
> +++ b/block/blk-ioprio.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _BLK_IOPRIO_H_
> +#define _BLK_IOPRIO_H_
> +
> +#include <linux/kconfig.h>
> +
> +struct request_queue;
> +
> +#ifdef CONFIG_BLK_CGROUP_IOPRIO
> +int blk_ioprio_init(struct request_queue *q);
> +#else
> +static inline int blk_ioprio_init(struct request_queue *q)
> +{
> + return 0;
> +}
> +#endif
> +
> +#endif /* _BLK_IOPRIO_H_ */
> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> index 61dccf584c68..3a5400b11af7 100644
> --- a/block/blk-rq-qos.c
> +++ b/block/blk-rq-qos.c
> @@ -11,6 +11,8 @@ const char *rq_qos_id_to_name(enum rq_qos_id id)
> return "latency";
> case RQ_QOS_COST:
> return "cost";
> + case RQ_QOS_IOPRIO:
> + return "ioprio";
> }
> return "unknown";
> }
> diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
> index 6b5f9ae69883..1d4c6e951aa6 100644
> --- a/block/blk-rq-qos.h
> +++ b/block/blk-rq-qos.h
> @@ -16,6 +16,7 @@ enum rq_qos_id {
> RQ_QOS_WBT,
> RQ_QOS_LATENCY,
> RQ_QOS_COST,
> + RQ_QOS_IOPRIO,
> };
>
> struct rq_wait {
>
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2021-06-09 4:40 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-08 23:06 [PATCH 00/14] Improve I/O priority support Bart Van Assche
2021-06-08 23:06 ` [PATCH 01/14] block/Kconfig: Make the BLK_WBT and BLK_WBT_MQ entries consecutive Bart Van Assche
2021-06-09 4:14 ` Damien Le Moal
2021-06-09 7:03 ` Johannes Thumshirn
2021-06-10 6:02 ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 02/14] block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls Bart Van Assche
2021-06-09 4:19 ` Damien Le Moal
2021-06-09 7:05 ` Johannes Thumshirn
2021-06-10 6:02 ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 03/14] block/blk-rq-qos: Move a function from a header file into a C file Bart Van Assche
2021-06-09 4:22 ` Damien Le Moal
2021-06-10 6:03 ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 04/14] block: Introduce the ioprio rq-qos policy Bart Van Assche
2021-06-09 4:40 ` Damien Le Moal [this message]
2021-06-09 16:48 ` Bart Van Assche
2021-06-10 0:29 ` Damien Le Moal
2021-06-10 6:20 ` Hannes Reinecke
2021-06-10 17:14 ` Bart Van Assche
2021-06-08 23:06 ` [PATCH 05/14] block/mq-deadline: Add several comments Bart Van Assche
2021-06-10 6:21 ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 06/14] block/mq-deadline: Add two lockdep_assert_held() statements Bart Van Assche
2021-06-10 6:21 ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 07/14] block/mq-deadline: Remove two local variables Bart Van Assche
2021-06-10 6:22 ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 08/14] block/mq-deadline: Rename dd_init_queue() and dd_exit_queue() Bart Van Assche
2021-06-08 23:06 ` [PATCH 09/14] block/mq-deadline: Improve compile-time argument checking Bart Van Assche
2021-06-08 23:06 ` [PATCH 10/14] block/mq-deadline: Improve the sysfs show and store macros Bart Van Assche
2021-06-09 4:46 ` Damien Le Moal
2021-06-09 16:54 ` Bart Van Assche
2021-06-10 0:31 ` Damien Le Moal
2021-06-09 7:11 ` Johannes Thumshirn
2021-06-09 16:59 ` Bart Van Assche
2021-06-10 6:23 ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 11/14] block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests Bart Van Assche
2021-06-10 6:26 ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 12/14] block/mq-deadline: Add I/O priority support Bart Van Assche
2021-06-09 5:03 ` Damien Le Moal
2021-06-09 17:25 ` Bart Van Assche
2021-06-10 0:39 ` Damien Le Moal
2021-06-10 6:35 ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 13/14] block/mq-deadline: Add cgroup support Bart Van Assche
2021-06-10 6:37 ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 14/14] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
2021-06-09 5:10 ` Damien Le Moal
2021-06-10 6:43 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DM6PR04MB708135D0C71D2042E6674E23E7369@DM6PR04MB7081.namprd04.prod.outlook.com \
--to=damien.lemoal@wdc.com \
--cc=Johannes.Thumshirn@wdc.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=himanshu.madhani@oracle.com \
--cc=jaegeuk@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).