All of lore.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Bart Van Assche <bvanassche@acm.org>, Jens Axboe <axboe@kernel.dk>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>, Jaegeuk Kim <jaegeuk@kernel.org>,
	Hannes Reinecke <hare@suse.de>, Ming Lei <ming.lei@redhat.com>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Himanshu Madhani <himanshu.madhani@oracle.com>
Subject: Re: [PATCH 04/14] block: Introduce the ioprio rq-qos policy
Date: Wed, 9 Jun 2021 04:40:51 +0000	[thread overview]
Message-ID: <DM6PR04MB708135D0C71D2042E6674E23E7369@DM6PR04MB7081.namprd04.prod.outlook.com> (raw)
In-Reply-To: 20210608230703.19510-5-bvanassche@acm.org

On 2021/06/09 8:07, Bart Van Assche wrote:
> Introduce an rq-qos policy that assigns an I/O priority to requests based
> on blk-cgroup configuration settings. This policy has the following
> advantages over the ioprio_set() system call:
> - This policy is cgroup based so it has all the advantages of cgroups.
> - While ioprio_set() does not affect page cache writeback I/O, this rq-qos
>   controller affects page cache writeback I/O for filesystems that support
>   assiociating a cgroup with writeback I/O. See also
>   Documentation/admin-guide/cgroup-v2.rst.
> 
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  26 +++
>  block/Kconfig                           |   9 +
>  block/Makefile                          |   1 +
>  block/blk-cgroup.c                      |   5 +
>  block/blk-ioprio.c                      | 236 ++++++++++++++++++++++++
>  block/blk-ioprio.h                      |  19 ++
>  block/blk-rq-qos.c                      |   2 +
>  block/blk-rq-qos.h                      |   1 +
>  8 files changed, 299 insertions(+)
>  create mode 100644 block/blk-ioprio.c
>  create mode 100644 block/blk-ioprio.h
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index b1e81aa8598a..cf8c8073b474 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -56,6 +56,7 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
>         5-3-3. IO Latency
>           5-3-3-1. How IO Latency Throttling Works
>           5-3-3-2. IO Latency Interface Files
> +       5-3-4. IO Priority
>       5-4. PID
>         5-4-1. PID Interface Files
>       5-5. Cpuset
> @@ -1866,6 +1867,31 @@ IO Latency Interface Files
>  		duration of time between evaluation events.  Windows only elapse
>  		with IO activity.  Idle periods extend the most recent window.
>  
> +IO Priority
> +~~~~~~~~~~~
> +
> +A single attribute controls the behavior of the I/O priority cgroup policy,
> +namely the blkio.prio.class attribute. The following values are accepted for
> +that attribute:
> +
> +  0
> +	Do not modify the I/O priority class.
> +
> +  1
> +	For requests that do not have an I/O priority class (IOPRIO_CLASS_NONE),
> +	change the I/O priority class into 1 (IOPRIO_CLASS_RT). Do not modify
> +	the I/O priority class of other requests.
> +
> +  2
> +	For requests that do not have an I/O priority class, change the I/O
> +	priority class into IOPRIO_CLASS_BE. For requests that have the
> +	priority class RT, change it into BE (2). Do not modify
> +	the I/O priority class of requests that have priority 3 (IDLE).
> +
> +  3
> +	Change the I/O priority class of all requests into 3 (IDLE),
> +	the lowest I/O priority class.
> +
>  PID
>  ---
>  
> diff --git a/block/Kconfig b/block/Kconfig
> index 6685578b2a20..319816058298 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -162,6 +162,15 @@ config BLK_CGROUP_IOCOST
>  	distributes IO capacity between different groups based on
>  	their share of the overall weight distribution.
>  
> +config BLK_CGROUP_IOPRIO
> +	bool "Cgroup I/O controller for assigning I/O priority"
> +	depends on BLK_CGROUP
> +	help
> +	Enable the .prio interface for assigning an I/O priority to requests.
> +	The I/O priority affects the order in which an I/O scheduler and block
> +	devices process requests. Only some I/O schedulers and some block
> +	devices support I/O priorities.
> +
>  config BLK_DEBUG_FS
>  	bool "Block layer debugging information in debugfs"
>  	default y
> diff --git a/block/Makefile b/block/Makefile
> index 8d841f5f986f..af3d044abaf1 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -17,6 +17,7 @@ obj-$(CONFIG_BLK_DEV_BSGLIB)	+= bsg-lib.o
>  obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o
>  obj-$(CONFIG_BLK_CGROUP_RWSTAT)	+= blk-cgroup-rwstat.o
>  obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
> +obj-$(CONFIG_BLK_CGROUP_IOPRIO)	+= blk-ioprio.o
>  obj-$(CONFIG_BLK_CGROUP_IOLATENCY)	+= blk-iolatency.o
>  obj-$(CONFIG_BLK_CGROUP_IOCOST)	+= blk-iocost.o
>  obj-$(CONFIG_MQ_IOSCHED_DEADLINE)	+= mq-deadline.o
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 3b0f6efaa2b6..7b06a5fa3cac 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -31,6 +31,7 @@
>  #include <linux/tracehook.h>
>  #include <linux/psi.h>
>  #include "blk.h"
> +#include "blk-ioprio.h"
>  
>  /*
>   * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
> @@ -1187,6 +1188,10 @@ int blkcg_init_queue(struct request_queue *q)
>  	if (ret)
>  		goto err_destroy_all;
>  
> +	ret = blk_ioprio_init(q);
> +	if (ret)
> +		goto err_destroy_all;
> +
>  	ret = blk_throtl_init(q);
>  	if (ret)
>  		goto err_destroy_all;
> diff --git a/block/blk-ioprio.c b/block/blk-ioprio.c
> new file mode 100644
> index 000000000000..d7292cdef67d
> --- /dev/null
> +++ b/block/blk-ioprio.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Block rq-qos policy for assigning an I/O priority to requests.
> + *
> + * Using an rq-qos policy for assigning I/O priority has two advantages over
> + * using the ioprio_set() system call:
> + *
> + * - This policy is cgroup based so it has all the advantages of cgroups.
> + * - While ioprio_set() does not affect page cache writeback I/O, this rq-qos
> + *   controller affects page cache writeback I/O for filesystems that support
> + *   assiociating a cgroup with writeback I/O. See also
> + *   Documentation/admin-guide/cgroup-v2.rst.
> + */
> +
> +#include <linux/blk-cgroup.h>
> +#include <linux/blk-mq.h>
> +#include <linux/blk_types.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include "blk-ioprio.h"
> +#include "blk-rq-qos.h"
> +
> +/*
> + * The accepted I/O priority values are 0..IOPRIO_CLASS_MAX(3). 0 (default)
> + * means do not modify the I/O priority. Values 1..3 are used to restrict the
> + * I/O priority for a cgroup to the specified priority. See also
> + * <linux/ioprio.h>.
> + */
> +#define IOPRIO_CLASS_MAX IOPRIO_CLASS_IDLE
> +
> +static struct blkcg_policy ioprio_policy;
> +
> +/**
> + * struct ioprio_blkg - Per (cgroup, request queue) data.
> + * @pd: blkg_policy_data structure.
> + */
> +struct ioprio_blkg {
> +	struct blkg_policy_data pd;
> +};
> +
> +/**
> + * struct ioprio_blkcg - Per cgroup data.
> + * @cpd: blkcg_policy_data structure.
> + * @prio_class: One of the IOPRIO_CLASS_* values. See also <linux/ioprio.h>.
> + */
> +struct ioprio_blkcg {
> +	struct blkcg_policy_data cpd;
> +	u8			 prio_class;
> +};
> +
> +static inline struct ioprio_blkg *pd_to_ioprio(struct blkg_policy_data *pd)
> +{
> +	return pd ? container_of(pd, struct ioprio_blkg, pd) : NULL;
> +}
> +
> +static struct ioprio_blkcg *
> +ioprio_blkcg_from_css(struct cgroup_subsys_state *css)
> +{
> +	struct blkcg *blkcg = css_to_blkcg(css);
> +	struct blkcg_policy_data *cpd = blkcg_to_cpd(blkcg, &ioprio_policy);
> +
> +	return container_of(cpd, struct ioprio_blkcg, cpd);
> +}
> +
> +static struct ioprio_blkcg *ioprio_blkcg_from_bio(struct bio *bio)
> +{
> +	struct blkg_policy_data *pd;
> +
> +	pd = blkg_to_pd(bio->bi_blkg, &ioprio_policy);
> +	if (!pd)
> +		return NULL;
> +
> +	return container_of(blkcg_to_cpd(pd->blkg->blkcg, &ioprio_policy),
> +			    struct ioprio_blkcg, cpd);
> +}
> +
> +static u64 ioprio_show_prio_class(struct cgroup_subsys_state *css,
> +				  struct cftype *cft)
> +{
> +	struct ioprio_blkcg *blkcg = ioprio_blkcg_from_css(css);
> +
> +	return blkcg->prio_class;
> +}
> +
> +static int ioprio_set_prio_class(struct cgroup_subsys_state *css,
> +				 struct cftype *cft, u64 val)
> +{
> +	struct ioprio_blkcg *blkcg = ioprio_blkcg_from_css(css);
> +
> +	if (val > IOPRIO_CLASS_MAX)
> +		return -EINVAL;

Where is IOPRIO_CLASS_MAX defined ? I do not see it.
Why not use ioprio_valid() ?

> +
> +	blkcg->prio_class = val;
> +	return 0;
> +}
> +
> +static struct blkg_policy_data *ioprio_alloc_pd(gfp_t gfp,
> +						struct request_queue *q,
> +						struct blkcg *blkcg)
> +{
> +	struct ioprio_blkg *ioprio_blkg;
> +
> +	ioprio_blkg = kzalloc(sizeof(*ioprio_blkg), gfp);
> +	if (!ioprio_blkg)
> +		return NULL;
> +
> +	return &ioprio_blkg->pd;
> +}
> +
> +static void ioprio_free_pd(struct blkg_policy_data *pd)
> +{
> +	struct ioprio_blkg *ioprio_blkg = pd_to_ioprio(pd);
> +
> +	kfree(ioprio_blkg);
> +}
> +
> +static struct blkcg_policy_data *ioprio_alloc_cpd(gfp_t gfp)
> +{
> +	struct ioprio_blkcg *blkcg;
> +
> +	blkcg = kzalloc(sizeof(*blkcg), gfp);
> +	if (!blkcg)
> +		return NULL;
> +	blkcg->prio_class = IOPRIO_CLASS_NONE;
> +	return &blkcg->cpd;
> +}
> +
> +static void ioprio_free_cpd(struct blkcg_policy_data *cpd)
> +{
> +	struct ioprio_blkcg *blkcg = container_of(cpd, typeof(*blkcg), cpd);
> +
> +	kfree(blkcg);
> +}
> +
> +#define IOPRIO_ATTRS						\
> +	{							\
> +		.name		= "prio.class",			\
> +		.read_u64	= ioprio_show_prio_class,	\
> +		.write_u64	= ioprio_set_prio_class,	\
> +	},							\
> +	{ } /* sentinel */
> +
> +/* cgroup v2 attributes */
> +static struct cftype ioprio_files[] = {
> +	IOPRIO_ATTRS
> +};
> +
> +/* cgroup v1 attributes */
> +static struct cftype ioprio_legacy_files[] = {
> +	IOPRIO_ATTRS
> +};
> +
> +static struct blkcg_policy ioprio_policy = {
> +	.dfl_cftypes	= ioprio_files,
> +	.legacy_cftypes = ioprio_legacy_files,
> +
> +	.cpd_alloc_fn	= ioprio_alloc_cpd,
> +	.cpd_free_fn	= ioprio_free_cpd,
> +
> +	.pd_alloc_fn	= ioprio_alloc_pd,
> +	.pd_free_fn	= ioprio_free_pd,
> +};
> +
> +struct blk_ioprio {
> +	struct rq_qos rqos;
> +};
> +
> +static void blkcg_ioprio_track(struct rq_qos *rqos, struct request *rq,
> +			       struct bio *bio)
> +{
> +	struct ioprio_blkcg *blkcg = ioprio_blkcg_from_bio(bio);
> +
> +	/*
> +	 * Except for IOPRIO_CLASS_NONE, higher I/O priority numbers
> +	 * correspond to a lower priority. Hence, the max_t() below selects
> +	 * the lower priority of bi_ioprio and the cgroup I/O priority class.
> +	 * If the cgroup priority has been set to IOPRIO_CLASS_NONE == 0, the
> +	 * bio I/O priority is not modified. If the bio I/O priority equals
> +	 * IOPRIO_CLASS_NONE, the cgroup I/O priority is assigned to the bio.
> +	 */
> +	bio->bi_ioprio = max_t(u16, bio->bi_ioprio,
> +			       IOPRIO_PRIO_VALUE(blkcg->prio_class, 0));
> +}
> +
> +static void blkcg_ioprio_exit(struct rq_qos *rqos)
> +{
> +	struct blk_ioprio *blkioprio_blkg =
> +		container_of(rqos, typeof(*blkioprio_blkg), rqos);
> +
> +	blkcg_deactivate_policy(rqos->q, &ioprio_policy);
> +	kfree(blkioprio_blkg);
> +}
> +
> +static struct rq_qos_ops blkcg_ioprio_ops = {
> +	.track = blkcg_ioprio_track,
> +	.exit = blkcg_ioprio_exit,
> +};
> +
> +int blk_ioprio_init(struct request_queue *q)
> +{
> +	struct blk_ioprio *blkioprio_blkg;
> +	struct rq_qos *rqos;
> +	int ret;
> +
> +	blkioprio_blkg = kzalloc(sizeof(*blkioprio_blkg), GFP_KERNEL);
> +	if (!blkioprio_blkg)
> +		return -ENOMEM;
> +
> +	ret = blkcg_activate_policy(q, &ioprio_policy);
> +	if (ret) {
> +		kfree(blkioprio_blkg);
> +		return ret;
> +	}
> +
> +	rqos = &blkioprio_blkg->rqos;
> +	rqos->id = RQ_QOS_IOPRIO;
> +	rqos->ops = &blkcg_ioprio_ops;
> +	rqos->q = q;
> +
> +	rq_qos_add(q, rqos);
> +
> +	return 0;
> +}
> +
> +static int __init ioprio_init(void)
> +{
> +	return blkcg_policy_register(&ioprio_policy);
> +}
> +
> +static void __exit ioprio_exit(void)
> +{
> +	blkcg_policy_unregister(&ioprio_policy);
> +}
> +
> +module_init(ioprio_init);
> +module_exit(ioprio_exit);
> diff --git a/block/blk-ioprio.h b/block/blk-ioprio.h
> new file mode 100644
> index 000000000000..a7785c2f1aea
> --- /dev/null
> +++ b/block/blk-ioprio.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _BLK_IOPRIO_H_
> +#define _BLK_IOPRIO_H_
> +
> +#include <linux/kconfig.h>
> +
> +struct request_queue;
> +
> +#ifdef CONFIG_BLK_CGROUP_IOPRIO
> +int blk_ioprio_init(struct request_queue *q);
> +#else
> +static inline int blk_ioprio_init(struct request_queue *q)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#endif /* _BLK_IOPRIO_H_ */
> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> index 61dccf584c68..3a5400b11af7 100644
> --- a/block/blk-rq-qos.c
> +++ b/block/blk-rq-qos.c
> @@ -11,6 +11,8 @@ const char *rq_qos_id_to_name(enum rq_qos_id id)
>  		return "latency";
>  	case RQ_QOS_COST:
>  		return "cost";
> +	case RQ_QOS_IOPRIO:
> +		return "ioprio";
>  	}
>  	return "unknown";
>  }
> diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
> index 6b5f9ae69883..1d4c6e951aa6 100644
> --- a/block/blk-rq-qos.h
> +++ b/block/blk-rq-qos.h
> @@ -16,6 +16,7 @@ enum rq_qos_id {
>  	RQ_QOS_WBT,
>  	RQ_QOS_LATENCY,
>  	RQ_QOS_COST,
> +	RQ_QOS_IOPRIO,
>  };
>  
>  struct rq_wait {
> 


-- 
Damien Le Moal
Western Digital Research

  reply	other threads:[~2021-06-09  4:40 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-08 23:06 [PATCH 00/14] Improve I/O priority support Bart Van Assche
2021-06-08 23:06 ` [PATCH 01/14] block/Kconfig: Make the BLK_WBT and BLK_WBT_MQ entries consecutive Bart Van Assche
2021-06-09  4:14   ` Damien Le Moal
2021-06-09  7:03   ` Johannes Thumshirn
2021-06-10  6:02   ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 02/14] block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls Bart Van Assche
2021-06-09  4:19   ` Damien Le Moal
2021-06-09  7:05   ` Johannes Thumshirn
2021-06-10  6:02   ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 03/14] block/blk-rq-qos: Move a function from a header file into a C file Bart Van Assche
2021-06-09  4:22   ` Damien Le Moal
2021-06-10  6:03   ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 04/14] block: Introduce the ioprio rq-qos policy Bart Van Assche
2021-06-09  4:40   ` Damien Le Moal [this message]
2021-06-09 16:48     ` Bart Van Assche
2021-06-10  0:29       ` Damien Le Moal
2021-06-10  6:20   ` Hannes Reinecke
2021-06-10 17:14     ` Bart Van Assche
2021-06-08 23:06 ` [PATCH 05/14] block/mq-deadline: Add several comments Bart Van Assche
2021-06-10  6:21   ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 06/14] block/mq-deadline: Add two lockdep_assert_held() statements Bart Van Assche
2021-06-10  6:21   ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 07/14] block/mq-deadline: Remove two local variables Bart Van Assche
2021-06-10  6:22   ` Hannes Reinecke
2021-06-08 23:06 ` [PATCH 08/14] block/mq-deadline: Rename dd_init_queue() and dd_exit_queue() Bart Van Assche
2021-06-08 23:06 ` [PATCH 09/14] block/mq-deadline: Improve compile-time argument checking Bart Van Assche
2021-06-08 23:06 ` [PATCH 10/14] block/mq-deadline: Improve the sysfs show and store macros Bart Van Assche
2021-06-09  4:46   ` Damien Le Moal
2021-06-09 16:54     ` Bart Van Assche
2021-06-10  0:31       ` Damien Le Moal
2021-06-09  7:11   ` Johannes Thumshirn
2021-06-09 16:59     ` Bart Van Assche
2021-06-10  6:23   ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 11/14] block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests Bart Van Assche
2021-06-10  6:26   ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 12/14] block/mq-deadline: Add I/O priority support Bart Van Assche
2021-06-09  5:03   ` Damien Le Moal
2021-06-09 17:25     ` Bart Van Assche
2021-06-10  0:39       ` Damien Le Moal
2021-06-10  6:35   ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 13/14] block/mq-deadline: Add cgroup support Bart Van Assche
2021-06-10  6:37   ` Hannes Reinecke
2021-06-08 23:07 ` [PATCH 14/14] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
2021-06-09  5:10   ` Damien Le Moal
2021-06-10  6:43   ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR04MB708135D0C71D2042E6674E23E7369@DM6PR04MB7081.namprd04.prod.outlook.com \
    --to=damien.lemoal@wdc.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=himanshu.madhani@oracle.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.