* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2021-09-27 22:03 ` [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
@ 2021-09-27 22:53 ` Damien Le Moal
2021-09-27 23:12 ` Bart Van Assche
2021-09-28 5:53 ` Hannes Reinecke
From: Damien Le Moal @ 2021-09-27 22:53 UTC
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Damien Le Moal,
Niklas Cassel, Hannes Reinecke
On 2021/09/28 7:03, Bart Van Assche wrote:
> In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
> Prioritize high-priority requests""), this patch uses 'jiffies' instead
> of ktime_get() in the code for aging lower priority requests.
>
> This patch has been tested as follows:
>
> Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
> Result without and with this patch: 555 K IOPS.
>
> Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
> Result without and with this patch: about 380 K IOPS.
>
> Ran the following script:
>
> set -e
> scriptdir=$(dirname "$0")
> if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
> modprobe scsi_debug ndelay=1000000 max_queue=16
> sd=''
> while [ -z "$sd" ]; do
> sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
> done
> echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
> if [ -e /sys/fs/cgroup/io.prio.class ]; then
> cd /sys/fs/cgroup
> echo restrict-to-be >io.prio.class
> echo +io > cgroup.subtree_control
> else
> cd /sys/fs/cgroup/blkio/
> echo restrict-to-be >blkio.prio.class
> fi
> echo $$ >cgroup.procs
> mkdir -p hipri
> cd hipri
> if [ -e io.prio.class ]; then
> echo none-to-rt >io.prio.class
> else
> echo none-to-rt >blkio.prio.class
> fi
> { "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
> echo $$ >cgroup.procs
> "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt
>
> Result:
> * 11000 IOPS for the high-priority job
> * 40 IOPS for the low-priority job
>
> If the prio aging expiry time is changed from 100s into 0, the IOPS results
> change into 6712 and 6796 IOPS.
>
> The max-iops script is a script that runs fio with the following arguments:
> --bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
> --norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
> --iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
> --iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
> --filename=${positional_argument_1}
>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> block/mq-deadline.c | 77 ++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 73 insertions(+), 4 deletions(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index b262f40f32c0..bb723478baf1 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -31,6 +31,11 @@
> */
> static const int read_expire = HZ / 2; /* max time before a read is submitted. */
> static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
> +/*
> + * Time after which to dispatch lower priority requests even if higher
> + * priority requests are pending.
> + */
> +static const int prio_aging_expire = 10 * HZ;
> static const int writes_starved = 2; /* max times reads can starve a write */
> static const int fifo_batch = 16; /* # of sequential requests treated as one
> by the above parameters. For throughput. */
> @@ -96,6 +101,7 @@ struct deadline_data {
> int writes_starved;
> int front_merges;
> u32 async_depth;
> + int prio_aging_expire;
>
> spinlock_t lock;
> spinlock_t zone_lock;
> @@ -338,12 +344,27 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
> return rq;
> }
>
> +/*
> + * Returns true if and only if @rq started after @latest_start where
> + * @latest_start is in jiffies.
> + */
> +static bool started_after(struct deadline_data *dd, struct request *rq,
> + unsigned long latest_start)
> +{
> + unsigned long start_time = (unsigned long)rq->fifo_time;
> +
> + start_time -= dd->fifo_expire[rq_data_dir(rq)];
> +
> + return time_after(start_time, latest_start);
> +}
> +
> /*
> * deadline_dispatch_requests selects the best request according to
> - * read/write expire, fifo_batch, etc
> + * read/write expire, fifo_batch, etc and with a start time <= @latest.
s/@latest/@latest_start ?
> */
> static struct request *__dd_dispatch_request(struct deadline_data *dd,
> - struct dd_per_prio *per_prio)
> + struct dd_per_prio *per_prio,
> + unsigned long latest_start)
> {
> struct request *rq, *next_rq;
> enum dd_data_dir data_dir;
> @@ -355,6 +376,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> if (!list_empty(&per_prio->dispatch)) {
> rq = list_first_entry(&per_prio->dispatch, struct request,
> queuelist);
> + if (started_after(dd, rq, latest_start))
> + return NULL;
> list_del_init(&rq->queuelist);
> goto done;
> }
> @@ -432,6 +455,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> dd->batching = 0;
>
> dispatch_request:
> + if (started_after(dd, rq, latest_start))
> + return NULL;
> +
> /*
> * rq is the selected appropriate request.
> */
> @@ -449,6 +475,34 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> return rq;
> }
>
> +/*
> + * Check whether there are any requests with priority other than DD_RT_PRIO
> + * that were inserted more than prio_aging_expire jiffies ago.
> + */
> +static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
> + unsigned long now)
> +{
> + struct request *rq;
> + enum dd_prio prio;
> + int prio_cnt;
> +
> + lockdep_assert_held(&dd->lock);
> +
> + prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
> + !!dd_queued(dd, DD_IDLE_PRIO);
> + if (prio_cnt < 2)
> + return NULL;
> +
> + for (prio = DD_BE_PRIO; prio <= DD_PRIO_MAX; prio++) {
> + rq = __dd_dispatch_request(dd, &dd->per_prio[prio],
> + now - dd->prio_aging_expire);
> + if (rq)
> + return rq;
> + }
> +
> + return NULL;
> +}
> +
> /*
> * Called from blk_mq_run_hw_queue() -> __blk_mq_sched_dispatch_requests().
> *
> @@ -460,15 +514,26 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> {
> struct deadline_data *dd = hctx->queue->elevator->elevator_data;
> + const unsigned long now = jiffies;
> struct request *rq;
> enum dd_prio prio;
>
> spin_lock(&dd->lock);
> + rq = dd_dispatch_prio_aged_requests(dd, now);
> + if (rq)
> + goto unlock;
> +
> + /*
> + * Next, dispatch requests in priority order. Ignore lower priority
> + * requests if any higher priority requests are pending.
> + */
> for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
> - rq = __dd_dispatch_request(dd, &dd->per_prio[prio]);
> - if (rq)
> + rq = __dd_dispatch_request(dd, &dd->per_prio[prio], now);
> + if (rq || dd_queued(dd, prio))
> break;
> }
> +
> +unlock:
> spin_unlock(&dd->lock);
>
> return rq;
> @@ -573,6 +638,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
> dd->front_merges = 1;
> dd->last_dir = DD_WRITE;
> dd->fifo_batch = fifo_batch;
> + dd->prio_aging_expire = prio_aging_expire;
> spin_lock_init(&dd->lock);
> spin_lock_init(&dd->zone_lock);
>
> @@ -796,6 +862,7 @@ static ssize_t __FUNC(struct elevator_queue *e, char *page) \
> #define SHOW_JIFFIES(__FUNC, __VAR) SHOW_INT(__FUNC, jiffies_to_msecs(__VAR))
> SHOW_JIFFIES(deadline_read_expire_show, dd->fifo_expire[DD_READ]);
> SHOW_JIFFIES(deadline_write_expire_show, dd->fifo_expire[DD_WRITE]);
> +SHOW_JIFFIES(deadline_prio_aging_expire_show, dd->prio_aging_expire);
> SHOW_INT(deadline_writes_starved_show, dd->writes_starved);
> SHOW_INT(deadline_front_merges_show, dd->front_merges);
> SHOW_INT(deadline_async_depth_show, dd->front_merges);
> @@ -825,6 +892,7 @@ static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)
> STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, msecs_to_jiffies)
> STORE_JIFFIES(deadline_read_expire_store, &dd->fifo_expire[DD_READ], 0, INT_MAX);
> STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MAX);
> +STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
> STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
> STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
> STORE_INT(deadline_async_depth_store, &dd->front_merges, 1, INT_MAX);
> @@ -843,6 +911,7 @@ static struct elv_fs_entry deadline_attrs[] = {
> DD_ATTR(front_merges),
> DD_ATTR(async_depth),
> DD_ATTR(fifo_batch),
> + DD_ATTR(prio_aging_expire),
> __ATTR_NULL
> };
>
>
Apart from the nit above, looks good to me.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2021-09-27 22:53 ` Damien Le Moal
@ 2021-09-27 23:12 ` Bart Van Assche
From: Bart Van Assche @ 2021-09-27 23:12 UTC
To: Damien Le Moal, Jens Axboe
Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Damien Le Moal,
Niklas Cassel, Hannes Reinecke
On 9/27/21 3:53 PM, Damien Le Moal wrote:
> On 2021/09/28 7:03, Bart Van Assche wrote:
>> /*
>> * deadline_dispatch_requests selects the best request according to
>> - * read/write expire, fifo_batch, etc
>> + * read/write expire, fifo_batch, etc and with a start time <= @latest.
>
> s/@latest/@latest_start ?
>
>> */
>> static struct request *__dd_dispatch_request(struct deadline_data *dd,
>> - struct dd_per_prio *per_prio)
>> + struct dd_per_prio *per_prio,
>> + unsigned long latest_start)
Right, the comment above this function needs to be updated.
Bart.
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2021-09-27 22:03 ` [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
2021-09-27 22:53 ` Damien Le Moal
@ 2021-09-28 5:53 ` Hannes Reinecke
2021-09-28 10:36 ` Niklas Cassel
2022-03-25 12:33 ` Oleksij Rempel
From: Hannes Reinecke @ 2021-09-28 5:53 UTC
To: Bart Van Assche, Jens Axboe
Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Damien Le Moal,
Niklas Cassel
On 9/28/21 12:03 AM, Bart Van Assche wrote:
> In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
> Prioritize high-priority requests""), this patch uses 'jiffies' instead
> of ktime_get() in the code for aging lower priority requests.
>
> [...]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2021-09-27 22:03 ` [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
2021-09-27 22:53 ` Damien Le Moal
2021-09-28 5:53 ` Hannes Reinecke
@ 2021-09-28 10:36 ` Niklas Cassel
2022-03-25 12:33 ` Oleksij Rempel
From: Niklas Cassel @ 2021-09-28 10:36 UTC
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Hannes Reinecke
On Mon, Sep 27, 2021 at 03:03:28PM -0700, Bart Van Assche wrote:
> In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
> Prioritize high-priority requests""), this patch uses 'jiffies' instead
> of ktime_get() in the code for aging lower priority requests.
Considering that this is basically a revert of a revert,
except for the ktime_get() to jiffies change, I think that
you should state the reason for this change in the change log.
> [...]
With an elaborated change log, this looks good to me:
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2021-09-27 22:03 ` [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
2021-09-28 10:36 ` Niklas Cassel
@ 2022-03-25 12:33 ` Oleksij Rempel
2022-03-25 13:07 ` Oleksij Rempel
2022-03-25 17:05 ` Bart Van Assche
From: Oleksij Rempel @ 2022-03-25 12:33 UTC
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke
Hello Bart,
I see a regression on iMX6 with a USB storage device with this patch.
After plugging in a USB flash drive (in my case a USB3 Intenso 32GB) and
running mount /dev/sda1 /mnt, the command never exits.
Reverting this patch (322cff70d46) on top of v5.17 solves it for me.
How can I help you to debug this issue?
Regards,
Oleksij
On Mon, Sep 27, 2021 at 03:03:28PM -0700, Bart Van Assche wrote:
> In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
> Prioritize high-priority requests""), this patch uses 'jiffies' instead
> of ktime_get() in the code for aging lower priority requests.
>
> [...]
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2022-03-25 12:33 ` Oleksij Rempel
@ 2022-03-25 13:07 ` Oleksij Rempel
2022-03-25 17:05 ` Bart Van Assche
From: Oleksij Rempel @ 2022-03-25 13:07 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke, kernel
On Fri, Mar 25, 2022 at 01:33:20PM +0100, Oleksij Rempel wrote:
> Hello Bart,
>
> I have a regression on iMX6 + a USB storage device with this patch.
> After plugging in a USB flash drive (in my case a USB3 Intenso 32GB) and
> running mount /dev/sda1 /mnt, the command never exits.
>
> Reverting this patch (322cff70d46) on top of v5.17 solves it for me.
>
> How can I help you to debug this issue?
I hope this can help:
- it seems to be reproducible only with some types of USB storage devices
(a USB3 stick on a USB2 host?)
- mount takes a long time, but eventually completes
- this is the backtrace I get from sysrq:
kernel: Exception stack(0x83a59fb0 to 0x83a59ff8)
kernel: 9fa0: 00000000 00000000 00000000 00000000
kernel: 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
kernel: r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:80147ea0
kernel: r4:83a4cec0 r3:8118cd24
kernel: task:mount state:D stack: 0 pid: 282 ppid: 275 flags:0x00000000
kernel: Backtrace:
kernel: __schedule from schedule+0x84/0xc0
kernel: r10:82ebdd94 r9:00000102 r8:82ad3600 r7:81203ea4 r6:00000000 r5:00000002
kernel: r4:82ad3600
kernel: schedule from io_schedule+0x20/0x2c
kernel: r5:00000002 r4:00000000
kernel: io_schedule from folio_wait_bit_common+0x18c/0x220
kernel: r5:00000002 r4:9bc2e580
kernel: folio_wait_bit_common from folio_put_wait_locked+0x24/0x28
kernel: r10:00000003 r9:825124f0 r8:00000000 r7:825124e0 r6:9bc2e580 r5:82ebdf00
kernel: r4:00080001
kernel: folio_put_wait_locked from filemap_read+0x4f0/0x824
kernel: filemap_read from blkdev_read_iter+0x144/0x19c
kernel: r10:00000003 r9:82ad3600 r8:00000000 r7:82ebdee8 r6:82ebdf00 r5:00000040
kernel: r4:00000000
kernel: blkdev_read_iter from vfs_read+0x138/0x188
kernel: r8:00000040 r7:01649af0 r6:82ebdf58 r5:82ddfe40 r4:00000000
kernel: vfs_read from ksys_read+0x84/0xd8
kernel: r8:00000040 r7:82ebdf64 r6:82ebdf58 r5:01649af0 r4:82ddfe40
kernel: ksys_read from sys_read+0x18/0x1c
kernel: r8:801002a4 r7:00000003 r6:76fce840 r5:708f0000 r4:01649a00
kernel: sys_read from ret_fast_syscall+0x0/0x1c
> Regards,
> Oleksij
>
> On Mon, Sep 27, 2021 at 03:03:28PM -0700, Bart Van Assche wrote:
> > In addition to reverting commit 7b05bf771084 ("Revert "block/mq-deadline:
> > Prioritize high-priority requests""), this patch uses 'jiffies' instead
> > of ktime_get() in the code for aging lower priority requests.
> >
> > This patch has been tested as follows:
> >
> > Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
> > Result without and with this patch: 555 K IOPS.
> >
> > Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
> > Result without and with this patch: about 380 K IOPS.
> >
> > Ran the following script:
> >
> > set -e
> > scriptdir=$(dirname "$0")
> > if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
> > modprobe scsi_debug ndelay=1000000 max_queue=16
> > sd=''
> > while [ -z "$sd" ]; do
> > sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
> > done
> > echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
> > if [ -e /sys/fs/cgroup/io.prio.class ]; then
> > cd /sys/fs/cgroup
> > echo restrict-to-be >io.prio.class
> > echo +io > cgroup.subtree_control
> > else
> > cd /sys/fs/cgroup/blkio/
> > echo restrict-to-be >blkio.prio.class
> > fi
> > echo $$ >cgroup.procs
> > mkdir -p hipri
> > cd hipri
> > if [ -e io.prio.class ]; then
> > echo none-to-rt >io.prio.class
> > else
> > echo none-to-rt >blkio.prio.class
> > fi
> > { "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
> > echo $$ >cgroup.procs
> > "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt
> >
> > Result:
> > * 11000 IOPS for the high-priority job
> > * 40 IOPS for the low-priority job
> >
> > If the prio aging expiry time is changed from 100s into 0, the IOPS results
> > change into 6712 and 6796 IOPS.
> >
> > The max-iops script is a script that runs fio with the following arguments:
> > --bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
> > --norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
> > --iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
> > --iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
> > --filename=${positional_argument_1}
> >
> > Cc: Damien Le Moal <damien.lemoal@wdc.com>
> > Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > ---
> > block/mq-deadline.c | 77 ++++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 73 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index b262f40f32c0..bb723478baf1 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -31,6 +31,11 @@
> > */
> > static const int read_expire = HZ / 2; /* max time before a read is submitted. */
> > static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
> > +/*
> > + * Time after which to dispatch lower priority requests even if higher
> > + * priority requests are pending.
> > + */
> > +static const int prio_aging_expire = 10 * HZ;
> > static const int writes_starved = 2; /* max times reads can starve a write */
> > static const int fifo_batch = 16; /* # of sequential requests treated as one
> > by the above parameters. For throughput. */
> > @@ -96,6 +101,7 @@ struct deadline_data {
> > int writes_starved;
> > int front_merges;
> > u32 async_depth;
> > + int prio_aging_expire;
> >
> > spinlock_t lock;
> > spinlock_t zone_lock;
> > @@ -338,12 +344,27 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
> > return rq;
> > }
> >
> > +/*
> > + * Returns true if and only if @rq started after @latest_start where
> > + * @latest_start is in jiffies.
> > + */
> > +static bool started_after(struct deadline_data *dd, struct request *rq,
> > + unsigned long latest_start)
> > +{
> > + unsigned long start_time = (unsigned long)rq->fifo_time;
> > +
> > + start_time -= dd->fifo_expire[rq_data_dir(rq)];
> > +
> > + return time_after(start_time, latest_start);
> > +}
> > +
> > /*
> > * deadline_dispatch_requests selects the best request according to
> > - * read/write expire, fifo_batch, etc
> > + * read/write expire, fifo_batch, etc and with a start time <= @latest.
> > */
> > static struct request *__dd_dispatch_request(struct deadline_data *dd,
> > - struct dd_per_prio *per_prio)
> > + struct dd_per_prio *per_prio,
> > + unsigned long latest_start)
> > {
> > struct request *rq, *next_rq;
> > enum dd_data_dir data_dir;
> > @@ -355,6 +376,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> > if (!list_empty(&per_prio->dispatch)) {
> > rq = list_first_entry(&per_prio->dispatch, struct request,
> > queuelist);
> > + if (started_after(dd, rq, latest_start))
> > + return NULL;
> > list_del_init(&rq->queuelist);
> > goto done;
> > }
> > @@ -432,6 +455,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> > dd->batching = 0;
> >
> > dispatch_request:
> > + if (started_after(dd, rq, latest_start))
> > + return NULL;
> > +
> > /*
> > * rq is the selected appropriate request.
> > */
> > @@ -449,6 +475,34 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> > return rq;
> > }
> >
> > +/*
> > + * Check whether there are any requests with priority other than DD_RT_PRIO
> > + * that were inserted more than prio_aging_expire jiffies ago.
> > + */
> > +static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
> > + unsigned long now)
> > +{
> > + struct request *rq;
> > + enum dd_prio prio;
> > + int prio_cnt;
> > +
> > + lockdep_assert_held(&dd->lock);
> > +
> > + prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
> > + !!dd_queued(dd, DD_IDLE_PRIO);
> > + if (prio_cnt < 2)
> > + return NULL;
> > +
> > + for (prio = DD_BE_PRIO; prio <= DD_PRIO_MAX; prio++) {
> > + rq = __dd_dispatch_request(dd, &dd->per_prio[prio],
> > + now - dd->prio_aging_expire);
> > + if (rq)
> > + return rq;
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > /*
> > * Called from blk_mq_run_hw_queue() -> __blk_mq_sched_dispatch_requests().
> > *
> > @@ -460,15 +514,26 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
> > static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> > {
> > struct deadline_data *dd = hctx->queue->elevator->elevator_data;
> > + const unsigned long now = jiffies;
> > struct request *rq;
> > enum dd_prio prio;
> >
> > spin_lock(&dd->lock);
> > + rq = dd_dispatch_prio_aged_requests(dd, now);
> > + if (rq)
> > + goto unlock;
> > +
> > + /*
> > + * Next, dispatch requests in priority order. Ignore lower priority
> > + * requests if any higher priority requests are pending.
> > + */
> > for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
> > - rq = __dd_dispatch_request(dd, &dd->per_prio[prio]);
> > - if (rq)
> > + rq = __dd_dispatch_request(dd, &dd->per_prio[prio], now);
> > + if (rq || dd_queued(dd, prio))
> > break;
> > }
> > +
> > +unlock:
> > spin_unlock(&dd->lock);
> >
> > return rq;
> > @@ -573,6 +638,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
> > dd->front_merges = 1;
> > dd->last_dir = DD_WRITE;
> > dd->fifo_batch = fifo_batch;
> > + dd->prio_aging_expire = prio_aging_expire;
> > spin_lock_init(&dd->lock);
> > spin_lock_init(&dd->zone_lock);
> >
> > @@ -796,6 +862,7 @@ static ssize_t __FUNC(struct elevator_queue *e, char *page) \
> > #define SHOW_JIFFIES(__FUNC, __VAR) SHOW_INT(__FUNC, jiffies_to_msecs(__VAR))
> > SHOW_JIFFIES(deadline_read_expire_show, dd->fifo_expire[DD_READ]);
> > SHOW_JIFFIES(deadline_write_expire_show, dd->fifo_expire[DD_WRITE]);
> > +SHOW_JIFFIES(deadline_prio_aging_expire_show, dd->prio_aging_expire);
> > SHOW_INT(deadline_writes_starved_show, dd->writes_starved);
> > SHOW_INT(deadline_front_merges_show, dd->front_merges);
> > SHOW_INT(deadline_async_depth_show, dd->async_depth);
> > @@ -825,6 +892,7 @@ static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)
> > STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, msecs_to_jiffies)
> > STORE_JIFFIES(deadline_read_expire_store, &dd->fifo_expire[DD_READ], 0, INT_MAX);
> > STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MAX);
> > +STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
> > STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
> > STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
> > STORE_INT(deadline_async_depth_store, &dd->async_depth, 1, INT_MAX);
> > @@ -843,6 +911,7 @@ static struct elv_fs_entry deadline_attrs[] = {
> > DD_ATTR(front_merges),
> > DD_ATTR(async_depth),
> > DD_ATTR(fifo_batch),
> > + DD_ATTR(prio_aging_expire),
> > __ATTR_NULL
> > };
> >
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2022-03-25 12:33 ` Oleksij Rempel
2022-03-25 13:07 ` Oleksij Rempel
@ 2022-03-25 17:05 ` Bart Van Assche
2022-03-26 8:57 ` Oleksij Rempel
1 sibling, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-03-25 17:05 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke
On 3/25/22 05:33, Oleksij Rempel wrote:
> I have a regression on iMX6 + a USB storage device with this patch.
> After plugging in a USB flash drive (in my case a USB3 Intenso 32GB) and
> running mount /dev/sda1 /mnt, the command never exits.
>
> Reverting this patch (322cff70d46) on top of v5.17 solves it for me.
>
> How can I help you to debug this issue?
That is unexpected. Is there perhaps something special about the USB
stick for which the hang occurs, e.g. the queue depth it supports is
low? Can you please share the output of the following commands after
having inserted the USB stick that triggers the hang?
(cd /sys/class && grep -aH . scsi_host/*/can_queue scsi_device/*/device/{blacklist,inquiry,model,queue*,vendor}) | sort
Thanks,
Bart.
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2022-03-25 17:05 ` Bart Van Assche
@ 2022-03-26 8:57 ` Oleksij Rempel
2022-03-28 2:20 ` Bart Van Assche
0 siblings, 1 reply; 21+ messages in thread
From: Oleksij Rempel @ 2022-03-26 8:57 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke
On Fri, Mar 25, 2022 at 10:05:43AM -0700, Bart Van Assche wrote:
> On 3/25/22 05:33, Oleksij Rempel wrote:
> > I have a regression on iMX6 + a USB storage device with this patch.
> > After plugging in a USB flash drive (in my case a USB3 Intenso 32GB) and
> > running mount /dev/sda1 /mnt, the command never exits.
> >
> > Reverting this patch (322cff70d46) on top of v5.17 solves it for me.
> >
> > How can I help you to debug this issue?
>
> That is unexpected. Is there perhaps something special about the USB
> stick for which the hang occurs, e.g. the queue depth it supports is
> low? Can you please share the output of the following commands after
> having inserted the USB stick that triggers the hang?
>
> (cd /sys/class && grep -aH . scsi_host/*/can_queue scsi_device/*/device/{blacklist,inquiry,model,queue*,vendor}) | sort
Here it is:
root@test:/sys/class cat scsi_host/*/can_queue
1
root@test:/sys/class cat scsi_device/*/device/blacklist
root@test:/sys/class cat scsi_device/*/device/inquiry
Intenso Speed Line 1.00
root@test:/sys/class cat scsi_device/*/device/model
Speed Line
root@test:/sys/class cat scsi_device/*/device/queue*
1
none
root@test:/sys/class cat scsi_device/*/device/vendor
Intenso
I do not know if there is something special about it. If needed, I can
take it apart to see which controller is used.
Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2022-03-26 8:57 ` Oleksij Rempel
@ 2022-03-28 2:20 ` Bart Van Assche
2022-03-28 22:24 ` Bart Van Assche
2022-03-29 4:49 ` Bart Van Assche
0 siblings, 2 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-03-28 2:20 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke
On 3/26/22 01:57, Oleksij Rempel wrote:
> On Fri, Mar 25, 2022 at 10:05:43AM -0700, Bart Van Assche wrote:
>> On 3/25/22 05:33, Oleksij Rempel wrote:
>>> I have a regression on iMX6 + a USB storage device with this patch.
>>> After plugging in a USB flash drive (in my case a USB3 Intenso 32GB) and
>>> running mount /dev/sda1 /mnt, the command never exits.
>>>
>>> Reverting this patch (322cff70d46) on top of v5.17 solves it for me.
>>>
>>> How can I help you to debug this issue?
>>
>> That is unexpected. Is there perhaps something special about the USB
>> stick for which the hang occurs, e.g. the queue depth it supports is
>> low? Can you please share the output of the following commands after
>> having inserted the USB stick that triggers the hang?
>>
>> (cd /sys/class && grep -aH . scsi_host/*/can_queue scsi_device/*/device/{blacklist,inquiry,model,queue*,vendor}) | sort
>
> Here it is:
> root@test:/sys/class cat scsi_host/*/can_queue
> 1
> root@test:/sys/class cat scsi_device/*/device/blacklist
> root@test:/sys/class cat scsi_device/*/device/inquiry
> Intenso Speed Line 1.00
> root@test:/sys/class cat scsi_device/*/device/model
> Speed Line
> root@test:/sys/class cat scsi_device/*/device/queue*
> 1
> none
> root@test:/sys/class cat scsi_device/*/device/vendor
> Intenso
>
> I do not know if there is something special about it. If needed, I can
> take it apart to see which controller is used.
Thanks, this helps a lot. can_queue = 1, as I suspected. In a quick
test I noticed that I/O is about 10x slower with queue depth 1 and the
v5.17 mq-deadline scheduler. I will take a closer look at this tomorrow.
Bart.
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2022-03-28 2:20 ` Bart Van Assche
@ 2022-03-28 22:24 ` Bart Van Assche
2022-03-29 4:49 ` Bart Van Assche
1 sibling, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-03-28 22:24 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke
On 3/27/22 19:20, Bart Van Assche wrote:
> Thanks, this helps a lot. can_queue = 1, as I suspected. In a quick
> test I noticed that I/O is about 10x slower with queue depth 1 and the
> v5.17 mq-deadline scheduler. I will take a closer look at this tomorrow.
An update: I can reproduce the slowness with (a) the patch at the start of
this thread reverted and (b) both the BFQ and the Kyber I/O schedulers. So
it seems that a regression related to queue depth 1 handling has been
introduced. I will try to bisect it.
Bart.
* Re: [PATCH v2 4/4] block/mq-deadline: Prioritize high-priority requests
2022-03-28 2:20 ` Bart Van Assche
2022-03-28 22:24 ` Bart Van Assche
@ 2022-03-29 4:49 ` Bart Van Assche
1 sibling, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-03-29 4:49 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Damien Le Moal, Niklas Cassel, Hannes Reinecke
On 3/27/22 19:20, Bart Van Assche wrote:
> Thanks, this helps a lot. can_queue = 1, as I suspected. In a quick
> test I noticed that I/O is about 10x slower with queue depth 1 and the
> v5.17 mq-deadline scheduler. I will take a closer look at this tomorrow.
(replying to my own email)
Please take a look at the two new tests in
https://github.com/osandov/blktests/pull/87. The results of those tests for
kernel v5.13 show that enabling an I/O scheduler speeds up the I/O workload
except when using BFQ (times are in centiseconds):
# cat /home/bart/software/blktests/results/nodev/block/032.full
bfq: 465 vs 452: pass
kyber: 243 vs 452: pass
mq-deadline: 230 vs 452: pass
The results for kernel v5.16 show that enabling an I/O scheduler slows down
the I/O workload by up to 2.2x:
# cat /home/bart/software/blktests/results/nodev/block/032.full
bfq: 920 vs 419: fail
kyber: 732 vs 419: fail
mq-deadline: 751 vs 419: fail
In other words, this is not an mq-deadline issue.
Bart.