* Hung tasks with multiple partitions

From: Salman Qazi @ 2020-01-30 19:34 UTC (permalink / raw)
To: Jens Axboe, Linux Kernel Mailing List, linux-block
Cc: Jesse Barnes, Gwendal Grignou

Hi,

I am writing on behalf of the Chromium OS team at Google.  We found the
root cause of some hung tasks we were experiencing, and we would like
your opinion on potential solutions.  The bugs were encountered on a
4.19 kernel; however, my reading of the code suggests that the relevant
portions have not changed since then.

We have an eMMC flash drive that has been carved into partitions on an
8-CPU system.  The repro case we came up with is to run an 8-threaded,
write-mostly fio workload against one partition, let the system use the
other partition as its read-write filesystem (i.e. just background
activity), and then run the following loop:

    while true; do sync; sleep 1; done

The hung task stack traces look like the following:

[  128.994891] jbd2/dm-1-8     D    0   367      2 0x00000028  last_sleep: 96340206998.  last_runnable: 96340140151
[  128.994898] Call trace:
[  128.994903]  __switch_to+0x120/0x13c
[  128.994909]  __schedule+0x60c/0x7dc
[  128.994914]  schedule+0x74/0x94
[  128.994919]  io_schedule+0x1c/0x40
[  128.994925]  bit_wait_io+0x18/0x58
[  128.994930]  __wait_on_bit+0x78/0xdc
[  128.994935]  out_of_line_wait_on_bit+0xa0/0xcc
[  128.994943]  __wait_on_buffer+0x48/0x54
[  128.994948]  jbd2_journal_commit_transaction+0x1198/0x1a4c
[  128.994956]  kjournald2+0x19c/0x268
[  128.994961]  kthread+0x120/0x130
[  128.994967]  ret_from_fork+0x10/0x18

I added more information to some trace points to understand what was
going on.  It turns out that blk_mq_sched_dispatch_requests had checked
hctx->dispatch, found it empty, and then begun consuming requests from
the I/O scheduler (in blk_mq_do_dispatch_sched).
Unfortunately, the deluge from the I/O scheduler (BFQ in our case)
doesn't stop for 30 seconds, and there is no mechanism in
blk_mq_do_dispatch_sched to terminate early or reconsider the contents
of hctx->dispatch.  In the meantime, a flush command arrives in
hctx->dispatch (via insertion in blk_mq_sched_bypass_insert) and
languishes there.  Eventually the thread waiting on the flush triggers
the hung task watchdog.

The solution that comes to mind is to periodically check hctx->dispatch
in blk_mq_do_dispatch_sched and exit early if it is non-empty.  However,
not being an expert in this subsystem, I am not sure whether there would
be other consequences.

Any help is appreciated,
Salman

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Hung tasks with multiple partitions

From: Bart Van Assche @ 2020-01-30 20:49 UTC (permalink / raw)
To: Salman Qazi, Jens Axboe, Linux Kernel Mailing List, linux-block
Cc: Jesse Barnes, Gwendal Grignou

On 1/30/20 11:34 AM, Salman Qazi wrote:
> I am writing on behalf of the Chromium OS team at Google. We found
> the root cause for some hung tasks we were experiencing and we would
> like to get your opinion on potential solutions.
[...]
> The solution that comes to mind is to periodically check
> hctx->dispatch in blk_mq_do_dispatch_sched and exit early if it is
> non-empty.  However, not being an expert in this subsystem, I am not
> sure if there would be other consequences.

The call stack shown in your e-mail usually means that an I/O request
got stuck.  How about determining first whether this is caused by the
BFQ scheduler or by the eMMC driver?  I think the developers of these
software components need that information anyway before they can step
in.

The attached script may help to identify which requests got stuck.

Bart.

[-- Attachment #2: list-pending-block-requests --]

#!/bin/bash

show_state() {
	local a dev=$1

	for a in device/state queue/scheduler; do
		[ -e "$dev/$a" ] && grep -aH . "$dev/$a"
	done
}

if [ -e /sys/kernel/debug/block ]; then
	devs=($(cd /sys/kernel/debug/block && echo ./*))
else
	devs=($(cd /sys/class/block && echo ./*))
fi
cd /sys/class/block || exit $?
for dev in "${devs[@]}"; do
	dev="${dev#./}"
	echo "$dev"
	pending=0
	if [ -e "$dev/mq" ]; then
		for f in "$dev"/mq/*/{pending,*/rq_list}; do
			[ -e "$f" ] || continue
			if { read -r line1 && read -r line2; } <"$f"; then
				echo "$f"
				echo "$line1 $line2" >/dev/null
				head -n 9 "$f"
				((pending++))
			fi
		done
	fi
	(
		busy=0
		cd /sys/kernel/debug/block >&/dev/null &&
			{ grep -aH . $dev/requeue_list; true; } &&
			for d in "$dev"/mq/hctx* "$dev"/hctx*; do
				[ ! -d "$d" ] && continue
				{ [ ! -e "$d/tags" ] ||
				  grep -q '^busy=0$' "$d/tags"; } &&
				{ [ ! -e "$d/sched_tags" ] ||
				  [ "$(<"$d/sched_tags")" = "" ] ||
				  grep -q '^busy=0$' "$d/sched_tags"; } &&
					continue
				((busy++))
				for f in "$d"/{active,busy,dispatch,flags,requeue_list,sched_tags,state,tags*,cpu*/rq_list,sched/*rqs}; do
					[ -e "$f" ] && grep -aH . "$f"
				done
			done
		exit $busy
	)
	pending=$((pending+$?))
	if [ "$pending" -gt 0 ]; then
		(
			cd /sys/kernel/debug/block >&/dev/null &&
				if [ -e "$dev/mq/state" ]; then
					grep -aH . "$dev/mq/state"
				else
					grep -aH . "$dev/state"
				fi
		)
		show_state "$dev"
	fi
done
* Re: Hung tasks with multiple partitions

From: Salman Qazi @ 2020-01-30 21:02 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, Linux Kernel Mailing List, linux-block, Jesse Barnes, Gwendal Grignou

On Thu, Jan 30, 2020 at 12:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 1/30/20 11:34 AM, Salman Qazi wrote:
> > I am writing on behalf of the Chromium OS team at Google. We found
> > the root cause for some hung tasks we were experiencing and we would
> > like to get your opinion on potential solutions.
[...]
>
> The call stack shown in your e-mail usually means that an I/O request
> got stuck.  How about determining first whether this is caused by the
> BFQ scheduler or by the eMMC driver?  I think the developers of these
> software components need that information anyway before they can step
> in.

As I mentioned in my previous email, I did use trace points to arrive
at my conclusion.  I added trace points in
blk_mq_sched_dispatch_requests to detect both the start and the end of
that function, as well as where the dispatched commands were picked
from.  I also traced blk_mq_sched_bypass_insert and saw a flush enter
hctx->dispatch after blk_mq_sched_dispatch_requests had started but
before it finished.

After reaching my conclusion, I also tried a simple fix: introducing an
exit path in blk_mq_do_dispatch_sched that is taken if we detect that
hctx->dispatch has become non-empty.  This made the problem go away.

> The attached script may help to identify which requests got stuck.
>
> Bart.
[parent not found: <20200203204554.119849-1-sqazi@google.com>]
* [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Salman Qazi @ 2020-02-03 20:59 UTC (permalink / raw)
To: Bart Van Assche, Jens Axboe, linux-block, linux-kernel
Cc: Jesse Barnes, Gwendal Grignou, Salman Qazi

We observed that it is possible for a flush to bypass the I/O scheduler
and get added to hctx->dispatch in blk_mq_sched_bypass_insert.  This can
happen while a kworker is running the blk_mq_do_dispatch_sched call in
blk_mq_sched_dispatch_requests.  However, the blk_mq_do_dispatch_sched
call doesn't end in bounded time.  As a result, the flush can sit there
indefinitely while the I/O scheduler feeds an arbitrary number of
requests to the hardware.

The solution is to periodically poll hctx->dispatch in
blk_mq_do_dispatch_sched, to put a bound on the latency of the commands
sitting there.
Signed-off-by: Salman Qazi <sqazi@google.com>
---
 block/blk-mq-sched.c   |  6 ++++++
 block/blk-mq.c         |  4 ++++
 block/blk-sysfs.c      | 33 +++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  2 ++
 4 files changed, 45 insertions(+)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..75cdec64b9c7 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -90,6 +90,7 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
+	int count = 0;
 
 	do {
 		struct request *rq;
@@ -97,6 +98,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
 
+		if (count > 0 && count % q->max_sched_batch == 0 &&
+		    !list_empty_careful(&hctx->dispatch))
+			break;
+
 		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
@@ -112,6 +117,7 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add(&rq->queuelist, &rq_list);
+		count++;
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a12b1763508d..7cb13aa72a94 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,6 +40,8 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
+#define BLK_MQ_DEFAULT_MAX_SCHED_BATCH 100
+
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -2934,6 +2936,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	 */
 	q->poll_nsec = BLK_MQ_POLL_CLASSIC;
 
+	q->max_sched_batch = BLK_MQ_DEFAULT_MAX_SCHED_BATCH;
+
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..dd7b58a1bd35 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -390,6 +390,32 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
 	return count;
 }
 
+static ssize_t queue_max_sched_batch_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%d\n", q->max_sched_batch);
+}
+
+static ssize_t queue_max_sched_batch_store(struct request_queue *q,
+					   const char *page,
+					   size_t count)
+{
+	int err, val;
+
+	if (!q->mq_ops)
+		return -EINVAL;
+
+	err = kstrtoint(page, 10, &val);
+	if (err < 0)
+		return err;
+
+	if (val <= 0)
+		return -EINVAL;
+
+	q->max_sched_batch = val;
+
+	return count;
+}
+
 static ssize_t queue_poll_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page);
@@ -691,6 +717,12 @@ static struct queue_sysfs_entry queue_poll_delay_entry = {
 	.store = queue_poll_delay_store,
 };
 
+static struct queue_sysfs_entry queue_max_sched_batch_entry = {
+	.attr = {.name = "max_sched_batch", .mode = 0644 },
+	.show = queue_max_sched_batch_show,
+	.store = queue_max_sched_batch_store,
+};
+
 static struct queue_sysfs_entry queue_wc_entry = {
 	.attr = {.name = "write_cache", .mode = 0644 },
 	.show = queue_wc_show,
@@ -763,6 +795,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_wb_lat_entry.attr,
 	&queue_poll_delay_entry.attr,
 	&queue_io_timeout_entry.attr,
+	&queue_max_sched_batch_entry.attr,
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	&throtl_sample_time_entry.attr,
 #endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 053ea4b51988..68e7d29d4dd4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -477,6 +477,8 @@ struct request_queue {
 	unsigned int		rq_timeout;
 	int			poll_nsec;
 
+	int			max_sched_batch;
+
 	struct blk_stat_callback	*poll_cb;
 	struct blk_rq_stat	poll_stat[BLK_MQ_POLL_STATS_BKTS];
 
-- 
2.25.0.341.g760bfbb309-goog
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Bart Van Assche @ 2020-02-04  3:47 UTC (permalink / raw)
To: Salman Qazi, Jens Axboe, linux-block, linux-kernel
Cc: Jesse Barnes, Gwendal Grignou, Christoph Hellwig, Ming Lei, Hannes Reinecke

On 2020-02-03 12:59, Salman Qazi wrote:
> We observed that it is possible for a flush to bypass the I/O
> scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert.
[...]
> The solution is to periodically poll hctx->dispatch in
> blk_mq_do_dispatch_sched, to put a bound on the latency of the commands
> sitting there.

(added Christoph, Ming and Hannes to the Cc-list)

Thank you for having posted a patch; that really helps.  I see that my
name occurs first in the "To:" list.  Since Jens is the block layer
maintainer, I think Jens should have been mentioned first.

In version v4.20 of the Linux kernel I found the following in the legacy
block layer code:

* From blk_insert_flush():

	list_add_tail(&rq->queuelist, &q->queue_head);

* From elv_next_request():

	list_for_each_entry(rq, &q->queue_head, queuelist)

I think this means that the legacy block layer sent flush requests to
the scheduler instead of directly to the block driver.  How about
modifying the blk-mq code such that it mimics that approach?  I'm asking
this because this patch, although the code looks clean, doesn't seem the
best solution to me.

> +		if (count > 0 && count % q->max_sched_batch == 0 &&
> +		    !list_empty_careful(&hctx->dispatch))
> +			break;

A modulo operation in the hot path?  Please don't do that.

> +static ssize_t queue_max_sched_batch_store(struct request_queue *q,
> +					   const char *page,
> +					   size_t count)
> +{
> +	int err, val;
> +
> +	if (!q->mq_ops)
> +		return -EINVAL;
> +
> +	err = kstrtoint(page, 10, &val);
> +	if (err < 0)
> +		return err;
> +
> +	if (val <= 0)
> +		return -EINVAL;
> +
> +	q->max_sched_batch = val;
> +
> +	return count;
> +}

Has it been considered to use kstrtouint() instead of checking whether
the value returned by kstrtoint() is positive?

> +	int max_sched_batch;

unsigned int?

Thanks,

Bart.
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Ming Lei @ 2020-02-04  9:20 UTC (permalink / raw)
To: Salman Qazi
Cc: Bart Van Assche, Jens Axboe, linux-block, linux-kernel, Jesse Barnes, Gwendal Grignou

On Mon, Feb 03, 2020 at 12:59:50PM -0800, Salman Qazi wrote:
> We observed that it is possible for a flush to bypass the I/O
> scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert.

We always bypass the I/O scheduler for flush requests.

> This can happen while a kworker is running blk_mq_do_dispatch_sched call
> in blk_mq_sched_dispatch_requests.
>
> However, the blk_mq_do_dispatch_sched call doesn't end in bounded time.
> As a result, the flush can sit there indefinitely, as the I/O scheduler
> feeds an arbitrary number of requests to the hardware.

blk-mq is supposed to handle requests in hctx->dispatch with higher
priority, see blk_mq_sched_dispatch_requests().  However, there is a
single hctx->run_work, so the async run queue for dispatching the flush
request can only run after another async run queue from the scheduler
is done.

> The solution is to periodically poll hctx->dispatch in
> blk_mq_do_dispatch_sched, to put a bound on the latency of the commands
> sitting there.
[...]
> +		if (count > 0 && count % q->max_sched_batch == 0 &&
> +		    !list_empty_careful(&hctx->dispatch))
> +			break;

q->max_sched_batch may not be needed, and it is reasonable to always
dispatch requests in hctx->dispatch first.

Also, such a trick is missed in the dispatch from the sw queue.

Thanks,
Ming
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Salman Qazi @ 2020-02-04 18:26 UTC (permalink / raw)
To: Ming Lei
Cc: Bart Van Assche, Jens Axboe, linux-block, Linux Kernel Mailing List, Jesse Barnes, Gwendal Grignou

On Tue, Feb 4, 2020 at 1:20 AM Ming Lei <ming.lei@redhat.com> wrote:
[...]
> q->max_sched_batch may not be needed, and it is reasonable to always
> dispatch requests in hctx->dispatch first.
>
> Also, such a trick is missed in the dispatch from the sw queue.

I will update the commit message, drop max_sched_batch, turn it into a
simple check, and add the same thing to the dispatch from the sw queue.

> Thanks,
> Ming
* [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Salman Qazi @ 2020-02-04 19:37 UTC (permalink / raw)
To: Jens Axboe, Ming Lei, Bart Van Assche, linux-block, linux-kernel
Cc: Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig, Salman Qazi

Flushes bypass the I/O scheduler and get added to hctx->dispatch
in blk_mq_sched_bypass_insert.  This can happen while a kworker is
running the hctx->run_work work item and is past the point in
blk_mq_sched_dispatch_requests where hctx->dispatch is checked.

The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded
time, because the I/O scheduler can feed an arbitrary number of
commands.

Since we have only one hctx->run_work, the commands waiting in
hctx->dispatch will wait an arbitrary length of time for run_work to be
rerun.

A similar phenomenon exists with dispatches from the software queue.

The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and
blk_mq_do_dispatch_ctx, return from the run_work handler, and let it
rerun.
Signed-off-by: Salman Qazi <sqazi@google.com>
---
 block/blk-mq-sched.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..d1b8b31bc3d4 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -97,6 +97,9 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
 
+		if (!list_empty_careful(&hctx->dispatch))
+			break;
+
 		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
@@ -140,6 +143,9 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	do {
 		struct request *rq;
 
+		if (!list_empty_careful(&hctx->dispatch))
+			break;
+
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
-- 
2.25.0.341.g760bfbb309-goog
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Ming Lei @ 2020-02-05  4:55 UTC (permalink / raw)
To: Salman Qazi
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig

On Tue, Feb 04, 2020 at 11:37:11AM -0800, Salman Qazi wrote:
> Flushes bypass the I/O scheduler and get added to hctx->dispatch
> in blk_mq_sched_bypass_insert.  This can happen while a kworker is
> running the hctx->run_work work item and is past the point in
> blk_mq_sched_dispatch_requests where hctx->dispatch is checked.
[...]
> @@ -97,6 +97,9 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
>  			break;
>  
> +		if (!list_empty_careful(&hctx->dispatch))
> +			break;
> +
>  		if (!blk_mq_get_dispatch_budget(hctx))
>  			break;
>  
> @@ -140,6 +143,9 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
>  	do {
>  		struct request *rq;
>  
> +		if (!list_empty_careful(&hctx->dispatch))
> +			break;
> +
>  		if (!sbitmap_any_bit_set(&hctx->ctx_map))
>  			break;

This approach looks good, given that we actually retrieved requests this
way in the legacy I/O request path, see __elv_next_request().

However, blk_mq_request_bypass_insert() may run at the same time, so
this patch may cause requests to stall in the scheduler queue.

How about returning from the two helpers whether there is a request
available in hctx->dispatch, then re-dispatching requests in
blk_mq_sched_dispatch_requests() if so?

Thanks,
Ming
* [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Salman Qazi @ 2020-02-05 19:57 UTC (permalink / raw)
To: Jens Axboe, Ming Lei, Bart Van Assche, linux-block, linux-kernel
Cc: Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig, Salman Qazi

Flushes bypass the I/O scheduler and get added to hctx->dispatch
in blk_mq_sched_bypass_insert.  This can happen while a kworker is
running the hctx->run_work work item and is past the point in
blk_mq_sched_dispatch_requests where hctx->dispatch is checked.

The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded
time, because the I/O scheduler can feed an arbitrary number of
commands.

Since we have only one hctx->run_work, the commands waiting in
hctx->dispatch will wait an arbitrary length of time for run_work to be
rerun.

A similar phenomenon exists with dispatches from the software queue.

The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and
blk_mq_do_dispatch_ctx, return from the run_work handler, and let it
rerun.

Signed-off-by: Salman Qazi <sqazi@google.com>
---
 block/blk-mq-sched.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..52249fddeb66 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -84,12 +84,16 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns true if hctx->dispatch was found non-empty and
+ * run_work has to be run again.
  */
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
+	bool ret = false;
 
 	do {
 		struct request *rq;
@@ -97,6 +101,11 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = true;
+			break;
+		}
+
 		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
@@ -113,6 +122,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 */
 		list_add(&rq->queuelist, &rq_list);
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+
+	return ret;
 }
 
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -130,16 +141,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns true if hctx->dispatch was found non-empty and
+ * run_work has to be run again.
  */
-static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	LIST_HEAD(rq_list);
 	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+	bool ret = false;
 
 	do {
 		struct request *rq;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = true;
+			break;
+		}
+
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
@@ -165,6 +185,7 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
+	return ret;
 }
 
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
@@ -172,6 +193,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
+	bool run_again = false;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -208,19 +230,22 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
 			if (has_sched_dispatch)
-				blk_mq_do_dispatch_sched(hctx);
+				run_again = blk_mq_do_dispatch_sched(hctx);
 			else
-				blk_mq_do_dispatch_ctx(hctx);
+				run_again = blk_mq_do_dispatch_ctx(hctx);
 		}
 	} else if (has_sched_dispatch) {
-		blk_mq_do_dispatch_sched(hctx);
+		run_again = blk_mq_do_dispatch_sched(hctx);
 	} else if (hctx->dispatch_busy) {
 		/* dequeue request one by one from sw queue if queue is busy */
-		blk_mq_do_dispatch_ctx(hctx);
+		run_again = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list, false);
 	}
+
+	if (run_again)
+		blk_mq_run_hw_queue(hctx, true);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.25.0.341.g760bfbb309-goog
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go 2020-02-05 19:57 ` Salman Qazi @ 2020-02-06 10:18 ` Ming Lei 2020-02-06 21:12 ` Salman Qazi 0 siblings, 1 reply; 22+ messages in thread From: Ming Lei @ 2020-02-06 10:18 UTC (permalink / raw) To: Salman Qazi Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig On Wed, Feb 05, 2020 at 11:57:06AM -0800, Salman Qazi wrote: > Flushes bypass the I/O scheduler and get added to hctx->dispatch > in blk_mq_sched_bypass_insert. This can happen while a kworker is running > hctx->run_work work item and is past the point in > blk_mq_sched_dispatch_requests where hctx->dispatch is checked. > > The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time, > because the I/O scheduler can feed an arbitrary number of commands. > > Since we have only one hctx->run_work, the commands waiting in > hctx->dispatch will wait an arbitrary length of time for run_work to be > rerun. > > A similar phenomenon exists with dispatches from the software queue. > > The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and > blk_mq_do_dispatch_ctx and return from the run_work handler and let it > rerun. > > Signed-off-by: Salman Qazi <sqazi@google.com> > --- > block/blk-mq-sched.c | 37 +++++++++++++++++++++++++++++++------ > 1 file changed, 31 insertions(+), 6 deletions(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index ca22afd47b3d..52249fddeb66 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -84,12 +84,16 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) > * Only SCSI implements .get_budget and .put_budget, and SCSI restarts > * its queue by itself in its completion handler, so we don't need to > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE. > + * > + * Returns true if hctx->dispatch was found non-empty and > + * run_work has to be run again. 
> */ > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > { > struct request_queue *q = hctx->queue; > struct elevator_queue *e = q->elevator; > LIST_HEAD(rq_list); > + bool ret = false; > > do { > struct request *rq; > @@ -97,6 +101,11 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > if (e->type->ops.has_work && !e->type->ops.has_work(hctx)) > break; > > + if (!list_empty_careful(&hctx->dispatch)) { > + ret = true; > + break; > + } > + > if (!blk_mq_get_dispatch_budget(hctx)) > break; > > @@ -113,6 +122,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > */ > list_add(&rq->queuelist, &rq_list); > } while (blk_mq_dispatch_rq_list(q, &rq_list, true)); > + > + return ret; > } > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx, > @@ -130,16 +141,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx, > * Only SCSI implements .get_budget and .put_budget, and SCSI restarts > * its queue by itself in its completion handler, so we don't need to > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE. > + * > + * Returns true if hctx->dispatch was found non-empty and > + * run_work has to be run again. 
> */ > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) > { > struct request_queue *q = hctx->queue; > LIST_HEAD(rq_list); > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from); > + bool ret = false; > > do { > struct request *rq; > > + if (!list_empty_careful(&hctx->dispatch)) { > + ret = true; > + break; > + } > + > if (!sbitmap_any_bit_set(&hctx->ctx_map)) > break; > > @@ -165,6 +185,7 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) > } while (blk_mq_dispatch_rq_list(q, &rq_list, true)); > > WRITE_ONCE(hctx->dispatch_from, ctx); > + return ret; > } > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > @@ -172,6 +193,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > struct request_queue *q = hctx->queue; > struct elevator_queue *e = q->elevator; > const bool has_sched_dispatch = e && e->type->ops.dispatch_request; > + bool run_again = false; > LIST_HEAD(rq_list); > > /* RCU or SRCU read lock is needed before checking quiesced flag */ > @@ -208,19 +230,22 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > blk_mq_sched_mark_restart_hctx(hctx); > if (blk_mq_dispatch_rq_list(q, &rq_list, false)) { > if (has_sched_dispatch) > - blk_mq_do_dispatch_sched(hctx); > + run_again = blk_mq_do_dispatch_sched(hctx); > else > - blk_mq_do_dispatch_ctx(hctx); > + run_again = blk_mq_do_dispatch_ctx(hctx); > } > } else if (has_sched_dispatch) { > - blk_mq_do_dispatch_sched(hctx); > + run_again = blk_mq_do_dispatch_sched(hctx); > } else if (hctx->dispatch_busy) { > /* dequeue request one by one from sw queue if queue is busy */ > - blk_mq_do_dispatch_ctx(hctx); > + run_again = blk_mq_do_dispatch_ctx(hctx); > } else { > blk_mq_flush_busy_ctxs(hctx, &rq_list); > blk_mq_dispatch_rq_list(q, &rq_list, false); > } > + > + if (run_again) > + blk_mq_run_hw_queue(hctx, true); One improvement may be to run again locally in this function 
by a limited number of times (such as 1) first, then switch to blk_mq_run_hw_queue() if another run is still needed. This way may save one async hw queue run. Thanks, Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-02-06 10:18                 ` Ming Lei
@ 2020-02-06 21:12                 ` Salman Qazi
  2020-02-07  2:07                 ` Ming Lei
  2020-02-07 15:26                 ` Bart Van Assche
  0 siblings, 2 replies; 22+ messages in thread
From: Salman Qazi @ 2020-02-06 21:12 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei, Bart Van Assche, linux-block, linux-kernel
  Cc: Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig,
	Salman Qazi

Flushes bypass the I/O scheduler and get added to hctx->dispatch
in blk_mq_sched_bypass_insert.  This can happen while a kworker is running
the hctx->run_work work item and is past the point in
blk_mq_sched_dispatch_requests where hctx->dispatch is checked.

The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time,
because the I/O scheduler can feed an arbitrary number of commands.

Since we have only one hctx->run_work, the commands waiting in
hctx->dispatch will wait an arbitrary length of time for run_work to be
rerun.

A similar phenomenon exists with dispatches from the software queue.

The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and
blk_mq_do_dispatch_ctx, return from the run_work handler, and let it
rerun.

Signed-off-by: Salman Qazi <sqazi@google.com>
---
 block/blk-mq-sched.c | 47 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..84dde147f646 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -84,12 +84,16 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns true if hctx->dispatch was found non-empty and
+ * run_work has to be run again.
  */
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
+	bool ret = false;
 
 	do {
 		struct request *rq;
@@ -97,6 +101,11 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = true;
+			break;
+		}
+
 		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
@@ -113,6 +122,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 */
 		list_add(&rq->queuelist, &rq_list);
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+
+	return ret;
 }
 
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -130,16 +141,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns true if hctx->dispatch was found non-empty and
+ * run_work has to be run again.
  */
-static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	LIST_HEAD(rq_list);
 	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+	bool ret = false;
 
 	do {
 		struct request *rq;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = true;
+			break;
+		}
+
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
@@ -165,6 +185,7 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
+	return ret;
 }
 
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
@@ -172,6 +193,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
+	bool run_again;
+	bool restarted = false;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -180,6 +203,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 
 	hctx->run++;
 
+again:
+	run_again = false;
+
 	/*
 	 * If we have previous entries on our dispatch list, grab them first for
 	 * more fair dispatch.
@@ -208,19 +234,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
 			if (has_sched_dispatch)
-				blk_mq_do_dispatch_sched(hctx);
+				run_again = blk_mq_do_dispatch_sched(hctx);
 			else
-				blk_mq_do_dispatch_ctx(hctx);
+				run_again = blk_mq_do_dispatch_ctx(hctx);
 		}
 	} else if (has_sched_dispatch) {
-		blk_mq_do_dispatch_sched(hctx);
+		run_again = blk_mq_do_dispatch_sched(hctx);
 	} else if (hctx->dispatch_busy) {
 		/* dequeue request one by one from sw queue if queue is busy */
-		blk_mq_do_dispatch_ctx(hctx);
+		run_again = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list, false);
 	}
+
+	if (run_again) {
+		if (!restarted) {
+			restarted = true;
+			goto again;
+		}
+
+		blk_mq_run_hw_queue(hctx, true);
+	}
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.25.0.341.g760bfbb309-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go 2020-02-06 21:12 ` Salman Qazi @ 2020-02-07 2:07 ` Ming Lei 2020-02-07 15:26 ` Bart Van Assche 1 sibling, 0 replies; 22+ messages in thread From: Ming Lei @ 2020-02-07 2:07 UTC (permalink / raw) To: Salman Qazi Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig On Thu, Feb 06, 2020 at 01:12:22PM -0800, Salman Qazi wrote: > Flushes bypass the I/O scheduler and get added to hctx->dispatch > in blk_mq_sched_bypass_insert. This can happen while a kworker is running > hctx->run_work work item and is past the point in > blk_mq_sched_dispatch_requests where hctx->dispatch is checked. > > The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time, > because the I/O scheduler can feed an arbitrary number of commands. > > Since we have only one hctx->run_work, the commands waiting in > hctx->dispatch will wait an arbitrary length of time for run_work to be > rerun. > > A similar phenomenon exists with dispatches from the software queue. > > The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and > blk_mq_do_dispatch_ctx and return from the run_work handler and let it > rerun. > > Signed-off-by: Salman Qazi <sqazi@google.com> > --- > block/blk-mq-sched.c | 47 ++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 41 insertions(+), 6 deletions(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index ca22afd47b3d..84dde147f646 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -84,12 +84,16 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) > * Only SCSI implements .get_budget and .put_budget, and SCSI restarts > * its queue by itself in its completion handler, so we don't need to > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE. > + * > + * Returns true if hctx->dispatch was found non-empty and > + * run_work has to be run again. 
> */ > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > { > struct request_queue *q = hctx->queue; > struct elevator_queue *e = q->elevator; > LIST_HEAD(rq_list); > + bool ret = false; > > do { > struct request *rq; > @@ -97,6 +101,11 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > if (e->type->ops.has_work && !e->type->ops.has_work(hctx)) > break; > > + if (!list_empty_careful(&hctx->dispatch)) { > + ret = true; > + break; > + } > + > if (!blk_mq_get_dispatch_budget(hctx)) > break; > > @@ -113,6 +122,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > */ > list_add(&rq->queuelist, &rq_list); > } while (blk_mq_dispatch_rq_list(q, &rq_list, true)); > + > + return ret; > } > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx, > @@ -130,16 +141,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx, > * Only SCSI implements .get_budget and .put_budget, and SCSI restarts > * its queue by itself in its completion handler, so we don't need to > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE. > + * > + * Returns true if hctx->dispatch was found non-empty and > + * run_work has to be run again. 
> */ > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) > { > struct request_queue *q = hctx->queue; > LIST_HEAD(rq_list); > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from); > + bool ret = false; > > do { > struct request *rq; > > + if (!list_empty_careful(&hctx->dispatch)) { > + ret = true; > + break; > + } > + > if (!sbitmap_any_bit_set(&hctx->ctx_map)) > break; > > @@ -165,6 +185,7 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) > } while (blk_mq_dispatch_rq_list(q, &rq_list, true)); > > WRITE_ONCE(hctx->dispatch_from, ctx); > + return ret; > } > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > @@ -172,6 +193,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > struct request_queue *q = hctx->queue; > struct elevator_queue *e = q->elevator; > const bool has_sched_dispatch = e && e->type->ops.dispatch_request; > + bool run_again; > + bool restarted = false; > LIST_HEAD(rq_list); > > /* RCU or SRCU read lock is needed before checking quiesced flag */ > @@ -180,6 +203,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > > hctx->run++; > > +again: > + run_again = false; > + > /* > * If we have previous entries on our dispatch list, grab them first for > * more fair dispatch. 
> @@ -208,19 +234,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > blk_mq_sched_mark_restart_hctx(hctx); > if (blk_mq_dispatch_rq_list(q, &rq_list, false)) { > if (has_sched_dispatch) > - blk_mq_do_dispatch_sched(hctx); > + run_again = blk_mq_do_dispatch_sched(hctx); > else > - blk_mq_do_dispatch_ctx(hctx); > + run_again = blk_mq_do_dispatch_ctx(hctx); > } > } else if (has_sched_dispatch) { > - blk_mq_do_dispatch_sched(hctx); > + run_again = blk_mq_do_dispatch_sched(hctx); > } else if (hctx->dispatch_busy) { > /* dequeue request one by one from sw queue if queue is busy */ > - blk_mq_do_dispatch_ctx(hctx); > + run_again = blk_mq_do_dispatch_ctx(hctx); > } else { > blk_mq_flush_busy_ctxs(hctx, &rq_list); > blk_mq_dispatch_rq_list(q, &rq_list, false); > } > + > + if (run_again) { > + if (!restarted) { > + restarted = true; > + goto again; > + } > + > + blk_mq_run_hw_queue(hctx, true); > + } > } > > bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio, > -- > 2.25.0.341.g760bfbb309-goog > Reviewed-by: Ming Lei <ming.lei@redhat.com> -- Ming ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go 2020-02-06 21:12 ` Salman Qazi 2020-02-07 2:07 ` Ming Lei @ 2020-02-07 15:26 ` Bart Van Assche 2020-02-07 18:45 ` Salman Qazi 1 sibling, 1 reply; 22+ messages in thread From: Bart Van Assche @ 2020-02-07 15:26 UTC (permalink / raw) To: Salman Qazi, Jens Axboe, Ming Lei, linux-block, linux-kernel Cc: Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig On 2020-02-06 13:12, Salman Qazi wrote: > + * > + * Returns true if hctx->dispatch was found non-empty and > + * run_work has to be run again. Please elaborate this comment and explain why this is necessary (to avoid that flush processing is postponed forever). > + * Returns true if hctx->dispatch was found non-empty and > + * run_work has to be run again. Same comment here. > +again: > + run_again = false; > + > /* > * If we have previous entries on our dispatch list, grab them first for > * more fair dispatch. > @@ -208,19 +234,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > blk_mq_sched_mark_restart_hctx(hctx); > if (blk_mq_dispatch_rq_list(q, &rq_list, false)) { > if (has_sched_dispatch) > - blk_mq_do_dispatch_sched(hctx); > + run_again = blk_mq_do_dispatch_sched(hctx); > else > - blk_mq_do_dispatch_ctx(hctx); > + run_again = blk_mq_do_dispatch_ctx(hctx); > } > } else if (has_sched_dispatch) { > - blk_mq_do_dispatch_sched(hctx); > + run_again = blk_mq_do_dispatch_sched(hctx); > } else if (hctx->dispatch_busy) { > /* dequeue request one by one from sw queue if queue is busy */ > - blk_mq_do_dispatch_ctx(hctx); > + run_again = blk_mq_do_dispatch_ctx(hctx); > } else { > blk_mq_flush_busy_ctxs(hctx, &rq_list); > blk_mq_dispatch_rq_list(q, &rq_list, false); > } > + > + if (run_again) { > + if (!restarted) { > + restarted = true; > + goto again; > + } > + > + blk_mq_run_hw_queue(hctx, true); > + } So this patch changes blk_mq_sched_dispatch_requests() such that it iterates at most two times? 
How about implementing that loop with an explicit for-loop? I think that would make blk_mq_sched_dispatch_requests() easier to read. As you may know, forward gotos are accepted in kernel code, but backward gotos are frowned upon. Thanks, Bart. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go 2020-02-07 15:26 ` Bart Van Assche @ 2020-02-07 18:45 ` Salman Qazi 2020-02-07 19:04 ` Salman Qazi 2020-02-07 20:19 ` Bart Van Assche 0 siblings, 2 replies; 22+ messages in thread From: Salman Qazi @ 2020-02-07 18:45 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, Ming Lei, linux-block, Linux Kernel Mailing List, Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig On Fri, Feb 7, 2020 at 7:26 AM Bart Van Assche <bvanassche@acm.org> wrote: > > On 2020-02-06 13:12, Salman Qazi wrote: > > + * > > + * Returns true if hctx->dispatch was found non-empty and > > + * run_work has to be run again. > > Please elaborate this comment and explain why this is necessary (to > avoid that flush processing is postponed forever). > > > + * Returns true if hctx->dispatch was found non-empty and > > + * run_work has to be run again. > > Same comment here. Will do. > > > +again: > > + run_again = false; > > + > > /* > > * If we have previous entries on our dispatch list, grab them first for > > * more fair dispatch. 
> > @@ -208,19 +234,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) > > blk_mq_sched_mark_restart_hctx(hctx); > > if (blk_mq_dispatch_rq_list(q, &rq_list, false)) { > > if (has_sched_dispatch) > > - blk_mq_do_dispatch_sched(hctx); > > + run_again = blk_mq_do_dispatch_sched(hctx); > > else > > - blk_mq_do_dispatch_ctx(hctx); > > + run_again = blk_mq_do_dispatch_ctx(hctx); > > } > > } else if (has_sched_dispatch) { > > - blk_mq_do_dispatch_sched(hctx); > > + run_again = blk_mq_do_dispatch_sched(hctx); > > } else if (hctx->dispatch_busy) { > > /* dequeue request one by one from sw queue if queue is busy */ > > - blk_mq_do_dispatch_ctx(hctx); > > + run_again = blk_mq_do_dispatch_ctx(hctx); > > } else { > > blk_mq_flush_busy_ctxs(hctx, &rq_list); > > blk_mq_dispatch_rq_list(q, &rq_list, false); > > } > > + > > + if (run_again) { > > + if (!restarted) { > > + restarted = true; > > + goto again; > > + } > > + > > + blk_mq_run_hw_queue(hctx, true); > > + } > > So this patch changes blk_mq_sched_dispatch_requests() such that it > iterates at most two times? How about implementing that loop with an > explicit for-loop? I think that will make > blk_mq_sched_dispatch_requests() easier to read. As you may know forward > goto's are accepted in kernel code but backward goto's are frowned upon. > About the goto, I don't know if backwards gotos in general are frowned upon. There are plenty of examples in the kernel source. This particular label, 'again' for instance: $ grep -r again: mm/|wc -l 22 $ grep -r again: block/|wc -l 4 But, just because others have done it doesn't mean I should. So, I will attempt to explain why I think this is a good idea. 
If I were to write this as a for-loop, it would look like this:

for (i = 0; i == 0 || (run_again && i < 2); i++) {
        /* another level of 8 character wide indentation */
        run_again = false;
        /* a bunch of code that possibly sets run_again to true */
}

if (run_again)
        blk_mq_run_hw_queue(hctx, true);

[Another alternative is to set run_again to true and simplify the for-loop
condition to run_again && i < 2. But, again, that means lots of verbiage and
a boolean in the for-loop condition.]

The for-loop is far from idiomatic. It's not clear what it does when you
first look at it. It distracts from the common path of the code, which is
something that almost always runs exactly once. There is now an additional
level of indentation. The readers of the code aren't any better off, because
they still have to figure out what run_again is and whether they care about
it. And the only way to do that is to read the entire body of the loop and
the comments at the top of the functions.

The goto in this case preserves the intent of the code better. It is dealing
with an exceptional and unusual case. Indeed, this kind of use is not unusual
in the kernel, for instance to deal with possible but unlikely races.

Just my $0.02.

> Thanks,
>
> Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-02-07 18:45                 ` Salman Qazi
@ 2020-02-07 19:04                 ` Salman Qazi
  0 siblings, 0 replies; 22+ messages in thread
From: Salman Qazi @ 2020-02-07 19:04 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei, Bart Van Assche, linux-block, linux-kernel
  Cc: Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig,
	Salman Qazi

Flushes bypass the I/O scheduler and get added to hctx->dispatch
in blk_mq_sched_bypass_insert.  This can happen while a kworker is running
the hctx->run_work work item and is past the point in
blk_mq_sched_dispatch_requests where hctx->dispatch is checked.

The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time,
because the I/O scheduler can feed an arbitrary number of commands.

Since we have only one hctx->run_work, the commands waiting in
hctx->dispatch will wait an arbitrary length of time for run_work to be
rerun.

A similar phenomenon exists with dispatches from the software queue.

The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and
blk_mq_do_dispatch_ctx, return from the run_work handler, and let it
rerun.

Signed-off-by: Salman Qazi <sqazi@google.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 49 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..3e78c5bbb4d9 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -84,12 +84,17 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns true if hctx->dispatch was found non-empty and
+ * run_work has to be run again.  This is necessary to avoid
+ * starving flushes.
  */
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
+	bool ret = false;
 
 	do {
 		struct request *rq;
@@ -97,6 +102,11 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = true;
+			break;
+		}
+
 		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
@@ -113,6 +123,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 */
 		list_add(&rq->queuelist, &rq_list);
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+
+	return ret;
 }
 
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -130,16 +142,26 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ *
+ * Returns true if hctx->dispatch was found non-empty and
+ * run_work has to be run again.  This is necessary to avoid
+ * starving flushes.
  */
-static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	LIST_HEAD(rq_list);
 	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+	bool ret = false;
 
 	do {
 		struct request *rq;
 
+		if (!list_empty_careful(&hctx->dispatch)) {
+			ret = true;
+			break;
+		}
+
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
@@ -165,6 +187,7 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
+	return ret;
 }
 
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
@@ -172,6 +195,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
+	bool run_again;
+	bool restarted = false;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -180,6 +205,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 
 	hctx->run++;
 
+again:
+	run_again = false;
+
 	/*
 	 * If we have previous entries on our dispatch list, grab them first for
 	 * more fair dispatch.
@@ -208,19 +236,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
 			if (has_sched_dispatch)
-				blk_mq_do_dispatch_sched(hctx);
+				run_again = blk_mq_do_dispatch_sched(hctx);
 			else
-				blk_mq_do_dispatch_ctx(hctx);
+				run_again = blk_mq_do_dispatch_ctx(hctx);
 		}
 	} else if (has_sched_dispatch) {
-		blk_mq_do_dispatch_sched(hctx);
+		run_again = blk_mq_do_dispatch_sched(hctx);
 	} else if (hctx->dispatch_busy) {
 		/* dequeue request one by one from sw queue if queue is busy */
-		blk_mq_do_dispatch_ctx(hctx);
+		run_again = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list, false);
 	}
+
+	if (run_again) {
+		if (!restarted) {
+			restarted = true;
+			goto again;
+		}
+
+		blk_mq_run_hw_queue(hctx, true);
+	}
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.25.0.341.g760bfbb309-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go 2020-02-07 18:45 ` Salman Qazi 2020-02-07 19:04 ` Salman Qazi @ 2020-02-07 20:19 ` Bart Van Assche 2020-02-07 20:37 ` Salman Qazi 1 sibling, 1 reply; 22+ messages in thread From: Bart Van Assche @ 2020-02-07 20:19 UTC (permalink / raw) To: Salman Qazi Cc: Jens Axboe, Ming Lei, linux-block, Linux Kernel Mailing List, Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig On 2/7/20 10:45 AM, Salman Qazi wrote: > If I were to write this as a for-loop, it will look like this: > > for (i = 0; i == 0 || (run_again && i < 2); i++) { > /* another level of 8 character wide indentation */ > run_again = false; > /* a bunch of code that possibly sets run_again to true > } > > if (run_again) > blk_mq_run_hw_queue(hctx, true); That's not what I meant. What I meant is a loop that iterates at most two times and also to break out of the loop if run_again == false. BTW, I share your concern about the additional indentation by eight positions. How about avoiding deeper indentation by introducing a new function? Thanks, Bart. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-02-07 20:19                 ` Bart Van Assche
@ 2020-02-07 20:37                 ` Salman Qazi
  2020-04-20 16:42                 ` Doug Anderson
  0 siblings, 1 reply; 22+ messages in thread
From: Salman Qazi @ 2020-02-07 20:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Ming Lei, linux-block, Linux Kernel Mailing List,
	Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig

On Fri, Feb 7, 2020 at 12:19 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2/7/20 10:45 AM, Salman Qazi wrote:
> > If I were to write this as a for-loop, it would look like this:
> >
> > for (i = 0; i == 0 || (run_again && i < 2); i++) {
> >         /* another level of 8 character wide indentation */
> >         run_again = false;
> >         /* a bunch of code that possibly sets run_again to true */
> > }
> >
> > if (run_again)
> >         blk_mq_run_hw_queue(hctx, true);
>
> That's not what I meant. What I meant is a loop that iterates at most
> two times and also to break out of the loop if run_again == false.
>

I picked the most compact variant to demonstrate the problem. Adding
breaks isn't really helping the readability.

for (i = 0; i < 2; i++) {
        run_again = false;
        /* bunch of code that possibly sets it to true */
        ...
        if (!run_again)
                break;
}
if (run_again)
        blk_mq_run_hw_queue(hctx, true);

When I read this, I initially assume that the loop in general runs
twice and that this is the common case. It has the same problem with
conveying intent. Perhaps more importantly, the point of using
programming constructs is to shorten and simplify the code. There are
still two if-statements in addition to the loop. We haven't gained
much by introducing the loop.

> BTW, I share your concern about the additional indentation by eight
> positions. How about avoiding deeper indentation by introducing a new
> function?

If there were a benefit to introducing the loop, this would be a good call.
But the way I see it, the introduction of another function is yet
another way in which the loop makes the code less readable.

This is not a hill I want to die on. If the maintainer agrees with you
on this point, I will use a loop.

> Thanks,
>
> Bart.
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-02-07 20:37               ` Salman Qazi
@ 2020-04-20 16:42                 ` Doug Anderson
  2020-04-23 20:13                   ` Jesse Barnes
  0 siblings, 1 reply; 22+ messages in thread
From: Doug Anderson @ 2020-04-20 16:42 UTC (permalink / raw)
  To: Salman Qazi, Jens Axboe
  Cc: Bart Van Assche, Ming Lei, linux-block, Linux Kernel Mailing List,
      Jesse Barnes, Gwendal Grignou, Hannes Reinecke, Christoph Hellwig

Hi,

On Fri, Feb 7, 2020 at 12:38 PM Salman Qazi <sqazi@google.com> wrote:
>
> On Fri, Feb 7, 2020 at 12:19 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 2/7/20 10:45 AM, Salman Qazi wrote:
> > > If I were to write this as a for-loop, it would look like this:
> > >
> > > for (i = 0; i == 0 || (run_again && i < 2); i++) {
> > >         /* another level of 8 character wide indentation */
> > >         run_again = false;
> > >         /* a bunch of code that possibly sets run_again to true */
> > > }
> > >
> > > if (run_again)
> > >         blk_mq_run_hw_queue(hctx, true);
> >
> > That's not what I meant. What I meant is a loop that iterates at most
> > two times and also to break out of the loop if run_again == false.
>
> I picked the most compact variant to demonstrate the problem. Adding
> breaks isn't really helping the readability:
>
> for (i = 0; i < 2; i++) {
>         run_again = false;
>         /* bunch of code that possibly sets it to true */
>         ...
>         if (!run_again)
>                 break;
> }
> if (run_again)
>         blk_mq_run_hw_queue(hctx, true);
>
> When I read this, I initially assume that the loop in general runs
> twice and that this is the common case. It has the same problem with
> conveying intent. Perhaps more importantly, the point of using
> programming constructs is to shorten and simplify the code. There are
> still two if-statements in addition to the loop. We haven't gained
> much by introducing the loop.
>
> > BTW, I share your concern about the additional indentation by eight
> > positions. How about avoiding deeper indentation by introducing a new
> > function?
>
> If there were a benefit to introducing the loop, this would be a good
> call. But the way I see it, the introduction of another function is
> yet another way in which the loop makes the code less readable.
>
> This is not a hill I want to die on. If the maintainer agrees with
> you on this point, I will use a loop.

I haven't done a massive amount of analysis of this patch, but since I
noticed it while debugging my own block series I've been keeping track
of it. Is there any status update here? We've been carrying this
"FROMLIST" patch in our Chrome OS trees for a little while, but that's
not a state we want to be in long-term. If it needs a respin before
landing upstream we should get the respin out and land it. If it's OK
as-is then it'd be nice to see it in mainline.

From the above I guess Salman was waiting for Jens to weigh in with an
opinion on the preferred bikeshed color?

-Doug
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-04-20 16:42                 ` Doug Anderson
@ 2020-04-23 20:13                   ` Jesse Barnes
  2020-04-23 20:34                     ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Jesse Barnes @ 2020-04-23 20:13 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Salman Qazi, Jens Axboe, Bart Van Assche, Ming Lei, linux-block,
      Linux Kernel Mailing List, Gwendal Grignou, Hannes Reinecke,
      Christoph Hellwig

Jens, Bart, Ming, any update here? Or is this already applied (I didn't check)?

Thanks,
Jesse

On Mon, Apr 20, 2020 at 9:43 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Feb 7, 2020 at 12:38 PM Salman Qazi <sqazi@google.com> wrote:
> >
> > On Fri, Feb 7, 2020 at 12:19 PM Bart Van Assche <bvanassche@acm.org> wrote:
> > >
> > > On 2/7/20 10:45 AM, Salman Qazi wrote:
> > > > If I were to write this as a for-loop, it would look like this:
> > > >
> > > > for (i = 0; i == 0 || (run_again && i < 2); i++) {
> > > >         /* another level of 8 character wide indentation */
> > > >         run_again = false;
> > > >         /* a bunch of code that possibly sets run_again to true */
> > > > }
> > > >
> > > > if (run_again)
> > > >         blk_mq_run_hw_queue(hctx, true);
> > >
> > > That's not what I meant. What I meant is a loop that iterates at most
> > > two times and also to break out of the loop if run_again == false.
> >
> > I picked the most compact variant to demonstrate the problem. Adding
> > breaks isn't really helping the readability:
> >
> > for (i = 0; i < 2; i++) {
> >         run_again = false;
> >         /* bunch of code that possibly sets it to true */
> >         ...
> >         if (!run_again)
> >                 break;
> > }
> > if (run_again)
> >         blk_mq_run_hw_queue(hctx, true);
> >
> > When I read this, I initially assume that the loop in general runs
> > twice and that this is the common case. It has the same problem with
> > conveying intent. Perhaps more importantly, the point of using
> > programming constructs is to shorten and simplify the code. There are
> > still two if-statements in addition to the loop. We haven't gained
> > much by introducing the loop.
> >
> > > BTW, I share your concern about the additional indentation by eight
> > > positions. How about avoiding deeper indentation by introducing a new
> > > function?
> >
> > If there were a benefit to introducing the loop, this would be a good
> > call. But the way I see it, the introduction of another function is
> > yet another way in which the loop makes the code less readable.
> >
> > This is not a hill I want to die on. If the maintainer agrees with
> > you on this point, I will use a loop.
>
> I haven't done a massive amount of analysis of this patch, but since I
> noticed it while debugging my own block series I've been keeping track
> of it. Is there any status update here? We've been carrying this
> "FROMLIST" patch in our Chrome OS trees for a little while, but that's
> not a state we want to be in long-term. If it needs a respin before
> landing upstream we should get the respin out and land it. If it's OK
> as-is then it'd be nice to see it in mainline.
>
> From the above I guess Salman was waiting for Jens to weigh in with an
> opinion on the preferred bikeshed color?
>
> -Doug
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-04-23 20:13                   ` Jesse Barnes
@ 2020-04-23 20:34                     ` Jens Axboe
  2020-04-23 20:40                       ` Salman Qazi
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2020-04-23 20:34 UTC (permalink / raw)
  To: Jesse Barnes, Doug Anderson
  Cc: Salman Qazi, Bart Van Assche, Ming Lei, linux-block,
      Linux Kernel Mailing List, Gwendal Grignou, Hannes Reinecke,
      Christoph Hellwig

On 4/23/20 2:13 PM, Jesse Barnes wrote:
> Jens, Bart, Ming, any update here? Or is this already applied (I didn't check)?

No updates on my end. I was expecting that a v2 would be posted after
all the discussion on it, but that doesn't seem to be the case.

--
Jens Axboe
* Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go
  2020-04-23 20:34                     ` Jens Axboe
@ 2020-04-23 20:40                       ` Salman Qazi
  0 siblings, 0 replies; 22+ messages in thread
From: Salman Qazi @ 2020-04-23 20:40 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jesse Barnes, Doug Anderson, Bart Van Assche, Ming Lei,
      linux-block, Linux Kernel Mailing List, Gwendal Grignou,
      Hannes Reinecke, Christoph Hellwig

I had posted a version on Feb 7th, but I guess I neglected to
explicitly label it as v2. I'm sorry; I am not well-acquainted with
the processes here.

On Thu, Apr 23, 2020 at 1:34 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 4/23/20 2:13 PM, Jesse Barnes wrote:
> > Jens, Bart, Ming, any update here? Or is this already applied (I didn't check)?
>
> No updates on my end. I was expecting that a v2 would be posted after
> all the discussion on it, but that doesn't seem to be the case.
>
> --
> Jens Axboe
end of thread, other threads:[~2020-04-23 20:40 UTC | newest]

Thread overview: 22+ messages:
2020-01-30 19:34 Hung tasks with multiple partitions Salman Qazi
2020-01-30 20:49 ` Bart Van Assche
2020-01-30 21:02   ` Salman Qazi
     [not found]   ` <20200203204554.119849-1-sqazi@google.com>
2020-02-03 20:59     ` [PATCH] block: Limit number of items taken from the I/O scheduler in one go Salman Qazi
2020-02-04  3:47       ` Bart Van Assche
2020-02-04  9:20       ` Ming Lei
2020-02-04 18:26         ` Salman Qazi
2020-02-04 19:37           ` Salman Qazi
2020-02-05  4:55             ` Ming Lei
2020-02-05 19:57               ` Salman Qazi
2020-02-06 10:18                 ` Ming Lei
2020-02-06 21:12                   ` Salman Qazi
2020-02-07  2:07                     ` Ming Lei
2020-02-07 15:26       ` Bart Van Assche
2020-02-07 18:45         ` Salman Qazi
2020-02-07 19:04           ` Salman Qazi
2020-02-07 20:19           ` Bart Van Assche
2020-02-07 20:37             ` Salman Qazi
2020-04-20 16:42               ` Doug Anderson
2020-04-23 20:13                 ` Jesse Barnes
2020-04-23 20:34                   ` Jens Axboe
2020-04-23 20:40                     ` Salman Qazi