* [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection @ 2021-08-03 18:23 Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche Hi Jens, The two patches in this patch series are what I came up with while testing Android software. Please consider these patches for inclusion in the upstream kernel. Thanks, Bart. Changes compared to v1: - Introduced BLK_MQ_F_NO_SCHED_BY_DEFAULT and use it in the loop driver. - Removed BLK_MQ_F_NO_SCHED again from the loop driver. Bart Van Assche (3): blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag loop: Select I/O scheduler 'none' from inside add_disk() loop: Add the default_queue_depth kernel module parameter block/elevator.c | 3 +++ drivers/block/loop.c | 8 ++++++-- include/linux/blk-mq.h | 6 ++++++ 3 files changed, 15 insertions(+), 2 deletions(-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag 2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche @ 2021-08-03 18:23 ` Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche 2 siblings, 0 replies; 17+ messages in thread From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw) To: Jens Axboe Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche, Ming Lei, Tetsuo Handa, Martijn Coenen elevator_get_default() uses the following algorithm to select an I/O scheduler from inside add_disk(): - In case of a single hardware queue or sharing hardware queues across multiple request queues (BLK_MQ_F_TAG_HCTX_SHARED), use mq-deadline. - Otherwise, use 'none'. This is a good choice for most but not for all block drivers. Make it possible to override the selection of mq-deadline with a new flag, namely BLK_MQ_F_NO_SCHED_BY_DEFAULT. Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Christoph Hellwig <hch@lst.de> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- block/elevator.c | 3 +++ include/linux/blk-mq.h | 6 ++++++ 2 files changed, 9 insertions(+) diff --git a/block/elevator.c b/block/elevator.c index 52ada14cfe45..ae23da291297 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -630,6 +630,9 @@ static inline bool elv_support_iosched(struct request_queue *q) */ static struct elevator_type *elevator_get_default(struct request_queue *q) { + if (q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT) + return NULL; + if (q->nr_hw_queues != 1 && !blk_mq_is_sbitmap_shared(q->tag_set->flags)) return NULL; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 1d18447ebebc..22215db36122 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -404,7 +404,13 @@ enum { BLK_MQ_F_STACKING = 1 << 2, BLK_MQ_F_TAG_HCTX_SHARED = 1 << 3, BLK_MQ_F_BLOCKING = 1 << 5, + /* Do not allow an I/O scheduler to be configured. */ BLK_MQ_F_NO_SCHED = 1 << 6, + /* + * Select 'none' during queue registration in case of a single hwq + * or shared hwqs instead of 'mq-deadline'. + */ + BLK_MQ_F_NO_SCHED_BY_DEFAULT = 1 << 7, BLK_MQ_F_ALLOC_POLICY_START_BIT = 8, BLK_MQ_F_ALLOC_POLICY_BITS = 1, ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche @ 2021-08-03 18:23 ` Bart Van Assche 2021-08-04 2:39 ` Ming Lei 2021-08-04 5:35 ` Christoph Hellwig 2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche 2 siblings, 2 replies; 17+ messages in thread From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw) To: Jens Axboe Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche, Ming Lei, Tetsuo Handa, Martijn Coenen We noticed that the user interface of Android devices becomes very slow under memory pressure. This is because Android uses the zram driver on top of the loop driver for swapping, because under memory pressure the swap code alternates reads and writes quickly, because mq-deadline is the default scheduler for loop devices and because mq-deadline delays writes by five seconds for such a workload with default settings. Fix this by making the kernel select I/O scheduler 'none' from inside add_disk() for loop devices. This default can be overridden at any time from user space, e.g. via a udev rule. This approach has an advantage compared to changing the I/O scheduler from userspace from 'mq-deadline' into 'none', namely that synchronize_rcu() does not get called. This patch reduces the Android boot time on my test setup with 0.5 seconds compared to configuring the loop I/O scheduler from user space. Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Christoph Hellwig <hch@lst.de> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/block/loop.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index f8486d9b75a4..fa1c298a8cfb 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2333,7 +2333,8 @@ static int loop_add(int i) lo->tag_set.queue_depth = 128; lo->tag_set.numa_node = NUMA_NO_NODE; lo->tag_set.cmd_size = sizeof(struct loop_cmd); - lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING; + lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING | + BLK_MQ_F_NO_SCHED_BY_DEFAULT; lo->tag_set.driver_data = lo; err = blk_mq_alloc_tag_set(&lo->tag_set); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche @ 2021-08-04 2:39 ` Ming Lei 2021-08-04 17:43 ` Bart Van Assche 2021-08-04 17:58 ` Jens Axboe 2021-08-04 5:35 ` Christoph Hellwig 1 sibling, 2 replies; 17+ messages in thread From: Ming Lei @ 2021-08-04 2:39 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Tetsuo Handa, Martijn Coenen On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: > We noticed that the user interface of Android devices becomes very slow > under memory pressure. This is because Android uses the zram driver on top > of the loop driver for swapping, because under memory pressure the swap > code alternates reads and writes quickly, because mq-deadline is the > default scheduler for loop devices and because mq-deadline delays writes by Maybe we can bypass io scheduler always for request with REQ_SWAP, such as: diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 0f006cabfd91..d86709ac9d1f 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -420,7 +420,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, * passthrough request is added to scheduler queue, there isn't any * chance to dispatch it given we prioritize requests in hctx->dispatch. */ - if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq)) + if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq) || + blk_rq_is_swap(rq)) return true; return false; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d3afea47ade6..71aaa99614ab 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -251,6 +251,11 @@ static inline bool blk_rq_is_passthrough(struct request *rq) return blk_op_is_passthrough(req_op(rq)); } +static inline bool blk_rq_is_swap(struct request *rq) +{ + return rq->cmd_flags & REQ_SWAP; +} + static inline unsigned short req_get_ioprio(struct request *req) { return req->ioprio; Thanks, Ming ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-04 2:39 ` Ming Lei @ 2021-08-04 17:43 ` Bart Van Assche 2021-08-05 3:26 ` Ming Lei 2021-08-04 17:58 ` Jens Axboe 1 sibling, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2021-08-04 17:43 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Tetsuo Handa, Martijn Coenen On 8/3/21 7:39 PM, Ming Lei wrote: > On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: >> We noticed that the user interface of Android devices becomes very slow >> under memory pressure. This is because Android uses the zram driver on top >> of the loop driver for swapping, because under memory pressure the swap >> code alternates reads and writes quickly, because mq-deadline is the >> default scheduler for loop devices and because mq-deadline delays writes by > > Maybe we can bypass io scheduler always for request with REQ_SWAP, such as: > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 0f006cabfd91..d86709ac9d1f 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -420,7 +420,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, > * passthrough request is added to scheduler queue, there isn't any > * chance to dispatch it given we prioritize requests in hctx->dispatch. > */ > - if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq)) > + if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq) || > + blk_rq_is_swap(rq)) > return true; > > return false; > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index d3afea47ade6..71aaa99614ab 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -251,6 +251,11 @@ static inline bool blk_rq_is_passthrough(struct request *rq) > return blk_op_is_passthrough(req_op(rq)); > } > > +static inline bool blk_rq_is_swap(struct request *rq) > +{ > + return rq->cmd_flags & REQ_SWAP; > +} > + > static inline unsigned short req_get_ioprio(struct request *req) > { > return req->ioprio; Hi Ming, Thanks for having suggested an alternative. However, isn't the above a policy? Shouldn't a kernel provide mechanisms instead of policies? Additionally, the above patch does not address all Android loop driver use cases. Reading APEX files is regular I/O and hence REQ_SWAP is not set while reading from APEX files. Thanks, Bart. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-04 17:43 ` Bart Van Assche @ 2021-08-05 3:26 ` Ming Lei 2021-08-05 16:49 ` Bart Van Assche 0 siblings, 1 reply; 17+ messages in thread From: Ming Lei @ 2021-08-05 3:26 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Tetsuo Handa, Martijn Coenen On Wed, Aug 04, 2021 at 10:43:05AM -0700, Bart Van Assche wrote: > On 8/3/21 7:39 PM, Ming Lei wrote: > > On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: > > > We noticed that the user interface of Android devices becomes very slow > > > under memory pressure. This is because Android uses the zram driver on top > > > of the loop driver for swapping, because under memory pressure the swap > > > code alternates reads and writes quickly, because mq-deadline is the > > > default scheduler for loop devices and because mq-deadline delays writes by > > > > Maybe we can bypass io scheduler always for request with REQ_SWAP, such as: > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > > index 0f006cabfd91..d86709ac9d1f 100644 > > --- a/block/blk-mq-sched.c > > +++ b/block/blk-mq-sched.c > > @@ -420,7 +420,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, > > * passthrough request is added to scheduler queue, there isn't any > > * chance to dispatch it given we prioritize requests in hctx->dispatch. > > */ > > - if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq)) > > + if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq) || > > + blk_rq_is_swap(rq)) > > return true; > > return false; > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > > index d3afea47ade6..71aaa99614ab 100644 > > --- a/include/linux/blkdev.h > > +++ b/include/linux/blkdev.h > > @@ -251,6 +251,11 @@ static inline bool blk_rq_is_passthrough(struct request *rq) > > return blk_op_is_passthrough(req_op(rq)); > > } > > +static inline bool blk_rq_is_swap(struct request *rq) > > +{ > > + return rq->cmd_flags & REQ_SWAP; > > +} > > + > > static inline unsigned short req_get_ioprio(struct request *req) > > { > > return req->ioprio; > > Hi Ming, > > Thanks for having suggested an alternative. However, isn't the above a > policy? Shouldn't a kernel provide mechanisms instead of policies? REQ_SWAP means it is one IO written to swap device/file, so the system is suffering memory pressure, then isn't it reasonable to bypass io scheduler for all SWAP IOs? The issue you described in comment log is neither loop specific, nor mq-deadline, I think it isn't good to just workaround the issue for loop. > > Additionally, the above patch does not address all Android loop driver use > cases. Reading APEX files is regular I/O and hence REQ_SWAP is not set while > reading from APEX files. Is APEX file one swap file? If not, can you explain it a bit why you want to switch to none when reading APEX file? Thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-05 3:26 ` Ming Lei @ 2021-08-05 16:49 ` Bart Van Assche 0 siblings, 0 replies; 17+ messages in thread From: Bart Van Assche @ 2021-08-05 16:49 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Tetsuo Handa, Martijn Coenen On 8/4/21 8:26 PM, Ming Lei wrote: > On Wed, Aug 04, 2021 at 10:43:05AM -0700, Bart Van Assche wrote: >> Thanks for having suggested an alternative. However, isn't the above a >> policy? Shouldn't a kernel provide mechanisms instead of policies? > > REQ_SWAP means it is one IO written to swap device/file, so the system > is suffering memory pressure, then isn't it reasonable to bypass io > scheduler for all SWAP IOs? Hmm ... I'm not sure that approach is ideal when swapping to a hard disk. >> Additionally, the above patch does not address all Android loop driver use >> cases. Reading APEX files is regular I/O and hence REQ_SWAP is not set while >> reading from APEX files. > > Is APEX file one swap file? If not, can you explain it a bit why you want to > switch to none when reading APEX file? As far as I know Android uses the loop driver for two purposes: - A zram instance is used as swap device and the zram instance is configured to use a loop device. - apexd mounts APEX files read-only and uses the loop driver to access these files as a block device. Bart. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-04 2:39 ` Ming Lei 2021-08-04 17:43 ` Bart Van Assche @ 2021-08-04 17:58 ` Jens Axboe 1 sibling, 0 replies; 17+ messages in thread From: Jens Axboe @ 2021-08-04 17:58 UTC (permalink / raw) To: Ming Lei, Bart Van Assche Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Tetsuo Handa, Martijn Coenen On 8/3/21 8:39 PM, Ming Lei wrote: > On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: >> We noticed that the user interface of Android devices becomes very slow >> under memory pressure. This is because Android uses the zram driver on top >> of the loop driver for swapping, because under memory pressure the swap >> code alternates reads and writes quickly, because mq-deadline is the >> default scheduler for loop devices and because mq-deadline delays writes by > > Maybe we can bypass io scheduler always for request with REQ_SWAP, such as: I don't think that's a good idea at all, what happens if you have a mixed swap in/out on a rotating drive? -- Jens Axboe ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche 2021-08-04 2:39 ` Ming Lei @ 2021-08-04 5:35 ` Christoph Hellwig 2021-08-04 17:58 ` Bart Van Assche 1 sibling, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2021-08-04 5:35 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: > We noticed that the user interface of Android devices becomes very slow > under memory pressure. This is because Android uses the zram driver on top > of the loop driver for swapping, Sorry, but that is just amazingly stupid. If you really want to swap to compressed files introduce that support in the swap code instead of coming up with dumb driver stacks from hell. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-04 5:35 ` Christoph Hellwig @ 2021-08-04 17:58 ` Bart Van Assche 2021-08-04 18:04 ` Jens Axboe 2021-08-05 6:39 ` Christoph Hellwig 0 siblings, 2 replies; 17+ messages in thread From: Bart Van Assche @ 2021-08-04 17:58 UTC (permalink / raw) To: Christoph Hellwig Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On 8/3/21 10:35 PM, Christoph Hellwig wrote: > On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: >> We noticed that the user interface of Android devices becomes very slow >> under memory pressure. This is because Android uses the zram driver on top >> of the loop driver for swapping, > > Sorry, but that is just amazingly stupid. If you really want to swap > to compressed files introduce that support in the swap code instead of > coming up with dumb driver stacks from hell. Hi Christoph, That's an interesting suggestion. We can look into adding compression support in the swap code. Independent of the use case of this patch, is it acceptable to change the default I/O scheduler of loop devices from mq-deadline into none (patches 1 and 2 of this series)? Thanks, Bart. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-04 17:58 ` Bart Van Assche @ 2021-08-04 18:04 ` Jens Axboe 2021-08-05 6:39 ` Christoph Hellwig 1 sibling, 0 replies; 17+ messages in thread From: Jens Axboe @ 2021-08-04 18:04 UTC (permalink / raw) To: Bart Van Assche, Christoph Hellwig Cc: linux-block, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On 8/4/21 11:58 AM, Bart Van Assche wrote: > On 8/3/21 10:35 PM, Christoph Hellwig wrote: >> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: >>> We noticed that the user interface of Android devices becomes very slow >>> under memory pressure. This is because Android uses the zram driver on top >>> of the loop driver for swapping, >> >> Sorry, but that is just amazingly stupid. If you really want to swap >> to compressed files introduce that support in the swap code instead of >> coming up with dumb driver stacks from hell. > > Hi Christoph, > > That's an interesting suggestion. We can look into adding compression > support in the swap code. > > Independent of the use case of this patch, is it acceptable to change > the default I/O scheduler of loop devices from mq-deadline into none > (patches 1 and 2 of this series)? Probably does, there's always going to be underlying storage for it anyway. -- Jens Axboe ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() 2021-08-04 17:58 ` Bart Van Assche 2021-08-04 18:04 ` Jens Axboe @ 2021-08-05 6:39 ` Christoph Hellwig 1 sibling, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2021-08-05 6:39 UTC (permalink / raw) To: Bart Van Assche Cc: Christoph Hellwig, Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On Wed, Aug 04, 2021 at 10:58:01AM -0700, Bart Van Assche wrote: > On 8/3/21 10:35 PM, Christoph Hellwig wrote: >> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote: >>> We noticed that the user interface of Android devices becomes very slow >>> under memory pressure. This is because Android uses the zram driver on top >>> of the loop driver for swapping, >> >> Sorry, but that is just amazingly stupid. If you really want to swap >> to compressed files introduce that support in the swap code instead of >> coming up with dumb driver stacks from hell. > > Hi Christoph, > > That's an interesting suggestion. We can look into adding compression > support in the swap code. As a short term fix you might also just look into having a file backing for the zram writeback mode, which should not need more than 100-200 lines of code. > Independent of the use case of this patch, is it acceptable to change the > default I/O scheduler of loop devices from mq-deadline into none (patches 1 > and 2 of this series)? Yes, that might not be a bad idea in general. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter 2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche @ 2021-08-03 18:23 ` Bart Van Assche 2021-08-04 6:07 ` Greg KH 2 siblings, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw) To: Jens Axboe Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche, Ming Lei, Tetsuo Handa, Martijn Coenen Recent versions of Android use the zram driver on top of the loop driver. There is a mismatch between the default loop driver queue depth (128) and the queue depth of the storage device in my test setup (32). That mismatch results in write latencies that are higher than necessary. Address this issue by making the default loop driver queue depth configurable. Compared to configuring the queue depth by writing into the nr_requests sysfs attribute, this approach does not involve calling synchronize_rcu() to modify the queue depth. Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Christoph Hellwig <hch@lst.de> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/block/loop.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index fa1c298a8cfb..b5dbf2d7447e 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2098,6 +2098,9 @@ module_param(max_loop, int, 0444); MODULE_PARM_DESC(max_loop, "Maximum number of loop devices"); module_param(max_part, int, 0444); MODULE_PARM_DESC(max_part, "Maximum number of partitions per loop device"); +static uint32_t default_queue_depth = 128; +module_param(default_queue_depth, uint, 0644); +MODULE_PARM_DESC(default_queue_depth, "Default loop device queue depth"); MODULE_LICENSE("GPL"); MODULE_ALIAS_BLOCKDEV_MAJOR(LOOP_MAJOR); @@ -2330,7 +2333,7 @@ static int loop_add(int i) err = -ENOMEM; lo->tag_set.ops = &loop_mq_ops; lo->tag_set.nr_hw_queues = 1; - lo->tag_set.queue_depth = 128; + lo->tag_set.queue_depth = max(default_queue_depth, 2U); lo->tag_set.numa_node = NUMA_NO_NODE; lo->tag_set.cmd_size = sizeof(struct loop_cmd); lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING | ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter 2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche @ 2021-08-04 6:07 ` Greg KH 2021-08-04 17:38 ` Bart Van Assche 2021-08-05 6:14 ` Chaitanya Kulkarni 0 siblings, 2 replies; 17+ messages in thread From: Greg KH @ 2021-08-04 6:07 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On Tue, Aug 03, 2021 at 11:23:04AM -0700, Bart Van Assche wrote: > Recent versions of Android use the zram driver on top of the loop driver. > There is a mismatch between the default loop driver queue depth (128) and > the queue depth of the storage device in my test setup (32). That mismatch > results in write latencies that are higher than necessary. Address this > issue by making the default loop driver queue depth configurable. Compared > to configuring the queue depth by writing into the nr_requests sysfs > attribute, this approach does not involve calling synchronize_rcu() to > modify the queue depth. > > Reviewed-by: Ming Lei <ming.lei@redhat.com> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Martijn Coenen <maco@android.com> > Cc: Jaegeuk Kim <jaegeuk@kernel.org> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/block/loop.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index fa1c298a8cfb..b5dbf2d7447e 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -2098,6 +2098,9 @@ module_param(max_loop, int, 0444); > MODULE_PARM_DESC(max_loop, "Maximum number of loop devices"); > module_param(max_part, int, 0444); > MODULE_PARM_DESC(max_part, "Maximum number of partitions per loop device"); > +static uint32_t default_queue_depth = 128; > +module_param(default_queue_depth, uint, 0644); > +MODULE_PARM_DESC(default_queue_depth, "Default loop device queue depth"); > MODULE_LICENSE("GPL"); > MODULE_ALIAS_BLOCKDEV_MAJOR(LOOP_MAJOR); Please no, this is not the 1990's, we do not need module-wide options like this that are hard to ever remove. Worst case, this should be a per-device option so it needs to be controllable that way. But really, why can this not just be "tuned" on the fly? How would anyone know what to set this value to? Should we just bump it up anyway given that modern memory limits are probably more now? thanks, greg k-h ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter 2021-08-04 6:07 ` Greg KH @ 2021-08-04 17:38 ` Bart Van Assche 2021-08-04 19:33 ` Greg KH 2021-08-05 6:14 ` Chaitanya Kulkarni 1 sibling, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2021-08-04 17:38 UTC (permalink / raw) To: Greg KH Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On 8/3/21 11:07 PM, Greg KH wrote: > Should we just bump it up anyway given that modern memory limits are > probably more now? As mentioned in the patch description, this patch is not about saving memory but about reducing latency. Reducing the queue depth to control latency is a common approach for block storage. See e.g. the description of CONFIG_BLK_WBT in block/Kconfig. You may want to know that I looked into reducing the queue depth from user space before I came up with this kernel patch (https://android-review.googlesource.com/c/platform/system/core/+/1783847/6). However, making that approach work requires significant changes in the Android SELinux sysfs labeling policy. From my point of view, modifying the kernel is much less painful than reworking the Android SELinux labeling policy. Bart. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter 2021-08-04 17:38 ` Bart Van Assche @ 2021-08-04 19:33 ` Greg KH 0 siblings, 0 replies; 17+ messages in thread From: Greg KH @ 2021-08-04 19:33 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On Wed, Aug 04, 2021 at 10:38:55AM -0700, Bart Van Assche wrote: > On 8/3/21 11:07 PM, Greg KH wrote: > > Should we just bump it up anyway given that modern memory limits are > > probably more now? > > As mentioned in the patch description, this patch is not about saving memory > but about reducing latency. Reducing the queue depth to control latency is a > common approach for block storage. See e.g. the description of > CONFIG_BLK_WBT in block/Kconfig. > > You may want to know that I looked into reducing the queue depth from user > space before I came up with this kernel patch > (https://android-review.googlesource.com/c/platform/system/core/+/1783847/6). > However, making that approach work requires significant changes in the > Android SELinux sysfs labeling policy. From my point of view, modifying the > kernel is much less painful than reworking the Android SELinux labeling > policy. That's not fair, do not add additional complexity to the kernel, and a global option like this, just because your userspace configuration is complex. The fact that this is a global change should make it a non-starter as that is one of the main reasons module options should never be used. Again, we have learned from past mistakes, let's not forget them. thanks, greg k-h ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter 2021-08-04 6:07 ` Greg KH 2021-08-04 17:38 ` Bart Van Assche @ 2021-08-05 6:14 ` Chaitanya Kulkarni 1 sibling, 0 replies; 17+ messages in thread From: Chaitanya Kulkarni @ 2021-08-05 6:14 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen On 8/3/2021 11:07 PM, Greg KH wrote: > On Tue, Aug 03, 2021 at 11:23:04AM -0700, Bart Van Assche wrote: >> Recent versions of Android use the zram driver on top of the loop driver. >> There is a mismatch between the default loop driver queue depth (128) and >> the queue depth of the storage device in my test setup (32). That mismatch >> results in write latencies that are higher than necessary. Address this >> issue by making the default loop driver queue depth configurable. Compared >> to configuring the queue depth by writing into the nr_requests sysfs >> attribute, this approach does not involve calling synchronize_rcu() to >> modify the queue depth. >> >> Reviewed-by: Ming Lei<ming.lei@redhat.com> >> Cc: Tetsuo Handa<penguin-kernel@I-love.SAKURA.ne.jp> >> Cc: Christoph Hellwig<hch@lst.de> >> Cc: Martijn Coenen<maco@android.com> >> Cc: Jaegeuk Kim<jaegeuk@kernel.org> >> Signed-off-by: Bart Van Assche<bvanassche@acm.org> If I remember correct I've sent patch based on the similar concept that is not entirely different than this one. -- -ck ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2021-08-05 16:49 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche 2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche 2021-08-04 2:39 ` Ming Lei 2021-08-04 17:43 ` Bart Van Assche 2021-08-05 3:26 ` Ming Lei 2021-08-05 16:49 ` Bart Van Assche 2021-08-04 17:58 ` Jens Axboe 2021-08-04 5:35 ` Christoph Hellwig 2021-08-04 17:58 ` Bart Van Assche 2021-08-04 18:04 ` Jens Axboe 2021-08-05 6:39 ` Christoph Hellwig 2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche 2021-08-04 6:07 ` Greg KH 2021-08-04 17:38 ` Bart Van Assche 2021-08-04 19:33 ` Greg KH 2021-08-05 6:14 ` Chaitanya Kulkarni
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).