linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection
@ 2021-08-03 18:23 Bart Van Assche
  2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche

Hi Jens,

The two patches in this patch series are what I came up with while testing
Android software. Please consider these patches for inclusion in the upstream
kernel.

Thanks,

Bart.

Changes compared to v1:
- Introduced BLK_MQ_F_NO_SCHED_BY_DEFAULT and use it in the loop driver.
- Removed BLK_MQ_F_NO_SCHED again from the loop driver.

Bart Van Assche (3):
  blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
  loop: Select I/O scheduler 'none' from inside add_disk()
  loop: Add the default_queue_depth kernel module parameter

 block/elevator.c       | 3 +++
 drivers/block/loop.c   | 8 ++++++--
 include/linux/blk-mq.h | 6 ++++++
 3 files changed, 15 insertions(+), 2 deletions(-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
  2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche
@ 2021-08-03 18:23 ` Bart Van Assche
  2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche
  2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche
  2 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Ming Lei, Tetsuo Handa, Martijn Coenen

elevator_get_default() uses the following algorithm to select an I/O
scheduler from inside add_disk():
- In case of a single hardware queue or sharing hardware queues across
  multiple request queues (BLK_MQ_F_TAG_HCTX_SHARED), use mq-deadline.
- Otherwise, use 'none'.

This is a good choice for most but not for all block drivers. Make it
possible to override the selection of mq-deadline with a new flag,
namely BLK_MQ_F_NO_SCHED_BY_DEFAULT.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/elevator.c       | 3 +++
 include/linux/blk-mq.h | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/block/elevator.c b/block/elevator.c
index 52ada14cfe45..ae23da291297 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -630,6 +630,9 @@ static inline bool elv_support_iosched(struct request_queue *q)
  */
 static struct elevator_type *elevator_get_default(struct request_queue *q)
 {
+	if (q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
+		return NULL;
+
 	if (q->nr_hw_queues != 1 &&
 			!blk_mq_is_sbitmap_shared(q->tag_set->flags))
 		return NULL;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1d18447ebebc..22215db36122 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -404,7 +404,13 @@ enum {
 	BLK_MQ_F_STACKING	= 1 << 2,
 	BLK_MQ_F_TAG_HCTX_SHARED = 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
+	/* Do not allow an I/O scheduler to be configured. */
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
+	/*
+	 * Select 'none' during queue registration in case of a single hwq
+	 * or shared hwqs instead of 'mq-deadline'.
+	 */
+	BLK_MQ_F_NO_SCHED_BY_DEFAULT	= 1 << 7,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche
  2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche
@ 2021-08-03 18:23 ` Bart Van Assche
  2021-08-04  2:39   ` Ming Lei
  2021-08-04  5:35   ` Christoph Hellwig
  2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche
  2 siblings, 2 replies; 17+ messages in thread
From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Ming Lei, Tetsuo Handa, Martijn Coenen

We noticed that the user interface of Android devices becomes very slow
under memory pressure. This is because Android uses the zram driver on top
of the loop driver for swapping, because under memory pressure the swap
code alternates reads and writes quickly, because mq-deadline is the
default scheduler for loop devices and because mq-deadline delays writes by
five seconds for such a workload with default settings. Fix this by making
the kernel select I/O scheduler 'none' from inside add_disk() for loop
devices. This default can be overridden at any time from user space,
e.g. via a udev rule. This approach has an advantage compared to changing
the I/O scheduler from userspace from 'mq-deadline' into 'none', namely
that synchronize_rcu() does not get called.

This patch reduces the Android boot time on my test setup with 0.5 seconds
compared to configuring the loop I/O scheduler from user space.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/loop.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f8486d9b75a4..fa1c298a8cfb 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2333,7 +2333,8 @@ static int loop_add(int i)
 	lo->tag_set.queue_depth = 128;
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;
+	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING |
+		BLK_MQ_F_NO_SCHED_BY_DEFAULT;
 	lo->tag_set.driver_data = lo;
 
 	err = blk_mq_alloc_tag_set(&lo->tag_set);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter
  2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche
  2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche
  2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche
@ 2021-08-03 18:23 ` Bart Van Assche
  2021-08-04  6:07   ` Greg KH
  2 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2021-08-03 18:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Ming Lei, Tetsuo Handa, Martijn Coenen

Recent versions of Android use the zram driver on top of the loop driver.
There is a mismatch between the default loop driver queue depth (128) and
the queue depth of the storage device in my test setup (32). That mismatch
results in write latencies that are higher than necessary. Address this
issue by making the default loop driver queue depth configurable. Compared
to configuring the queue depth by writing into the nr_requests sysfs
attribute, this approach does not involve calling synchronize_rcu() to
modify the queue depth.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/loop.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fa1c298a8cfb..b5dbf2d7447e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2098,6 +2098,9 @@ module_param(max_loop, int, 0444);
 MODULE_PARM_DESC(max_loop, "Maximum number of loop devices");
 module_param(max_part, int, 0444);
 MODULE_PARM_DESC(max_part, "Maximum number of partitions per loop device");
+static uint32_t default_queue_depth = 128;
+module_param(default_queue_depth, uint, 0644);
+MODULE_PARM_DESC(default_queue_depth, "Default loop device queue depth");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_BLOCKDEV_MAJOR(LOOP_MAJOR);
 
@@ -2330,7 +2333,7 @@ static int loop_add(int i)
 	err = -ENOMEM;
 	lo->tag_set.ops = &loop_mq_ops;
 	lo->tag_set.nr_hw_queues = 1;
-	lo->tag_set.queue_depth = 128;
+	lo->tag_set.queue_depth = max(default_queue_depth, 2U);
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
 	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING |

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche
@ 2021-08-04  2:39   ` Ming Lei
  2021-08-04 17:43     ` Bart Van Assche
  2021-08-04 17:58     ` Jens Axboe
  2021-08-04  5:35   ` Christoph Hellwig
  1 sibling, 2 replies; 17+ messages in thread
From: Ming Lei @ 2021-08-04  2:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Tetsuo Handa, Martijn Coenen

On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
> We noticed that the user interface of Android devices becomes very slow
> under memory pressure. This is because Android uses the zram driver on top
> of the loop driver for swapping, because under memory pressure the swap
> code alternates reads and writes quickly, because mq-deadline is the
> default scheduler for loop devices and because mq-deadline delays writes by

Maybe we can bypass io scheduler always for request with REQ_SWAP, such as:

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0f006cabfd91..d86709ac9d1f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -420,7 +420,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	 * passthrough request is added to scheduler queue, there isn't any
 	 * chance to dispatch it given we prioritize requests in hctx->dispatch.
 	 */
-	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
+	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq) ||
+			blk_rq_is_swap(rq))
 		return true;
 
 	return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d3afea47ade6..71aaa99614ab 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -251,6 +251,11 @@ static inline bool blk_rq_is_passthrough(struct request *rq)
 	return blk_op_is_passthrough(req_op(rq));
 }
 
+static inline bool blk_rq_is_swap(struct request *rq)
+{
+	return rq->cmd_flags & REQ_SWAP;
+}
+
 static inline unsigned short req_get_ioprio(struct request *req)
 {
 	return req->ioprio;


Thanks, 
Ming


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche
  2021-08-04  2:39   ` Ming Lei
@ 2021-08-04  5:35   ` Christoph Hellwig
  2021-08-04 17:58     ` Bart Van Assche
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-04  5:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Ming Lei, Tetsuo Handa, Martijn Coenen

On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
> We noticed that the user interface of Android devices becomes very slow
> under memory pressure. This is because Android uses the zram driver on top
> of the loop driver for swapping,

Sorry, but that is just amazingly stupid.  If you really want to swap
to compressed files introduce that support in the swap code instead of
coming up with dumb driver stacks from hell.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter
  2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche
@ 2021-08-04  6:07   ` Greg KH
  2021-08-04 17:38     ` Bart Van Assche
  2021-08-05  6:14     ` Chaitanya Kulkarni
  0 siblings, 2 replies; 17+ messages in thread
From: Greg KH @ 2021-08-04  6:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Ming Lei, Tetsuo Handa, Martijn Coenen

On Tue, Aug 03, 2021 at 11:23:04AM -0700, Bart Van Assche wrote:
> Recent versions of Android use the zram driver on top of the loop driver.
> There is a mismatch between the default loop driver queue depth (128) and
> the queue depth of the storage device in my test setup (32). That mismatch
> results in write latencies that are higher than necessary. Address this
> issue by making the default loop driver queue depth configurable. Compared
> to configuring the queue depth by writing into the nr_requests sysfs
> attribute, this approach does not involve calling synchronize_rcu() to
> modify the queue depth.
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Martijn Coenen <maco@android.com>
> Cc: Jaegeuk Kim <jaegeuk@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/block/loop.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index fa1c298a8cfb..b5dbf2d7447e 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -2098,6 +2098,9 @@ module_param(max_loop, int, 0444);
>  MODULE_PARM_DESC(max_loop, "Maximum number of loop devices");
>  module_param(max_part, int, 0444);
>  MODULE_PARM_DESC(max_part, "Maximum number of partitions per loop device");
> +static uint32_t default_queue_depth = 128;
> +module_param(default_queue_depth, uint, 0644);
> +MODULE_PARM_DESC(default_queue_depth, "Default loop device queue depth");
>  MODULE_LICENSE("GPL");
>  MODULE_ALIAS_BLOCKDEV_MAJOR(LOOP_MAJOR);

Please no, this is not the 1990's, we do not need module-wide options
like this that are hard to ever remove.  Worst case, this should be a
per-device option so it needs to be controllable that way.

But really, why can this not just be "tuned" on the fly?  How would
anyone know what to set this value to?  Should we just bump it up anyway
given that modern memory limits are probably more now?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter
  2021-08-04  6:07   ` Greg KH
@ 2021-08-04 17:38     ` Bart Van Assche
  2021-08-04 19:33       ` Greg KH
  2021-08-05  6:14     ` Chaitanya Kulkarni
  1 sibling, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2021-08-04 17:38 UTC (permalink / raw)
  To: Greg KH
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Ming Lei, Tetsuo Handa, Martijn Coenen

On 8/3/21 11:07 PM, Greg KH wrote:
> Should we just bump it up anyway given that modern memory limits are
> probably more now?

As mentioned in the patch description, this patch is not about saving 
memory but about reducing latency. Reducing the queue depth to control 
latency is a common approach for block storage. See e.g. the description 
of CONFIG_BLK_WBT in block/Kconfig.

You may want to know that I looked into reducing the queue depth from 
user space before I came up with this kernel patch 
(https://android-review.googlesource.com/c/platform/system/core/+/1783847/6). 
However, making that approach work requires significant changes in the 
Android SELinux sysfs labeling policy. From my point of view, modifying 
the kernel is much less painful than reworking the Android SELinux 
labeling policy.

Bart.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-04  2:39   ` Ming Lei
@ 2021-08-04 17:43     ` Bart Van Assche
  2021-08-05  3:26       ` Ming Lei
  2021-08-04 17:58     ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2021-08-04 17:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Tetsuo Handa, Martijn Coenen

On 8/3/21 7:39 PM, Ming Lei wrote:
> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
>> We noticed that the user interface of Android devices becomes very slow
>> under memory pressure. This is because Android uses the zram driver on top
>> of the loop driver for swapping, because under memory pressure the swap
>> code alternates reads and writes quickly, because mq-deadline is the
>> default scheduler for loop devices and because mq-deadline delays writes by
> 
> Maybe we can bypass io scheduler always for request with REQ_SWAP, such as:
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 0f006cabfd91..d86709ac9d1f 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -420,7 +420,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>   	 * passthrough request is added to scheduler queue, there isn't any
>   	 * chance to dispatch it given we prioritize requests in hctx->dispatch.
>   	 */
> -	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> +	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq) ||
> +			blk_rq_is_swap(rq))
>   		return true;
>   
>   	return false;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index d3afea47ade6..71aaa99614ab 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -251,6 +251,11 @@ static inline bool blk_rq_is_passthrough(struct request *rq)
>   	return blk_op_is_passthrough(req_op(rq));
>   }
>   
> +static inline bool blk_rq_is_swap(struct request *rq)
> +{
> +	return rq->cmd_flags & REQ_SWAP;
> +}
> +
>   static inline unsigned short req_get_ioprio(struct request *req)
>   {
>   	return req->ioprio;

Hi Ming,

Thanks for having suggested an alternative. However, isn't the above a 
policy? Shouldn't a kernel provide mechanisms instead of policies?

Additionally, the above patch does not address all Android loop driver 
use cases. Reading APEX files is regular I/O and hence REQ_SWAP is not 
set while reading from APEX files.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-04  5:35   ` Christoph Hellwig
@ 2021-08-04 17:58     ` Bart Van Assche
  2021-08-04 18:04       ` Jens Axboe
  2021-08-05  6:39       ` Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Bart Van Assche @ 2021-08-04 17:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei, Tetsuo Handa,
	Martijn Coenen

On 8/3/21 10:35 PM, Christoph Hellwig wrote:
> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
>> We noticed that the user interface of Android devices becomes very slow
>> under memory pressure. This is because Android uses the zram driver on top
>> of the loop driver for swapping,
> 
> Sorry, but that is just amazingly stupid.  If you really want to swap
> to compressed files introduce that support in the swap code instead of
> coming up with dumb driver stacks from hell.

Hi Christoph,

That's an interesting suggestion. We can look into adding compression 
support in the swap code.

Independent of the use case of this patch, is it acceptable to change 
the default I/O scheduler of loop devices from mq-deadline into none 
(patches 1 and 2 of this series)?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-04  2:39   ` Ming Lei
  2021-08-04 17:43     ` Bart Van Assche
@ 2021-08-04 17:58     ` Jens Axboe
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2021-08-04 17:58 UTC (permalink / raw)
  To: Ming Lei, Bart Van Assche
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Tetsuo Handa,
	Martijn Coenen

On 8/3/21 8:39 PM, Ming Lei wrote:
> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
>> We noticed that the user interface of Android devices becomes very slow
>> under memory pressure. This is because Android uses the zram driver on top
>> of the loop driver for swapping, because under memory pressure the swap
>> code alternates reads and writes quickly, because mq-deadline is the
>> default scheduler for loop devices and because mq-deadline delays writes by
> 
> Maybe we can bypass io scheduler always for request with REQ_SWAP, such as:

I don't think that's a good idea at all, what happens if you have a mixed
swap in/out on a rotating drive?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-04 17:58     ` Bart Van Assche
@ 2021-08-04 18:04       ` Jens Axboe
  2021-08-05  6:39       ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2021-08-04 18:04 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: linux-block, Jaegeuk Kim, Ming Lei, Tetsuo Handa, Martijn Coenen

On 8/4/21 11:58 AM, Bart Van Assche wrote:
> On 8/3/21 10:35 PM, Christoph Hellwig wrote:
>> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
>>> We noticed that the user interface of Android devices becomes very slow
>>> under memory pressure. This is because Android uses the zram driver on top
>>> of the loop driver for swapping,
>>
>> Sorry, but that is just amazingly stupid.  If you really want to swap
>> to compressed files introduce that support in the swap code instead of
>> coming up with dumb driver stacks from hell.
> 
> Hi Christoph,
> 
> That's an interesting suggestion. We can look into adding compression 
> support in the swap code.
> 
> Independent of the use case of this patch, is it acceptable to change 
> the default I/O scheduler of loop devices from mq-deadline into none 
> (patches 1 and 2 of this series)?

Probably does, there's always going to be underlying storage for it
anyway.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter
  2021-08-04 17:38     ` Bart Van Assche
@ 2021-08-04 19:33       ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2021-08-04 19:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Ming Lei, Tetsuo Handa, Martijn Coenen

On Wed, Aug 04, 2021 at 10:38:55AM -0700, Bart Van Assche wrote:
> On 8/3/21 11:07 PM, Greg KH wrote:
> > Should we just bump it up anyway given that modern memory limits are
> > probably more now?
> 
> As mentioned in the patch description, this patch is not about saving memory
> but about reducing latency. Reducing the queue depth to control latency is a
> common approach for block storage. See e.g. the description of
> CONFIG_BLK_WBT in block/Kconfig.
> 
> You may want to know that I looked into reducing the queue depth from user
> space before I came up with this kernel patch
> (https://android-review.googlesource.com/c/platform/system/core/+/1783847/6).
> However, making that approach work requires significant changes in the
> Android SELinux sysfs labeling policy. From my point of view, modifying the
> kernel is much less painful than reworking the Android SELinux labeling
> policy.

That's not fair, do not add additional complexity to the kernel, and a
global option like this, just because your userspace configuration is
complex.

The fact that this is a global change should make it a non-starter as
that is one of the main reasons module options should never be used.

Again, we have learned from past mistakes, let's not forget them.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-04 17:43     ` Bart Van Assche
@ 2021-08-05  3:26       ` Ming Lei
  2021-08-05 16:49         ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2021-08-05  3:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Tetsuo Handa, Martijn Coenen

On Wed, Aug 04, 2021 at 10:43:05AM -0700, Bart Van Assche wrote:
> On 8/3/21 7:39 PM, Ming Lei wrote:
> > On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
> > > We noticed that the user interface of Android devices becomes very slow
> > > under memory pressure. This is because Android uses the zram driver on top
> > > of the loop driver for swapping, because under memory pressure the swap
> > > code alternates reads and writes quickly, because mq-deadline is the
> > > default scheduler for loop devices and because mq-deadline delays writes by
> > 
> > Maybe we can bypass io scheduler always for request with REQ_SWAP, such as:
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 0f006cabfd91..d86709ac9d1f 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -420,7 +420,8 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >   	 * passthrough request is added to scheduler queue, there isn't any
> >   	 * chance to dispatch it given we prioritize requests in hctx->dispatch.
> >   	 */
> > -	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> > +	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq) ||
> > +			blk_rq_is_swap(rq))
> >   		return true;
> >   	return false;
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index d3afea47ade6..71aaa99614ab 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -251,6 +251,11 @@ static inline bool blk_rq_is_passthrough(struct request *rq)
> >   	return blk_op_is_passthrough(req_op(rq));
> >   }
> > +static inline bool blk_rq_is_swap(struct request *rq)
> > +{
> > +	return rq->cmd_flags & REQ_SWAP;
> > +}
> > +
> >   static inline unsigned short req_get_ioprio(struct request *req)
> >   {
> >   	return req->ioprio;
> 
> Hi Ming,
> 
> Thanks for having suggested an alternative. However, isn't the above a
> policy? Shouldn't a kernel provide mechanisms instead of policies?

REQ_SWAP means it is one IO written to swap device/file, so the system
is suffering memory pressure, then isn't it reasonable to bypass io
scheduler for all SWAP IOs?

The issue you described in comment log is neither loop specific, nor
mq-deadline, I think it isn't good to just workaround the issue for loop.

> 
> Additionally, the above patch does not address all Android loop driver use
> cases. Reading APEX files is regular I/O and hence REQ_SWAP is not set while
> reading from APEX files.

Is APEX file one swap file? If not, can you explain it a bit why you want to
switch to none when reading APEX file?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter
  2021-08-04  6:07   ` Greg KH
  2021-08-04 17:38     ` Bart Van Assche
@ 2021-08-05  6:14     ` Chaitanya Kulkarni
  1 sibling, 0 replies; 17+ messages in thread
From: Chaitanya Kulkarni @ 2021-08-05  6:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Ming Lei, Tetsuo Handa, Martijn Coenen




On 8/3/2021 11:07 PM, Greg KH wrote:
> On Tue, Aug 03, 2021 at 11:23:04AM -0700, Bart Van Assche wrote:
>> Recent versions of Android use the zram driver on top of the loop driver.
>> There is a mismatch between the default loop driver queue depth (128) and
>> the queue depth of the storage device in my test setup (32). That mismatch
>> results in write latencies that are higher than necessary. Address this
>> issue by making the default loop driver queue depth configurable. Compared
>> to configuring the queue depth by writing into the nr_requests sysfs
>> attribute, this approach does not involve calling synchronize_rcu() to
>> modify the queue depth.
>>
>> Reviewed-by: Ming Lei<ming.lei@redhat.com>
>> Cc: Tetsuo Handa<penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Christoph Hellwig<hch@lst.de>
>> Cc: Martijn Coenen<maco@android.com>
>> Cc: Jaegeuk Kim<jaegeuk@kernel.org>
>> Signed-off-by: Bart Van Assche<bvanassche@acm.org>

If I remember correct I've sent patch based on the similar concept that 
is not entirely different than this one.


-- 
-ck

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-04 17:58     ` Bart Van Assche
  2021-08-04 18:04       ` Jens Axboe
@ 2021-08-05  6:39       ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-05  6:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Jaegeuk Kim,
	Ming Lei, Tetsuo Handa, Martijn Coenen

On Wed, Aug 04, 2021 at 10:58:01AM -0700, Bart Van Assche wrote:
> On 8/3/21 10:35 PM, Christoph Hellwig wrote:
>> On Tue, Aug 03, 2021 at 11:23:03AM -0700, Bart Van Assche wrote:
>>> We noticed that the user interface of Android devices becomes very slow
>>> under memory pressure. This is because Android uses the zram driver on top
>>> of the loop driver for swapping,
>>
>> Sorry, but that is just amazingly stupid.  If you really want to swap
>> to compressed files introduce that support in the swap code instead of
>> coming up with dumb driver stacks from hell.
>
> Hi Christoph,
>
> That's an interesting suggestion. We can look into adding compression 
> support in the swap code.

As a short term fix you might also just look into having a file
backing for the zram writeback mode, which should not need more than
100-200 lines of code.

> Independent of the use case of this patch, is it acceptable to change the 
> default I/O scheduler of loop devices from mq-deadline into none (patches 1 
> and 2 of this series)?

Yes, that might not be a bad idea in general.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk()
  2021-08-05  3:26       ` Ming Lei
@ 2021-08-05 16:49         ` Bart Van Assche
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2021-08-05 16:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Tetsuo Handa, Martijn Coenen

On 8/4/21 8:26 PM, Ming Lei wrote:
> On Wed, Aug 04, 2021 at 10:43:05AM -0700, Bart Van Assche wrote:
>> Thanks for having suggested an alternative. However, isn't the above a
>> policy? Shouldn't a kernel provide mechanisms instead of policies?
> 
> REQ_SWAP means it is one IO written to swap device/file, so the system
> is suffering memory pressure, then isn't it reasonable to bypass io
> scheduler for all SWAP IOs?

Hmm ... I'm not sure that approach is ideal when swapping to a hard disk.

>> Additionally, the above patch does not address all Android loop driver use
>> cases. Reading APEX files is regular I/O and hence REQ_SWAP is not set while
>> reading from APEX files.
> 
> Is APEX file one swap file? If not, can you explain it a bit why you want to
> switch to none when reading APEX file?

As far as I know Android uses the loop driver for two purposes:
- A zram instance is used as swap device and the zram instance is
   configured to use a loop device.
- apexd mounts APEX files read-only and uses the loop driver to access
   these files as a block device.

Bart.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-08-05 16:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-03 18:23 [PATCH v2 0/3] Improve loop driver I/O scheduler and QD selection Bart Van Assche
2021-08-03 18:23 ` [PATCH v2 1/3] blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag Bart Van Assche
2021-08-03 18:23 ` [PATCH v2 2/3] loop: Select I/O scheduler 'none' from inside add_disk() Bart Van Assche
2021-08-04  2:39   ` Ming Lei
2021-08-04 17:43     ` Bart Van Assche
2021-08-05  3:26       ` Ming Lei
2021-08-05 16:49         ` Bart Van Assche
2021-08-04 17:58     ` Jens Axboe
2021-08-04  5:35   ` Christoph Hellwig
2021-08-04 17:58     ` Bart Van Assche
2021-08-04 18:04       ` Jens Axboe
2021-08-05  6:39       ` Christoph Hellwig
2021-08-03 18:23 ` [PATCH v2 3/3] loop: Add the default_queue_depth kernel module parameter Bart Van Assche
2021-08-04  6:07   ` Greg KH
2021-08-04 17:38     ` Bart Van Assche
2021-08-04 19:33       ` Greg KH
2021-08-05  6:14     ` Chaitanya Kulkarni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).