* [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-03 1:18 UTC
  To: Jens Axboe; +Cc: linux-block, linux-mm, LKML

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..a7e1dd215fc2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
 {
 	struct loop_cmd *cmd =
 		container_of(work, struct loop_cmd, work);
+	int oldflags = current->flags & PF_LESS_THROTTLE;
 
+	current->flags |= PF_LESS_THROTTLE;
 	loop_handle_cmd(cmd);
+	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
 }
 
 static int loop_init_request(void *data, struct request *rq,
-- 
2.12.0
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Christoph Hellwig @ 2017-04-04 7:10 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

But if you actually care about performance in any way I'd suggest
using the loop device in direct I/O mode.
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-05 4:27 UTC
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Tue, Apr 04 2017, Christoph Hellwig wrote:

> Looks fine,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> But if you actually care about performance in any way I'd suggest
> using the loop device in direct I/O mode.

The losetup on my test VM is too old to support that :-(
I guess it might be time to upgrade.

It seems there is no "mount -o direct_loop" or similar, so you have
to do the losetup and the mount separately. Any thoughts on whether
that should be changed?

Thanks,
NeilBrown
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Ming Lei @ 2017-04-05 5:13 UTC
  To: NeilBrown; +Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-mm, LKML

On Wed, Apr 5, 2017 at 12:27 PM, NeilBrown <neilb@suse.com> wrote:
> On Tue, Apr 04 2017, Christoph Hellwig wrote:
>
>> Looks fine,
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> But if you actually care about performance in any way I'd suggest
>> using the loop device in direct I/O mode.
>
> The losetup on my test VM is too old to support that :-(
> I guess it might be time to upgrade.
>
> It seems there is no "mount -o direct_loop" or similar, so you have
> to do the losetup and the mount separately. Any thoughts on

I guess a 'direct_loop' option could be added to 'mount' directly,
but I'm not familiar with the mount utility.

> whether that should be changed?

There was a sysfs interface for controlling direct I/O in the initial
submission, but it looks like it was reviewed out. :-)

Thanks,
Ming Lei
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-04 11:23 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Mon 03-04-17 11:18:51, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.

Yes, this makes sense to me.

> Signed-off-by: NeilBrown <neilb@suse.com>

Acked-by: Michal Hocko <mhocko@suse.com>

One nit below.

> ---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>  {
>  	struct loop_cmd *cmd =
>  		container_of(work, struct loop_cmd, work);
> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>
> +	current->flags |= PF_LESS_THROTTLE;
>  	loop_handle_cmd(cmd);
> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;

We have a helper for this: tsk_restore_flags(). It is not used
consistently, and maybe we want a dedicated API like we have for the
scoped NOIO/NOFS, but that is a separate thing. I would find
tsk_restore_flags() easier to read.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Ming Lei @ 2017-04-04 14:24 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <neilb@suse.com> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>  {
>  	struct loop_cmd *cmd =
>  		container_of(work, struct loop_cmd, work);
> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>
> +	current->flags |= PF_LESS_THROTTLE;
>  	loop_handle_cmd(cmd);
> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
>  }

You can set it on 'lo->worker_task' instead of doing it in each
loop_queue_work(), and the flag needn't be restored because the
kernel thread is dedicated to loop.

Thanks,
Ming Lei
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-05 4:31 UTC
  To: Ming Lei; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Tue, Apr 04 2017, Ming Lei wrote:

> On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <neilb@suse.com> wrote:
>>
>> When a filesystem is mounted from a loop device, writes are
>> throttled by balance_dirty_pages() twice: once when writing
>> to the filesystem and once when loop_handle_cmd() writes to
>> the backing file. This double-throttling can trigger
>> positive feedback loops that create significant delays. The
>> throttling at the lower level is seen by the upper level as
>> a slow device, so it throttles extra hard.
>>
>> The PF_LESS_THROTTLE flag was created to handle exactly this
>> circumstance, though with an NFS filesystem mounted from a
>> local NFS server. It reduces the throttling on the lower
>> layer so that it can proceed largely unthrottled.
>>
>> To demonstrate this, create a filesystem on a loop device
>> and write (e.g. with dd) several large files which combine
>> to consume significantly more than the limit set by
>> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
>> time taken.
>>
>> When I do this directly on a device (no loop device) the
>> total time for several runs (mkfs, mount, write 200 files,
>> umount) is fairly stable: 28-35 seconds.
>> When I do this over a loop device the times are much worse
>> and less stable: 52-460 seconds, half below 100 seconds,
>> half above.
>> When I apply this patch, the times become stable again,
>> though not as fast as the no-loop-back case: 53-72 seconds.
>>
>> There may be room for further improvement as the total
>> overhead still seems too high, but this is a big improvement.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/block/loop.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> index 0ecb6461ed81..a7e1dd215fc2 100644
>> --- a/drivers/block/loop.c
>> +++ b/drivers/block/loop.c
>> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>>  {
>>  	struct loop_cmd *cmd =
>>  		container_of(work, struct loop_cmd, work);
>> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>>
>> +	current->flags |= PF_LESS_THROTTLE;
>>  	loop_handle_cmd(cmd);
>> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
>>  }
>
> You can set it on 'lo->worker_task' instead of doing it in each
> loop_queue_work(), and the flag needn't be restored because the
> kernel thread is dedicated to loop.

Good point. I'll do that.

Thanks,
NeilBrown
* [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-05 4:33 UTC
  To: Jens Axboe; +Cc: linux-block, linux-mm, LKML, Ming Lei

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

I moved where the flag is set, thanks to a suggestion from
Ming Lei. I've preserved the *-by: tags I was offered despite
the code being different, as the concept is identical.

Thanks,
NeilBrown

 drivers/block/loop.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..44b3506fd086 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
 	set_user_nice(lo->worker_task, MIN_NICE);
+	lo->worker_task->flags |= PF_LESS_THROTTLE;
 	return 0;
 }
-- 
2.12.2
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Ming Lei @ 2017-04-05 5:05 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Wed, Apr 5, 2017 at 12:33 PM, NeilBrown <neilb@suse.com> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: NeilBrown <neilb@suse.com>

Reviewed-by: Ming Lei <tom.leiming@gmail.com>

> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei. I've preserved the *-by: tags I was offered despite
> the code being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>  drivers/block/loop.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
>  	set_user_nice(lo->worker_task, MIN_NICE);
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>  	return 0;
>  }
>
> --
> 2.12.2
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-05 7:19 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Wed 05-04-17 14:33:50, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei. I've preserved the *-by: tags I was offered despite
> the code being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>  drivers/block/loop.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
>  	set_user_nice(lo->worker_task, MIN_NICE);
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>  	return 0;

As mentioned elsewhere, PF flags should be updated only on the current
task, otherwise there is a potential RMW race. Is this safe? The code
runs concurrently with the worker thread.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-05 7:32 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Wed 05-04-17 09:19:27, Michal Hocko wrote:
> On Wed 05-04-17 14:33:50, NeilBrown wrote:
[...]
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 0ecb6461ed81..44b3506fd086 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
> >  	if (IS_ERR(lo->worker_task))
> >  		return -ENOMEM;
> >  	set_user_nice(lo->worker_task, MIN_NICE);
> > +	lo->worker_task->flags |= PF_LESS_THROTTLE;
> >  	return 0;
>
> As mentioned elsewhere, PF flags should be updated only on the current
> task, otherwise there is a potential RMW race. Is this safe? The code
> runs concurrently with the worker thread.

I believe you need something like this instead:
---
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f347285c67ec..07b2a909e4fb 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
 	kthread_stop(lo->worker_task);
 }
 
+int loop_kthread_worker_fn(void *worker_ptr)
+{
+	current->flags |= PF_LESS_THROTTLE;
+	return kthread_worker_fn(worker_ptr);
+}
+
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_run(loop_kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-06 2:23 UTC
  To: Michal Hocko; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Wed, Apr 05 2017, Michal Hocko wrote:

> On Wed 05-04-17 09:19:27, Michal Hocko wrote:
>> On Wed 05-04-17 14:33:50, NeilBrown wrote:
> [...]
>> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> > index 0ecb6461ed81..44b3506fd086 100644
>> > --- a/drivers/block/loop.c
>> > +++ b/drivers/block/loop.c
>> > @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>> >  	if (IS_ERR(lo->worker_task))
>> >  		return -ENOMEM;
>> >  	set_user_nice(lo->worker_task, MIN_NICE);
>> > +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>> >  	return 0;
>>
>> As mentioned elsewhere, PF flags should be updated only on the current
>> task, otherwise there is a potential RMW race. Is this safe? The code
>> runs concurrently with the worker thread.
>
> I believe you need something like this instead:
> ---
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index f347285c67ec..07b2a909e4fb 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
>  	kthread_stop(lo->worker_task);
>  }
>
> +int loop_kthread_worker_fn(void *worker_ptr)
> +{
> +	current->flags |= PF_LESS_THROTTLE;
> +	return kthread_worker_fn(worker_ptr);
> +}
> +
>  static int loop_prepare_queue(struct loop_device *lo)
>  {
>  	kthread_init_worker(&lo->worker);
> -	lo->worker_task = kthread_run(kthread_worker_fn,
> +	lo->worker_task = kthread_run(loop_kthread_worker_fn,
>  			&lo->worker, "loop%d", lo->lo_number);
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;

Arg - of course.

How about we just split the kthread_create from the wake_up?

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Subject: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..95679d988725 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -847,10 +847,12 @@ static void loop_unprepare_queue(struct loop_device *lo)
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_create(kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
+	lo->worker_task->flags |= PF_LESS_THROTTLE;
+	wake_up_process(lo->worker_task);
 	set_user_nice(lo->worker_task, MIN_NICE);
 	return 0;
 }
-- 
2.12.2
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-06 6:53 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Thu 06-04-17 12:23:51, NeilBrown wrote:
[...]
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..95679d988725 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -847,10 +847,12 @@ static void loop_unprepare_queue(struct loop_device *lo)
>  static int loop_prepare_queue(struct loop_device *lo)
>  {
>  	kthread_init_worker(&lo->worker);
> -	lo->worker_task = kthread_run(kthread_worker_fn,
> +	lo->worker_task = kthread_create(kthread_worker_fn,
>  			&lo->worker, "loop%d", lo->lo_number);
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
> +	wake_up_process(lo->worker_task);
>  	set_user_nice(lo->worker_task, MIN_NICE);
>  	return 0;

This should work for the current implementation because kthread_create()
returns only after the full initialization has been done. No idea
whether we can rely on that in future. I also think it would be cleaner
to set the flag on current and keep the current semantic that only
current changes its own flags.

So while I do not have a strong opinion on this, I think defining a
loop-specific thread function which sets PF_LESS_THROTTLE as the first
thing is more elegant and less error-prone long term. A short comment
explaining why we use the flag there would also be preferred. I will
leave the decision to you. Thanks.

-- 
Michal Hocko
SUSE Labs
* [PATCH v3] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-06 23:47 UTC
  To: Jens Axboe; +Cc: Michal Hocko, linux-block, linux-mm, LKML, Ming Lei

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

Hi Jens,
I think this version meets with everyone's approval.

Thanks,
NeilBrown

 drivers/block/loop.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..035b8651b8bf 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
 	kthread_stop(lo->worker_task);
 }
 
+static int loop_kthread_worker_fn(void *worker_ptr)
+{
+	current->flags |= PF_LESS_THROTTLE;
+	return kthread_worker_fn(worker_ptr);
+}
+
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_run(loop_kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
-- 
2.12.2