* [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-03 1:18 UTC
  To: Jens Axboe; +Cc: linux-block, linux-mm, LKML

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..a7e1dd215fc2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
 {
 	struct loop_cmd *cmd =
 		container_of(work, struct loop_cmd, work);
+	int oldflags = current->flags & PF_LESS_THROTTLE;
 
+	current->flags |= PF_LESS_THROTTLE;
 	loop_handle_cmd(cmd);
+	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
 }
 
 static int loop_init_request(void *data, struct request *rq,
-- 
2.12.0
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Christoph Hellwig @ 2017-04-04 7:10 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

But if you actually care about performance in any way I'd suggest
using the loop device in direct I/O mode.
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-05 4:27 UTC
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Tue, Apr 04 2017, Christoph Hellwig wrote:

> Looks fine,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> But if you actually care about performance in any way I'd suggest
> using the loop device in direct I/O mode.

The losetup on my test VM is too old to support that :-(
I guess it might be time to upgrade.

It seems there is no "mount -o direct_loop" or similar, so you have
to do the losetup and the mount separately. Any thoughts on whether
that should be changed?

Thanks,
NeilBrown
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Ming Lei @ 2017-04-05 5:13 UTC
  To: NeilBrown; +Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-mm, LKML

On Wed, Apr 5, 2017 at 12:27 PM, NeilBrown <neilb@suse.com> wrote:
> On Tue, Apr 04 2017, Christoph Hellwig wrote:
>
>> Looks fine,
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> But if you actually care about performance in any way I'd suggest
>> using the loop device in direct I/O mode.
>
> The losetup on my test VM is too old to support that :-(
> I guess it might be time to upgrade.
>
> It seems there is no "mount -o direct_loop" or similar, so you have
> to do the losetup and the mount separately. Any thoughts on

I guess a 'direct_loop' option could be added to 'mount' directly,
but I'm not familiar with the mount utility.

> whether that should be changed?

There was a sysfs interface for controlling direct I/O in the initial
submission, but it looks like it was reviewed out. :-)

Thanks,
Ming Lei
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-04 11:23 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Mon 03-04-17 11:18:51, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.

Yes, this makes sense to me.

> Signed-off-by: NeilBrown <neilb@suse.com>

Acked-by: Michal Hocko <mhocko@suse.com>

One nit below.

> ---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>  {
>  	struct loop_cmd *cmd =
>  		container_of(work, struct loop_cmd, work);
> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>
> +	current->flags |= PF_LESS_THROTTLE;
>  	loop_handle_cmd(cmd);
> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;

We have a helper for this: tsk_restore_flags(). It is not used
consistently, and maybe we want a dedicated API like we have for the
scoped NOIO/NOFS, but that is a separate thing. I would find
tsk_restore_flags() easier to read.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Ming Lei @ 2017-04-04 14:24 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <neilb@suse.com> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>  {
>  	struct loop_cmd *cmd =
>  		container_of(work, struct loop_cmd, work);
> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>
> +	current->flags |= PF_LESS_THROTTLE;
>  	loop_handle_cmd(cmd);
> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
>  }

You can set it on 'lo->worker_task' instead of doing it in each
loop_queue_work(), and the flag needn't be restored because the
kernel thread is dedicated to loop.

Thanks,
Ming Lei
* Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-05 4:31 UTC
  To: Ming Lei; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Tue, Apr 04 2017, Ming Lei wrote:

> On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <neilb@suse.com> wrote:
>>
>> When a filesystem is mounted from a loop device, writes are
>> throttled by balance_dirty_pages() twice: once when writing
>> to the filesystem and once when loop_handle_cmd() writes to
>> the backing file. This double-throttling can trigger
>> positive feedback loops that create significant delays. The
>> throttling at the lower level is seen by the upper level as
>> a slow device, so it throttles extra hard.
>>
>> The PF_LESS_THROTTLE flag was created to handle exactly this
>> circumstance, though with an NFS filesystem mounted from a
>> local NFS server. It reduces the throttling on the lower
>> layer so that it can proceed largely unthrottled.
>>
>> To demonstrate this, create a filesystem on a loop device
>> and write (e.g. with dd) several large files which combine
>> to consume significantly more than the limit set by
>> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
>> time taken.
>>
>> When I do this directly on a device (no loop device) the
>> total time for several runs (mkfs, mount, write 200 files,
>> umount) is fairly stable: 28-35 seconds.
>> When I do this over a loop device the times are much worse
>> and less stable: 52-460 seconds, half below 100 seconds,
>> half above.
>> When I apply this patch, the times become stable again,
>> though not as fast as the no-loop-back case: 53-72 seconds.
>>
>> There may be room for further improvement as the total
>> overhead still seems too high, but this is a big improvement.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/block/loop.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> index 0ecb6461ed81..a7e1dd215fc2 100644
>> --- a/drivers/block/loop.c
>> +++ b/drivers/block/loop.c
>> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>>  {
>>  	struct loop_cmd *cmd =
>>  		container_of(work, struct loop_cmd, work);
>> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>>
>> +	current->flags |= PF_LESS_THROTTLE;
>>  	loop_handle_cmd(cmd);
>> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
>>  }
>
> You can set it on 'lo->worker_task' instead of doing it in each
> loop_queue_work(), and the flag needn't be restored because the
> kernel thread is dedicated to loop.

Good point. I'll do that.

Thanks,
NeilBrown
* [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-05 4:33 UTC
  To: Jens Axboe; +Cc: linux-block, linux-mm, LKML, Ming Lei

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

I moved where the flag is set, thanks to a suggestion from
Ming Lei. I've preserved the *-by: tags I was offered despite
the code being different, as the concept is identical.

Thanks,
NeilBrown

 drivers/block/loop.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..44b3506fd086 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
 	set_user_nice(lo->worker_task, MIN_NICE);
+	lo->worker_task->flags |= PF_LESS_THROTTLE;
 	return 0;
 }
-- 
2.12.2
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Ming Lei @ 2017-04-05 5:05 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML

On Wed, Apr 5, 2017 at 12:33 PM, NeilBrown <neilb@suse.com> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: NeilBrown <neilb@suse.com>

Reviewed-by: Ming Lei <tom.leiming@gmail.com>

> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei. I've preserved the *-by: tags I was offered despite
> the code being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>  drivers/block/loop.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
>  	set_user_nice(lo->worker_task, MIN_NICE);
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>  	return 0;
>  }
>
> --
> 2.12.2
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-05 7:19 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Wed 05-04-17 14:33:50, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when loop_handle_cmd() writes to
> the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total
> overhead still seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei. I've preserved the *-by: tags I was offered despite
> the code being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>  drivers/block/loop.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
>  	set_user_nice(lo->worker_task, MIN_NICE);
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>  	return 0;

As mentioned elsewhere, PF flags should be updated only on the current
task, otherwise there is a potential RMW race. Is this safe? The code
runs concurrently with the worker thread.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-05 7:32 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Wed 05-04-17 09:19:27, Michal Hocko wrote:
> On Wed 05-04-17 14:33:50, NeilBrown wrote:
[...]
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 0ecb6461ed81..44b3506fd086 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
> >  	if (IS_ERR(lo->worker_task))
> >  		return -ENOMEM;
> >  	set_user_nice(lo->worker_task, MIN_NICE);
> > +	lo->worker_task->flags |= PF_LESS_THROTTLE;
> >  	return 0;
>
> As mentioned elsewhere, PF flags should be updated only on the current
> task, otherwise there is a potential RMW race. Is this safe? The code
> runs concurrently with the worker thread.

I believe you need something like this instead:
---
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f347285c67ec..07b2a909e4fb 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
 	kthread_stop(lo->worker_task);
 }
 
+int loop_kthread_worker_fn(void *worker_ptr)
+{
+	current->flags |= PF_LESS_THROTTLE;
+	return kthread_worker_fn(worker_ptr);
+}
+
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_run(loop_kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-06 2:23 UTC
  To: Michal Hocko; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Wed, Apr 05 2017, Michal Hocko wrote:

> On Wed 05-04-17 09:19:27, Michal Hocko wrote:
>> On Wed 05-04-17 14:33:50, NeilBrown wrote:
> [...]
>> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> > index 0ecb6461ed81..44b3506fd086 100644
>> > --- a/drivers/block/loop.c
>> > +++ b/drivers/block/loop.c
>> > @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>> >  	if (IS_ERR(lo->worker_task))
>> >  		return -ENOMEM;
>> >  	set_user_nice(lo->worker_task, MIN_NICE);
>> > +	lo->worker_task->flags |= PF_LESS_THROTTLE;
>> >  	return 0;
>>
>> As mentioned elsewhere, PF flags should be updated only on the current
>> task, otherwise there is a potential RMW race. Is this safe? The code
>> runs concurrently with the worker thread.
>
> I believe you need something like this instead:
> ---
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index f347285c67ec..07b2a909e4fb 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
>  	kthread_stop(lo->worker_task);
>  }
>
> +int loop_kthread_worker_fn(void *worker_ptr)
> +{
> +	current->flags |= PF_LESS_THROTTLE;
> +	return kthread_worker_fn(worker_ptr);
> +}
> +
>  static int loop_prepare_queue(struct loop_device *lo)
>  {
>  	kthread_init_worker(&lo->worker);
> -	lo->worker_task = kthread_run(kthread_worker_fn,
> +	lo->worker_task = kthread_run(loop_kthread_worker_fn,
>  			&lo->worker, "loop%d", lo->lo_number);
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;

Arg - of course.

How about we just split the kthread_create from the wake_up?

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Subject: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..95679d988725 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -847,10 +847,12 @@ static void loop_unprepare_queue(struct loop_device *lo)
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_create(kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
+	lo->worker_task->flags |= PF_LESS_THROTTLE;
+	wake_up_process(lo->worker_task);
 	set_user_nice(lo->worker_task, MIN_NICE);
 	return 0;
 }
-- 
2.12.2
* Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: Michal Hocko @ 2017-04-06 6:53 UTC
  To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-mm, LKML, Ming Lei

On Thu 06-04-17 12:23:51, NeilBrown wrote:
[...]
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..95679d988725 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -847,10 +847,12 @@ static void loop_unprepare_queue(struct loop_device *lo)
>  static int loop_prepare_queue(struct loop_device *lo)
>  {
>  	kthread_init_worker(&lo->worker);
> -	lo->worker_task = kthread_run(kthread_worker_fn,
> +	lo->worker_task = kthread_create(kthread_worker_fn,
>  			&lo->worker, "loop%d", lo->lo_number);
>  	if (IS_ERR(lo->worker_task))
>  		return -ENOMEM;
> +	lo->worker_task->flags |= PF_LESS_THROTTLE;
> +	wake_up_process(lo->worker_task);
>  	set_user_nice(lo->worker_task, MIN_NICE);
>  	return 0;

This should work for the current implementation because kthread_create()
returns only after the full initialization has been done. No idea
whether we can rely on that in future. I also think it would be cleaner
to set the flag on current and keep the current semantic that only
current changes its own flags.

So while I do not have a strong opinion on this, I think defining a
loop-specific thread function which sets PF_LESS_THROTTLE as the first
thing is more elegant and less error-prone long term. A short comment
explaining why we use the flag there would also be preferred. I will
leave the decision to you. Thanks.

-- 
Michal Hocko
SUSE Labs
* [PATCH v3] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  From: NeilBrown @ 2017-04-06 23:47 UTC
  To: Jens Axboe; +Cc: Michal Hocko, linux-block, linux-mm, LKML, Ming Lei

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when loop_handle_cmd() writes to
the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, half below 100 seconds,
half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total
overhead still seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

Hi Jens,
I think this version meets with everyone's approval.

Thanks,
NeilBrown

 drivers/block/loop.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..035b8651b8bf 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
 	kthread_stop(lo->worker_task);
 }
 
+static int loop_kthread_worker_fn(void *worker_ptr)
+{
+	current->flags |= PF_LESS_THROTTLE;
+	return kthread_worker_fn(worker_ptr);
+}
+
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_run(loop_kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;
-- 
2.12.2