[RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
@ 2014-09-17 11:39 Aaron Tomlin
  2014-09-17 18:22 ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-17 11:39 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viro, david, atomlin, oleg, bmr, jcastillo, mguzik, linux-kernel

Since do_sync_work() is a deferred function it can block indefinitely by
design. At present do_sync_work() is added to the global system_wq.
As such a deadlock is theoretically possible between sys_unmount() and
sync_filesystems():

  * The current work fn on the system_wq (do_sync_work()) is blocked
    waiting to aquire a sb's s_umount for reading.

  * The "umount" task is the current owner of the s_umount in
    question but is waiting for do_sync_work() to continue.
    Thus we hit a deadlock situation.

This patch introduces a separate workqueue for do_sync_work() to avoid a
the described deadlock.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Reviewed-by: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/sync.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index bdc729d..df455d0 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -15,6 +15,7 @@
 #include <linux/pagemap.h>
 #include <linux/quotaops.h>
 #include <linux/backing-dev.h>
+#include <linux/kthread.h>
 #include "internal.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -114,7 +115,7 @@ SYSCALL_DEFINE0(sync)
 	return 0;
 }
 
-static void do_sync_work(struct work_struct *work)
+static int do_sync_work(void *dummy)
 {
 	int nowait = 0;
 
@@ -129,18 +130,12 @@ static void do_sync_work(struct work_struct *work)
 	iterate_supers(sync_fs_one_sb, &nowait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
-	kfree(work);
+	return 0;
 }
 
 void emergency_sync(void)
 {
-	struct work_struct *work;
-
-	work = kmalloc(sizeof(*work), GFP_ATOMIC);
-	if (work) {
-		INIT_WORK(work, do_sync_work);
-		schedule_work(work);
-	}
+	kthread_run(do_sync_work, NULL, "sync_work_thread");
 }
 
 /*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
  2014-09-17 11:39 [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock Aaron Tomlin
@ 2014-09-17 18:22 ` Oleg Nesterov
  2014-09-17 20:46   ` Aaron Tomlin
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2014-09-17 18:22 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel

On 09/17, Aaron Tomlin wrote:
>
> Since do_sync_work() is a deferred function it can block indefinitely by
> design. At present do_sync_work() is added to the global system_wq.
> As such a deadlock is theoretically possible between sys_unmount() and
> sync_filesystems():
>
>   * The current work fn on the system_wq (do_sync_work()) is blocked
>     waiting to aquire a sb's s_umount for reading.
>
>   * The "umount" task is the current owner of the s_umount in
>     question but is waiting for do_sync_work() to continue.
>     Thus we hit a deadlock situation.
>
I can't comment the patches in this area, but I am just curious...

Could you explain this deadlock in more details? I simply can't understand
what "waiting for do_sync_work()" actually means.

> This patch introduces a separate workqueue for do_sync_work() to avoid a
> the described deadlock.

The subject and the changelog do not match the patch, it doesn't add/use
another workqueue.

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
  2014-09-17 18:22 ` Oleg Nesterov
@ 2014-09-17 20:46   ` Aaron Tomlin
  2014-09-17 21:16     ` Dave Chinner
  2014-09-17 21:42     ` Oleg Nesterov
  0 siblings, 2 replies; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-17 20:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel

On Wed, Sep 17, 2014 at 08:22:02PM +0200, Oleg Nesterov wrote:
> On 09/17, Aaron Tomlin wrote:
> >
> > Since do_sync_work() is a deferred function it can block indefinitely by
> > design. At present do_sync_work() is added to the global system_wq.
> > As such a deadlock is theoretically possible between sys_unmount() and
> > sync_filesystems():
> >
> >   * The current work fn on the system_wq (do_sync_work()) is blocked
> >     waiting to aquire a sb's s_umount for reading.
> >
> >   * The "umount" task is the current owner of the s_umount in
> >     question but is waiting for do_sync_work() to continue.
> >     Thus we hit a deadlock situation.
> >
> I can't comment the patches in this area, but I am just curious...
> 
> Could you explain this deadlock in more details? I simply can't understand
> what "waiting for do_sync_work()" actually means.

Hopefully this helps:

	           "umount"                                      "events/1"

sys_umount					    sysrq_handle_sync
  deactivate_super(sb)				      emergency_sync
  {						    	schedule_work(work)
    ...						    	  queue_work(system_wq, work)
    down_write(&s->s_umount)			    	    do_sync_work(work)
    ...						      	      sync_filesystems(0)
    kill_block_super(s)				    		...
      generic_shutdown_super(sb)		    		down_read(&sb->s_umount)
      // sop->put_super(sb)
      ext4_put_super(sb)
	invalidate_bdev(sb->s_bdev)
	  lru_add_drain_all()
	    for_each_online_cpu(cpu) {
	      schedule_work_on(cpu, work)
		queue_work_on(cpu, system_wq, work)
		...
	    }
  }

  - Both lru_add_drain and do_sync_work work items are added to
    the same global system_wq

  - The current work fn on the system_wq is do_sync_work and is
    blocked waiting to aquire an sb's s_umount for reading

  - The umount task is the current owner of the s_umount in
    question but is waiting for do_sync_work to continue.
    Thus we hit a deadlock situation.


-- 
Aaron Tomlin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
  2014-09-17 20:46   ` Aaron Tomlin
@ 2014-09-17 21:16     ` Dave Chinner
  2014-09-19 15:44       ` Aaron Tomlin
  2014-09-17 21:42     ` Oleg Nesterov
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2014-09-17 21:16 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: Oleg Nesterov, linux-fsdevel, viro, bmr, jcastillo, mguzik, linux-kernel

On Wed, Sep 17, 2014 at 09:46:35PM +0100, Aaron Tomlin wrote:
> On Wed, Sep 17, 2014 at 08:22:02PM +0200, Oleg Nesterov wrote:
> > On 09/17, Aaron Tomlin wrote:
> > >
> > > Since do_sync_work() is a deferred function it can block indefinitely by
> > > design. At present do_sync_work() is added to the global system_wq.
> > > As such a deadlock is theoretically possible between sys_unmount() and
> > > sync_filesystems():
> > >
> > >   * The current work fn on the system_wq (do_sync_work()) is blocked
> > >     waiting to aquire a sb's s_umount for reading.
> > >
> > >   * The "umount" task is the current owner of the s_umount in
> > >     question but is waiting for do_sync_work() to continue.
> > >     Thus we hit a deadlock situation.
> > >
> > I can't comment the patches in this area, but I am just curious...
> > 
> > Could you explain this deadlock in more details? I simply can't understand
> > what "waiting for do_sync_work()" actually means.
> 
> Hopefully this helps:
> 
> 	           "umount"                                      "events/1"
> 
> sys_umount					    sysrq_handle_sync
>   deactivate_super(sb)				      emergency_sync
>   {						    	schedule_work(work)
>     ...						    	  queue_work(system_wq, work)
>     down_write(&s->s_umount)			    	    do_sync_work(work)
>     ...						      	      sync_filesystems(0)
>     kill_block_super(s)				    		...
>       generic_shutdown_super(sb)		    		down_read(&sb->s_umount)
>       // sop->put_super(sb)
>       ext4_put_super(sb)
> 	invalidate_bdev(sb->s_bdev)
> 	  lru_add_drain_all()
> 	    for_each_online_cpu(cpu) {
> 	      schedule_work_on(cpu, work)
> 		queue_work_on(cpu, system_wq, work)
> 		...
> 	    }
>   }
> 
>   - Both lru_add_drain and do_sync_work work items are added to
>     the same global system_wq
> 
>   - The current work fn on the system_wq is do_sync_work and is
>     blocked waiting to aquire an sb's s_umount for reading
> 
>   - The umount task is the current owner of the s_umount in
>     question but is waiting for do_sync_work to continue.
>     Thus we hit a deadlock situation.

What kernel did you see this deadlock on?

I don't see a deadlock here on a mainline kernel. The emergency sync
work blocks, the new work gets queued, and the workqueue
infrastructure simply pulls another kworker thread from the pool and
runs the new work. IOWs, I can't see how this would deadlock unless
the system_wq kworker pool has been fully depleted it's defined
per-cpu concurrency depth. If the kworker thread pool is depleted
then you have bigger problems than emergency sync not
deadlocking....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
  2014-09-17 20:46   ` Aaron Tomlin
  2014-09-17 21:16     ` Dave Chinner
@ 2014-09-17 21:42     ` Oleg Nesterov
  2014-09-19  9:35       ` Aaron Tomlin
  1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2014-09-17 21:42 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel

On 09/17, Aaron Tomlin wrote:
>
> On Wed, Sep 17, 2014 at 08:22:02PM +0200, Oleg Nesterov wrote:
> > On 09/17, Aaron Tomlin wrote:
> > >
> > > Since do_sync_work() is a deferred function it can block indefinitely by
> > > design. At present do_sync_work() is added to the global system_wq.
> > > As such a deadlock is theoretically possible between sys_unmount() and
> > > sync_filesystems():
> > >
> > >   * The current work fn on the system_wq (do_sync_work()) is blocked
> > >     waiting to aquire a sb's s_umount for reading.
> > >
> > >   * The "umount" task is the current owner of the s_umount in
> > >     question but is waiting for do_sync_work() to continue.
> > >     Thus we hit a deadlock situation.
> > >
> > I can't comment the patches in this area, but I am just curious...
> > 
> > Could you explain this deadlock in more details? I simply can't understand
> > what "waiting for do_sync_work()" actually means.
> 
> Hopefully this helps:
> 
> 	           "umount"                                      "events/1"
> 
> sys_umount					    sysrq_handle_sync
>   deactivate_super(sb)				      emergency_sync
>   {						    	schedule_work(work)
>     ...						    	  queue_work(system_wq, work)
>     down_write(&s->s_umount)			    	    do_sync_work(work)
>     ...						      	      sync_filesystems(0)
>     kill_block_super(s)				    		...
>       generic_shutdown_super(sb)		    		down_read(&sb->s_umount)
>       // sop->put_super(sb)
>       ext4_put_super(sb)
> 	invalidate_bdev(sb->s_bdev)
> 	  lru_add_drain_all()
> 	    for_each_online_cpu(cpu) {
> 	      schedule_work_on(cpu, work)
> 		queue_work_on(cpu, system_wq, work)
> 		...
> 	    }
>   }
> 
>   - Both lru_add_drain and do_sync_work work items are added to
>     the same global system_wq

Aha. Perhaps you hit this bug under the older kernel?

"same workqueue" doesn't mean "same worker thread" today, every CPU can
run up to ->max_active works. And for system_wq uses max_active = 256.

>   - The current work fn on the system_wq is do_sync_work and is
>     blocked waiting to aquire an sb's s_umount for reading

OK,

>   - The umount task is the current owner of the s_umount in
>     question but is waiting for do_sync_work to continue.
>     Thus we hit a deadlock situation.

I don't this this can happen, another worker threaf from worker_pool can
handle this work.

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
  2014-09-17 21:42     ` Oleg Nesterov
@ 2014-09-19  9:35       ` Aaron Tomlin
  0 siblings, 0 replies; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-19  9:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel

On Wed, Sep 17, 2014 at 11:42:09PM +0200, Oleg Nesterov wrote:
> > Hopefully this helps:
> > 
> > 	           "umount"                                      "events/1"
> > 
> > sys_umount					    sysrq_handle_sync
> >   deactivate_super(sb)			      emergency_sync
> >   {						    	schedule_work(work)
> >     ...						  queue_work(system_wq, work)
> >     down_write(&s->s_umount)			    do_sync_work(work)
> >     ...						      sync_filesystems(0)
> >     kill_block_super(s)				    	...
> >       generic_shutdown_super(sb)		    		down_read(&sb->s_umount)
> >       // sop->put_super(sb)
> >       ext4_put_super(sb)
> > 	invalidate_bdev(sb->s_bdev)
> > 	  lru_add_drain_all()
> > 	    for_each_online_cpu(cpu) {
> > 	      schedule_work_on(cpu, work)
> > 		queue_work_on(cpu, system_wq, work)
> > 		...
> > 	    }
> >   }
> > 
> >   - Both lru_add_drain and do_sync_work work items are added to
> >     the same global system_wq
> 
> Aha. Perhaps you hit this bug under the older kernel?

I did. Sorry for the noise.

> "same workqueue" doesn't mean "same worker thread" today, every CPU can
> run up to ->max_active works. And for system_wq uses max_active = 256.
> 
> >   - The current work fn on the system_wq is do_sync_work and is
> >     blocked waiting to aquire an sb's s_umount for reading
> 
> OK,
> 
> >   - The umount task is the current owner of the s_umount in
> >     question but is waiting for do_sync_work to continue.
> >     Thus we hit a deadlock situation.
> 
> I don't this this can happen, another worker threaf from worker_pool can
> handle this work.

Understood.

-- 
Aaron Tomlin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
  2014-09-17 21:16     ` Dave Chinner
@ 2014-09-19 15:44       ` Aaron Tomlin
  0 siblings, 0 replies; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-19 15:44 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Oleg Nesterov, linux-fsdevel, viro, bmr, jcastillo, mguzik, linux-kernel

On Thu, Sep 18, 2014 at 07:16:13AM +1000, Dave Chinner wrote:
> >   - Both lru_add_drain and do_sync_work work items are added to
> >     the same global system_wq
> > 
> >   - The current work fn on the system_wq is do_sync_work and is
> >     blocked waiting to aquire an sb's s_umount for reading
> > 
> >   - The umount task is the current owner of the s_umount in
> >     question but is waiting for do_sync_work to continue.
> >     Thus we hit a deadlock situation.
> 
> What kernel did you see this deadlock on?

Sorry for the noise. This deadlock was produced under a kernel whereby
the workqueue implementation is significantly less sophisticated.

-- 
Aaron Tomlin

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-19 15:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-17 11:39 [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock Aaron Tomlin
2014-09-17 18:22 ` Oleg Nesterov
2014-09-17 20:46   ` Aaron Tomlin
2014-09-17 21:16     ` Dave Chinner
2014-09-19 15:44       ` Aaron Tomlin
2014-09-17 21:42     ` Oleg Nesterov
2014-09-19  9:35       ` Aaron Tomlin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.