* [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
@ 2014-09-17 11:39 Aaron Tomlin
2014-09-17 18:22 ` Oleg Nesterov
0 siblings, 1 reply; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-17 11:39 UTC (permalink / raw)
To: linux-fsdevel
Cc: viro, david, atomlin, oleg, bmr, jcastillo, mguzik, linux-kernel
Since do_sync_work() is a deferred function it can block indefinitely by
design. At present do_sync_work() is added to the global system_wq.
As such a deadlock is theoretically possible between sys_unmount() and
sync_filesystems():
* The current work fn on the system_wq (do_sync_work()) is blocked
waiting to aquire a sb's s_umount for reading.
* The "umount" task is the current owner of the s_umount in
question but is waiting for do_sync_work() to continue.
Thus we hit a deadlock situation.
This patch introduces a separate workqueue for do_sync_work() to avoid a
the described deadlock.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Reviewed-by: Alexander Viro <viro@zeniv.linux.org.uk>
---
fs/sync.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/fs/sync.c b/fs/sync.c
index bdc729d..df455d0 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -15,6 +15,7 @@
#include <linux/pagemap.h>
#include <linux/quotaops.h>
#include <linux/backing-dev.h>
+#include <linux/kthread.h>
#include "internal.h"
#define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -114,7 +115,7 @@ SYSCALL_DEFINE0(sync)
return 0;
}
-static void do_sync_work(struct work_struct *work)
+static int do_sync_work(void *dummy)
{
int nowait = 0;
@@ -129,18 +130,12 @@ static void do_sync_work(struct work_struct *work)
iterate_supers(sync_fs_one_sb, &nowait);
iterate_bdevs(fdatawrite_one_bdev, NULL);
printk("Emergency Sync complete\n");
- kfree(work);
+ return 0;
}
void emergency_sync(void)
{
- struct work_struct *work;
-
- work = kmalloc(sizeof(*work), GFP_ATOMIC);
- if (work) {
- INIT_WORK(work, do_sync_work);
- schedule_work(work);
- }
+ kthread_run(do_sync_work, NULL, "sync_work_thread");
}
/*
--
1.9.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
2014-09-17 11:39 [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock Aaron Tomlin
@ 2014-09-17 18:22 ` Oleg Nesterov
2014-09-17 20:46 ` Aaron Tomlin
0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2014-09-17 18:22 UTC (permalink / raw)
To: Aaron Tomlin
Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel
On 09/17, Aaron Tomlin wrote:
>
> Since do_sync_work() is a deferred function it can block indefinitely by
> design. At present do_sync_work() is added to the global system_wq.
> As such a deadlock is theoretically possible between sys_unmount() and
> sync_filesystems():
>
> * The current work fn on the system_wq (do_sync_work()) is blocked
> waiting to aquire a sb's s_umount for reading.
>
> * The "umount" task is the current owner of the s_umount in
> question but is waiting for do_sync_work() to continue.
> Thus we hit a deadlock situation.
>
I can't comment the patches in this area, but I am just curious...
Could you explain this deadlock in more details? I simply can't understand
what "waiting for do_sync_work()" actually means.
> This patch introduces a separate workqueue for do_sync_work() to avoid a
> the described deadlock.
The subject and the changelog do not match the patch, it doesn't add/use
another workqueue.
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
2014-09-17 18:22 ` Oleg Nesterov
@ 2014-09-17 20:46 ` Aaron Tomlin
2014-09-17 21:16 ` Dave Chinner
2014-09-17 21:42 ` Oleg Nesterov
0 siblings, 2 replies; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-17 20:46 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel
On Wed, Sep 17, 2014 at 08:22:02PM +0200, Oleg Nesterov wrote:
> On 09/17, Aaron Tomlin wrote:
> >
> > Since do_sync_work() is a deferred function it can block indefinitely by
> > design. At present do_sync_work() is added to the global system_wq.
> > As such a deadlock is theoretically possible between sys_unmount() and
> > sync_filesystems():
> >
> > * The current work fn on the system_wq (do_sync_work()) is blocked
> > waiting to aquire a sb's s_umount for reading.
> >
> > * The "umount" task is the current owner of the s_umount in
> > question but is waiting for do_sync_work() to continue.
> > Thus we hit a deadlock situation.
> >
> I can't comment the patches in this area, but I am just curious...
>
> Could you explain this deadlock in more details? I simply can't understand
> what "waiting for do_sync_work()" actually means.
Hopefully this helps:
"umount" "events/1"
sys_umount sysrq_handle_sync
deactivate_super(sb) emergency_sync
{ schedule_work(work)
... queue_work(system_wq, work)
down_write(&s->s_umount) do_sync_work(work)
... sync_filesystems(0)
kill_block_super(s) ...
generic_shutdown_super(sb) down_read(&sb->s_umount)
// sop->put_super(sb)
ext4_put_super(sb)
invalidate_bdev(sb->s_bdev)
lru_add_drain_all()
for_each_online_cpu(cpu) {
schedule_work_on(cpu, work)
queue_work_on(cpu, system_wq, work)
...
}
}
- Both lru_add_drain and do_sync_work work items are added to
the same global system_wq
- The current work fn on the system_wq is do_sync_work and is
blocked waiting to aquire an sb's s_umount for reading
- The umount task is the current owner of the s_umount in
question but is waiting for do_sync_work to continue.
Thus we hit a deadlock situation.
--
Aaron Tomlin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
2014-09-17 20:46 ` Aaron Tomlin
@ 2014-09-17 21:16 ` Dave Chinner
2014-09-19 15:44 ` Aaron Tomlin
2014-09-17 21:42 ` Oleg Nesterov
1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2014-09-17 21:16 UTC (permalink / raw)
To: Aaron Tomlin
Cc: Oleg Nesterov, linux-fsdevel, viro, bmr, jcastillo, mguzik, linux-kernel
On Wed, Sep 17, 2014 at 09:46:35PM +0100, Aaron Tomlin wrote:
> On Wed, Sep 17, 2014 at 08:22:02PM +0200, Oleg Nesterov wrote:
> > On 09/17, Aaron Tomlin wrote:
> > >
> > > Since do_sync_work() is a deferred function it can block indefinitely by
> > > design. At present do_sync_work() is added to the global system_wq.
> > > As such a deadlock is theoretically possible between sys_unmount() and
> > > sync_filesystems():
> > >
> > > * The current work fn on the system_wq (do_sync_work()) is blocked
> > > waiting to aquire a sb's s_umount for reading.
> > >
> > > * The "umount" task is the current owner of the s_umount in
> > > question but is waiting for do_sync_work() to continue.
> > > Thus we hit a deadlock situation.
> > >
> > I can't comment the patches in this area, but I am just curious...
> >
> > Could you explain this deadlock in more details? I simply can't understand
> > what "waiting for do_sync_work()" actually means.
>
> Hopefully this helps:
>
> "umount" "events/1"
>
> sys_umount sysrq_handle_sync
> deactivate_super(sb) emergency_sync
> { schedule_work(work)
> ... queue_work(system_wq, work)
> down_write(&s->s_umount) do_sync_work(work)
> ... sync_filesystems(0)
> kill_block_super(s) ...
> generic_shutdown_super(sb) down_read(&sb->s_umount)
> // sop->put_super(sb)
> ext4_put_super(sb)
> invalidate_bdev(sb->s_bdev)
> lru_add_drain_all()
> for_each_online_cpu(cpu) {
> schedule_work_on(cpu, work)
> queue_work_on(cpu, system_wq, work)
> ...
> }
> }
>
> - Both lru_add_drain and do_sync_work work items are added to
> the same global system_wq
>
> - The current work fn on the system_wq is do_sync_work and is
> blocked waiting to aquire an sb's s_umount for reading
>
> - The umount task is the current owner of the s_umount in
> question but is waiting for do_sync_work to continue.
> Thus we hit a deadlock situation.
What kernel did you see this deadlock on?
I don't see a deadlock here on a mainline kernel. The emergency sync
work blocks, the new work gets queued, and the workqueue
infrastructure simply pulls another kworker thread from the pool and
runs the new work. IOWs, I can't see how this would deadlock unless
the system_wq kworker pool has been fully depleted it's defined
per-cpu concurrency depth. If the kworker thread pool is depleted
then you have bigger problems than emergency sync not
deadlocking....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
2014-09-17 20:46 ` Aaron Tomlin
2014-09-17 21:16 ` Dave Chinner
@ 2014-09-17 21:42 ` Oleg Nesterov
2014-09-19 9:35 ` Aaron Tomlin
1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2014-09-17 21:42 UTC (permalink / raw)
To: Aaron Tomlin
Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel
On 09/17, Aaron Tomlin wrote:
>
> On Wed, Sep 17, 2014 at 08:22:02PM +0200, Oleg Nesterov wrote:
> > On 09/17, Aaron Tomlin wrote:
> > >
> > > Since do_sync_work() is a deferred function it can block indefinitely by
> > > design. At present do_sync_work() is added to the global system_wq.
> > > As such a deadlock is theoretically possible between sys_unmount() and
> > > sync_filesystems():
> > >
> > > * The current work fn on the system_wq (do_sync_work()) is blocked
> > > waiting to aquire a sb's s_umount for reading.
> > >
> > > * The "umount" task is the current owner of the s_umount in
> > > question but is waiting for do_sync_work() to continue.
> > > Thus we hit a deadlock situation.
> > >
> > I can't comment the patches in this area, but I am just curious...
> >
> > Could you explain this deadlock in more details? I simply can't understand
> > what "waiting for do_sync_work()" actually means.
>
> Hopefully this helps:
>
> "umount" "events/1"
>
> sys_umount sysrq_handle_sync
> deactivate_super(sb) emergency_sync
> { schedule_work(work)
> ... queue_work(system_wq, work)
> down_write(&s->s_umount) do_sync_work(work)
> ... sync_filesystems(0)
> kill_block_super(s) ...
> generic_shutdown_super(sb) down_read(&sb->s_umount)
> // sop->put_super(sb)
> ext4_put_super(sb)
> invalidate_bdev(sb->s_bdev)
> lru_add_drain_all()
> for_each_online_cpu(cpu) {
> schedule_work_on(cpu, work)
> queue_work_on(cpu, system_wq, work)
> ...
> }
> }
>
> - Both lru_add_drain and do_sync_work work items are added to
> the same global system_wq
Aha. Perhaps you hit this bug under the older kernel?
"same workqueue" doesn't mean "same worker thread" today, every CPU can
run up to ->max_active works. And for system_wq uses max_active = 256.
> - The current work fn on the system_wq is do_sync_work and is
> blocked waiting to aquire an sb's s_umount for reading
OK,
> - The umount task is the current owner of the s_umount in
> question but is waiting for do_sync_work to continue.
> Thus we hit a deadlock situation.
I don't this this can happen, another worker threaf from worker_pool can
handle this work.
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
2014-09-17 21:42 ` Oleg Nesterov
@ 2014-09-19 9:35 ` Aaron Tomlin
0 siblings, 0 replies; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-19 9:35 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-fsdevel, viro, david, bmr, jcastillo, mguzik, linux-kernel
On Wed, Sep 17, 2014 at 11:42:09PM +0200, Oleg Nesterov wrote:
> > Hopefully this helps:
> >
> > "umount" "events/1"
> >
> > sys_umount sysrq_handle_sync
> > deactivate_super(sb) emergency_sync
> > { schedule_work(work)
> > ... queue_work(system_wq, work)
> > down_write(&s->s_umount) do_sync_work(work)
> > ... sync_filesystems(0)
> > kill_block_super(s) ...
> > generic_shutdown_super(sb) down_read(&sb->s_umount)
> > // sop->put_super(sb)
> > ext4_put_super(sb)
> > invalidate_bdev(sb->s_bdev)
> > lru_add_drain_all()
> > for_each_online_cpu(cpu) {
> > schedule_work_on(cpu, work)
> > queue_work_on(cpu, system_wq, work)
> > ...
> > }
> > }
> >
> > - Both lru_add_drain and do_sync_work work items are added to
> > the same global system_wq
>
> Aha. Perhaps you hit this bug under the older kernel?
I did. Sorry for the noise.
> "same workqueue" doesn't mean "same worker thread" today, every CPU can
> run up to ->max_active works. And for system_wq uses max_active = 256.
>
> > - The current work fn on the system_wq is do_sync_work and is
> > blocked waiting to aquire an sb's s_umount for reading
>
> OK,
>
> > - The umount task is the current owner of the s_umount in
> > question but is waiting for do_sync_work to continue.
> > Thus we hit a deadlock situation.
>
> I don't this this can happen, another worker threaf from worker_pool can
> handle this work.
Understood.
--
Aaron Tomlin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock
2014-09-17 21:16 ` Dave Chinner
@ 2014-09-19 15:44 ` Aaron Tomlin
0 siblings, 0 replies; 7+ messages in thread
From: Aaron Tomlin @ 2014-09-19 15:44 UTC (permalink / raw)
To: Dave Chinner
Cc: Oleg Nesterov, linux-fsdevel, viro, bmr, jcastillo, mguzik, linux-kernel
On Thu, Sep 18, 2014 at 07:16:13AM +1000, Dave Chinner wrote:
> > - Both lru_add_drain and do_sync_work work items are added to
> > the same global system_wq
> >
> > - The current work fn on the system_wq is do_sync_work and is
> > blocked waiting to aquire an sb's s_umount for reading
> >
> > - The umount task is the current owner of the s_umount in
> > question but is waiting for do_sync_work to continue.
> > Thus we hit a deadlock situation.
>
> What kernel did you see this deadlock on?
Sorry for the noise. This deadlock was produced under a kernel whereby
the workqueue implementation is significantly less sophisticated.
--
Aaron Tomlin
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-09-19 15:46 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-17 11:39 [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock Aaron Tomlin
2014-09-17 18:22 ` Oleg Nesterov
2014-09-17 20:46 ` Aaron Tomlin
2014-09-17 21:16 ` Dave Chinner
2014-09-19 15:44 ` Aaron Tomlin
2014-09-17 21:42 ` Oleg Nesterov
2014-09-19 9:35 ` Aaron Tomlin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.