* Re: Deadlock in fsnotify for
       [not found] <OFAA2B5164.CC89CD80-ON80258315.0053848D-80258315.005681B2@waters.com>
@ 2018-09-27 16:24 ` Jan Kara
       [not found]   ` <OFC1587FEF.00F5AC8A-ON80258316.004A23A8-80258316.004AAC30@waters.com>
       [not found]   ` <OFC1587FEF.00F5AC8A-ON80258316.004A23A8-80258316.004A4100@LocalDomain>
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Kara @ 2018-09-27 16:24 UTC (permalink / raw)
  To: Nigel Banks; +Cc: jack, linux-fsdevel, Amir Goldstein

Hello,

[added to CC other relevant mails]

On Thu 27-09-18 16:44:53, Nigel Banks wrote:
> Sorry to trouble you, but from looking through the git history of linux/fs/
> notify you seem to be the best person to contact.
> 
> I've encountered a hard-to-reproduce situation that happens on our CI
> servers, in which it becomes impossible to release any inotify file
> descriptors. We're currently running Ubuntu 18.04 (Kernel 4.15) using
> ext4 fs, and our code is running in docker containers (overlay2) if that
> makes a difference.
> 
> Essentially we're running a number of concurrent tests which internally
> use inotify to monitor some directories. This all works fine and they
> clean up after themselves, but after several days there will be a
> deadlock in the kernel code (sys stack below):
> 
> [<0>] flush_work+0x126/0x1e0
> [<0>] flush_delayed_work+0x3f/0x50
> [<0>] fsnotify_wait_marks_destroyed+0x15/0x20
> [<0>] fsnotify_destroy_group+0x48/0xd0
> [<0>] inotify_release+0x1e/0x50
> [<0>] __fput+0xea/0x220
> [<0>] ____fput+0xe/0x10
> [<0>] task_work_run+0x9d/0xc0
> [<0>] exit_to_usermode_loop+0xc0/0xd0
> [<0>] do_syscall_64+0x115/0x130
> [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [<0>] 0xffffffffffffffff
 
Hum, I don't remember seeing any deadlock like this. When a system hangs
like this, can you please do:

echo w >/proc/sysrq-trigger

and send me the output of 'dmesg' command after that. In that output we
should see all hung tasks (including kernel threads) and their traces and
hopefully it will tell us more.
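
The 'w' sysrq dump lists blocked tasks, i.e. those in uninterruptible (D) state. The same state can be read per task from /proc/<pid>/stat. The following is a minimal sketch, assuming the usual "pid (comm) state ..." layout of that file; proc_stat_state() and is_uninterruptible() are hypothetical helper names, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the single-character task state from one line of
 * /proc/<pid>/stat, or '\0' if the line cannot be parsed.
 * The command name is wrapped in parentheses and may itself contain
 * spaces or ')', so we search for the LAST ')' in the line.
 * proc_stat_state() is a hypothetical helper name for this sketch.
 */
char proc_stat_state(const char *stat_line)
{
    const char *close_paren;

    assert(stat_line != NULL);

    close_paren = strrchr(stat_line, ')');
    /* Expect ") X" where X is the state character. */
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '\0';
    return close_paren[2];
}

/* 'D' is uninterruptible sleep, the state sysrq 'w' reports on. */
int is_uninterruptible(const char *stat_line)
{
    return proc_stat_state(stat_line) == 'D';
}
```

A caller would read /proc/<pid>/stat for each numeric entry in /proc and pass the line to is_uninterruptible(). This only shows which tasks are stuck; the sysrq output is still needed for the stack traces.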

> Once a process gets stuck in this uninterruptible sleep it will never wake.
> At this point the system is still usable, we're able to create more inotify
> instances and receive messages for them, but we are not able to close any of
> them. So eventually we run out of handles and the system becomes unstable, not
> to mention we can't run any more tests on the machine at this point, and a
> reboot is required.

Yes, this is expected. It looks like some deadlock in the fsnotify
subsystem.

> From my research, it looks like lxc project has also encountered this issue:
> https://github.com/lxc/lxc/issues/2456, like them we also didn't experience
> this behaviour with our previous set-up Ubuntu 16.04 (Kernel 14.04).
> 
> I had a look through the bug lists and through the commit history for linux/fs/
> notify and could not find this issue listed anywhere.
> 
> I've attempted to write a small C program using pthreads and the inotify
> sys-calls, but was unable to create a program that could reproduce this issue.

Thanks for the report.
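
For reference, the create/watch/close cycle that each worker thread in such a test program might run can be sketched as below. This is an assumption about what a reproducer attempt could look like, not Nigel's actual program, and it is not known to trigger the hang. inotify_cycle() is a hypothetical name.

```c
#include <assert.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Run `iterations` rounds of: create an inotify instance, watch `dir`,
 * remove the watch, close the instance. Returns 0 on success and -1 on
 * the first failing call. The function keeps no shared state, so several
 * pthreads may call it concurrently.
 */
int inotify_cycle(const char *dir, int iterations)
{
    assert(dir != NULL);

    for (int i = 0; i < iterations; i++) {
        int fd = inotify_init1(IN_CLOEXEC);
        if (fd < 0)
            return -1;

        int wd = inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE | IN_MODIFY);
        if (wd < 0) {
            close(fd);
            return -1;
        }

        if (inotify_rm_watch(fd, wd) < 0) {
            close(fd);
            return -1;
        }

        /*
         * The trace above shows tasks blocked in inotify_release(), which
         * runs when the last reference to the instance is dropped, for
         * example by this close().
         */
        if (close(fd) < 0)
            return -1;
    }
    return 0;
}
```

Each thread would be started with pthread_create() and call inotify_cycle() on its own directory.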

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Deadlock in fsnotify for
       [not found]   ` <OFC1587FEF.00F5AC8A-ON80258316.004A23A8-80258316.004AAC30@waters.com>
@ 2018-09-28 14:27     ` Amir Goldstein
       [not found]       ` <OFD621F3D0.4B122C7F-ON80258319.00395CDF-80258319.00399966@waters.com>
  2018-10-01 10:25     ` Jan Kara
  1 sibling, 1 reply; 5+ messages in thread
From: Amir Goldstein @ 2018-09-28 14:27 UTC (permalink / raw)
  To: Nigel_Banks; +Cc: Jan Kara, linux-fsdevel, Mark Salyzyn

On Fri, Sep 28, 2018 at 4:47 PM Nigel Banks <Nigel_Banks@waters.com> wrote:
>
> Hello Again,
>
> I've attached the kern.log as you instructed, please let me know if there is any more information I can provide.
>
>
>
> Cheers,
>
> Nigel
>
> On Thu 27-09-18 16:44:53, Nigel Banks wrote:
> > Sorry to trouble you, but from looking through the git history of linux/fs/
> > notify you seem to be the best person to contact.
> >
> > I've encountered a hard-to-reproduce situation that happens on our CI
> > servers, in which it becomes impossible to release any inotify file
> > descriptors. We're currently running Ubuntu 18.04 (Kernel 4.15) using
> > ext4 fs, and our code is running in docker containers (overlay2) if that
> > makes a difference.
> >

Nigel,

It actually could make a difference.
Commit 764baba80168 ("ovl: hash non-dir by lower inode for fsnotify")
never made it to the stable trees; it got stuck in the process:
https://www.spinics.net/lists/stable/msg250441.html

Can you check if Mark's backport patch (in the link above) solves your problem?

I have written an LTP test case to cover the fix (inotify08), and I can
confirm that the test fails on Ubuntu 18.04. However, the test does not
result in a hang; it results in events not being delivered, so I am not sure
it is related to the issue you are seeing. In any case, running inotify over
overlayfs without this fix is not a good idea.
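
The symptom described here, events not being delivered, can be probed from userspace with a sketch like the following. It is not the LTP inotify08 test; modify_event_delivered() is a hypothetical helper that watches a file for IN_MODIFY, appends a byte to it, and reports whether the event arrives. On overlayfs the interesting case is a file that exists only in the lower layer; the exact reproducer is in the commit message and is not reproduced here.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Watch `path` for IN_MODIFY, append one byte to it, and report whether
 * the event arrives within `timeout_ms`.
 * Returns 1 if delivered, 0 if not, -1 on setup errors.
 */
int modify_event_delivered(const char *path, int timeout_ms)
{
    /* Buffer aligned for struct inotify_event, as inotify(7) recommends. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int delivered = 0;

    assert(path != NULL);

    int ifd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        close(ifd);
        return -1;
    }

    /* Modify the file through a separate descriptor. */
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0 || write(fd, "x", 1) != 1) {
        if (fd >= 0)
            close(fd);
        close(ifd);
        return -1;
    }
    close(fd);

    struct pollfd pfd = { .fd = ifd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        ssize_t len = read(ifd, buf, sizeof(buf));
        /* Walk the variable-length event records in the buffer. */
        for (char *p = buf; len > 0 && p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->mask & IN_MODIFY)
                delivered = 1;
            p += sizeof(*ev) + ev->len;
        }
    }
    close(ifd);
    return delivered;
}
```

A return value of 0 on a file under an overlay mount, where the same call returns 1 on the underlying filesystem, would match the "events not being delivered" behaviour rather than the hang.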

Mark,

Could you please re-post the backport patch per Greg's request?

Thanks,
Amir.


* Re: Deadlock in fsnotify for
       [not found]   ` <OFC1587FEF.00F5AC8A-ON80258316.004A23A8-80258316.004AAC30@waters.com>
  2018-09-28 14:27     ` Amir Goldstein
@ 2018-10-01 10:25     ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2018-10-01 10:25 UTC (permalink / raw)
  To: Nigel Banks; +Cc: Jan Kara, Amir Goldstein, linux-fsdevel

Hello,

On Fri 28-09-18 09:35:38, Nigel Banks wrote:
> I've attached the kern.log as you instructed, please let me know if there is
> any more information I can provide.

Thanks for the traces. So all processes but one hang like you've described
- i.e.:

 schedule+0x2c/0x80
 schedule_timeout+0x1cf/0x350
 ? sched_clock+0x9/0x10
 ? sched_clock+0x9/0x10
 ? sched_clock_cpu+0x11/0xb0
 wait_for_completion+0xba/0x140
 ? wake_up_q+0x80/0x80
 flush_work+0x126/0x1e0
 ? worker_detach_from_pool+0xa0/0xa0
 flush_delayed_work+0x3f/0x50
 fsnotify_wait_marks_destroyed+0x15/0x20
 fsnotify_destroy_group+0x48/0xd0
 inotify_release+0x1e/0x50
 __fput+0xea/0x220
 ____fput+0xe/0x10
 task_work_run+0x9d/0xc0

They all wait for the worker thread to destroy marks. That thread is hung like:

 schedule+0x2c/0x80
 schedule_timeout+0x1cf/0x350
 ? select_idle_sibling+0x262/0x410
 ? __enqueue_entity+0x5c/0x60
 ? enqueue_entity+0x10e/0x6b0
 wait_for_completion+0xba/0x140
 ? wake_up_q+0x80/0x80
 __synchronize_srcu.part.13+0x85/0xb0
 ? trace_raw_output_rcu_utilization+0x50/0x50
 ? ttwu_do_activate+0x77/0x80
 synchronize_srcu+0x66/0xe0
 ? synchronize_srcu+0x66/0xe0
 fsnotify_mark_destroy_workfn+0x7b/0xe0
 process_one_work+0x1de/0x410
 worker_thread+0x228/0x410
 kthread+0x121/0x140

So it waits for the SRCU grace period to end. From the traces it is not clear
who prevents the SRCU grace period from finishing. Did you include all tasks
from the trace in the attached file?

If yes, I have no good idea what could be holding the SRCU read lock. Since
your kernel is 4.4 based, which is relatively old and has some patches applied
on top, I'd suggest you either try a newer kernel (e.g. you should be able to
install 4.16 or 4.17 relatively easily) or report this to Ubuntu bugzilla.
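
The dependency chain in the two traces (close() waits for the worker, the worker waits in synchronize_srcu(), and that waits for readers) can be illustrated with a toy model. This is a single-threaded sketch with hypothetical names, not the kernel's SRCU implementation; it only shows why one reader that never leaves its critical section is enough to block every later close().

```c
#include <assert.h>

/*
 * Toy, single-threaded model of a grace-period wait. It is NOT the
 * kernel's SRCU implementation; all names here are hypothetical.
 */
struct toy_srcu {
    int active_readers;   /* read-side critical sections currently open */
};

void toy_read_lock(struct toy_srcu *s)
{
    s->active_readers++;
}

void toy_read_unlock(struct toy_srcu *s)
{
    assert(s->active_readers > 0);
    s->active_readers--;
}

/*
 * A real synchronize_srcu() sleeps until readers that started before it
 * have finished. Here we only report whether that wait could end now.
 */
int toy_grace_period_can_end(const struct toy_srcu *s)
{
    return s->active_readers == 0;
}

/*
 * close() waits for the mark-destroy worker, and the worker waits for the
 * grace period, so close() can return only when the grace period can end.
 */
int toy_close_can_return(const struct toy_srcu *s)
{
    return toy_grace_period_can_end(s);
}
```

In this model, a toy_read_lock() without a matching toy_read_unlock() makes toy_close_can_return() report 0 forever, which corresponds to the question of who is holding the read side in the real traces.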

Thanks.

								Honza


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Deadlock in fsnotify for
       [not found]       ` <OF184949B9.0B312CF1-ON80258319.003B7FFA-80258319.003BA9B9@LocalDomain>
@ 2018-10-01 11:09         ` Nigel Banks
  0 siblings, 0 replies; 5+ messages in thread
From: Nigel Banks @ 2018-10-01 11:09 UTC (permalink / raw)
  To: linux-fsdevel

Nigel Banks/Waters wrote on 10/01/2018 11:51:40 AM:

> From: Nigel Banks/Waters
> To: linux-fsdevel@vger.kernel.org
> Date: 10/01/2018 11:51 AM
> Subject: Re: Deadlock in fsnotify for
> 
> Sorry for the repeated messages, my work email client is a bit clunky.
> 
> Seems like the attachment couldn't be sent along to linux-fsdevel so
> I'm resending the email with a link to the kern.log
> 
> https://gist.github.com/nigelgbanks/a38143b6f16be14026637efc0c362d3a

* Re: Deadlock in fsnotify for
       [not found]       ` <OFD621F3D0.4B122C7F-ON80258319.00395CDF-80258319.00399966@waters.com>
@ 2018-10-01 14:09         ` Amir Goldstein
  0 siblings, 0 replies; 5+ messages in thread
From: Amir Goldstein @ 2018-10-01 14:09 UTC (permalink / raw)
  To: Nigel_Banks; +Cc: Jan Kara, linux-fsdevel, Mark Salyzyn

On Mon, Oct 1, 2018 at 1:29 PM Nigel Banks <Nigel_Banks@waters.com> wrote:
>
> Thank you Amir,
>
> I will try the patch you've recommended. It sometimes takes several days for this issue to arise, so you may not hear back from me for a week or so. On a side note, if I were to use AUFS instead of Overlay2, would that potentially avoid this issue?
>

I don't know the answer to that question, but there is an easy way to find out.
The fix patch has a simple reproducer in its commit message.
Run the same reproducer with AUFS instead of overlayfs and tell me.

I can only say that AFAIK all past versions of overlayfs have had this problem
with inotify.
I can also say that if you have a hardlink in the lower layer and you are not
using the new overlayfs index feature (docker is not using this feature) then
said inotify issue will still happen on that lower hardlink.

Thanks,
Amir.
