* Re: stuck in inotify_release

From: Jan Kara @ 2019-03-28 9:52 UTC
To: Olivier Chapelliere; +Cc: jack, linux-fsdevel

Hello,

On Thu 28-03-19 09:26:45, Olivier Chapelliere wrote:
> According to what I read on the internet, you seem to be the right person
> to get in touch with when one has problems with inotify.

Yes, there's also the linux-fsdevel@vger.kernel.org mailing list, which we
use (added to CC).

> We are monitoring several directories in Python processes through inotify.
> But after a few days all processes are stuck in a call to inotify_release.
> Once I detected the problem, I dumped info to dmesg with sysrq-trigger
> (dmesg content attached):
> echo w > /proc/sysrq-trigger

Looking through the stack traces, all of them wait in fput() ->
inotify_release() -> ... -> fsnotify_wait_marks_destroyed() ->
flush_delayed_work(&reaper_work). So they wait for the worker process to
destroy all marks for the group. However, that worker (kworker/u8:4) is
stuck in:

  fsnotify_mark_destroy_workfn() -> synchronize_srcu(&fsnotify_mark_srcu)

So the question is who is holding fsnotify_mark_srcu so that SRCU cannot
declare a new grace period. I don't see any such process among the processes
you've shown in the dump (but it should be there), so it's a bit of a
mystery.

> Our production env is Ubuntu 18.04, kernel 4.15, fs ext4.
> This problem appears on a weekly basis, so I will be able to run additional
> commands to track down the issue if needed.

So when this happens again, try grabbing the output of sysrq-l and sysrq-t
so we can see whether we can find the task holding fsnotify_mark_srcu.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: stuck in inotify_release

From: Olivier Chapelliere @ 2019-05-06 18:54 UTC
To: Jan Kara; +Cc: linux-fsdevel

Jan and all,

It finally took a month to happen again: Python processes watching a
directory are stuck in inotify_release.
I ran the sysrq commands as you requested and attached the result.

Thanks for your help

On Thu, Mar 28, 2019 at 10:52 AM Jan Kara <jack@suse.cz> wrote:
> [...]

--
Olivier Chapelliere

[-- Attachment #2: kern.log.tar.gz --]
[-- Type: application/gzip, Size: 49132 bytes --]
* Re: stuck in inotify_release

From: Olivier Chapelliere @ 2019-05-14 6:22 UTC
To: Jan Kara; +Cc: linux-fsdevel

Hi all,

In case it helps troubleshoot the issue, I found another host with the same
symptoms. I ran the sysrq commands and attached the kernel log file.

Thanks
Olivier

On Mon, May 6, 2019 at 8:54 PM Olivier Chapelliere
<olivier.chapelliere@alcmeon.com> wrote:
> [...]

--
Olivier Chapelliere

[-- Attachment #2: kern.log.tar.gz --]
[-- Type: application/gzip, Size: 57484 bytes --]
* Re: stuck in inotify_release

From: Jan Kara @ 2019-05-14 9:25 UTC
To: Olivier Chapelliere; +Cc: Jan Kara, linux-fsdevel

Hello!

On Mon 06-05-19 20:54:24, Olivier Chapelliere wrote:
> It finally took a month to happen again: Python processes watching a
> directory are stuck in inotify_release.
> I ran the sysrq commands as you requested and attached the result.

Thanks. I was looking into these traces, but the situation is the same as
before. Everyone is blocked waiting for the inotify group to shut down. That
is blocked waiting for the worker to finish destroying notification marks,
and the worker is blocked in synchronize_srcu() waiting for the SRCU grace
period to end. Now, I didn't find any process that would be holding the SRCU
lock, so it seems that someone exited the SRCU locked section without
releasing the lock. I've checked the 4.15 kernel your Ubuntu kernel is based
on and I don't see how that would be possible. It is possible, though, that
the problem was introduced by some Ubuntu-specific backports. Would it be
possible for you to run some vanilla kernel (i.e., without Ubuntu
modifications)?

Honza

> On Thu, Mar 28, 2019 at 10:52 AM Jan Kara <jack@suse.cz> wrote:
> > [...]
>
> --
> Olivier Chapelliere
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
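To make the failure mode Jan describes more concrete, here is a toy userspace model in Python. It is not kernel code. The class `ToySRCU` and its methods are hypothetical stand-ins for `srcu_read_lock()`, `srcu_read_unlock()` and `synchronize_srcu()`, and `leaky_reader` is a hypothetical buggy reader. The model only captures one property: a grace period cannot end while a reader that entered before it started is still inside its read-side section. Real SRCU is implemented differently and has no timeout. The timeout here exists only so the demonstration terminates.

```python
import threading
from collections import Counter


class ToySRCU:
    """Toy model of SRCU grace periods (hypothetical, not the kernel API).

    Readers record the generation they entered in; synchronize() waits until
    no reader from the snapshotted or an older generation is still inside.
    Readers that enter after synchronize() starts are not waited for.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._gen = 0
        self._active = Counter()  # generation -> readers still inside

    def read_lock(self):
        # Like srcu_read_lock(), return a token that must go to read_unlock().
        with self._cond:
            token = self._gen
            self._active[token] += 1
            return token

    def read_unlock(self, token):
        with self._cond:
            self._active[token] -= 1
            if self._active[token] == 0:
                del self._active[token]
            self._cond.notify_all()

    def synchronize(self, timeout=None):
        # Return True once the grace period ends, False if the timeout expires.
        with self._cond:
            snapshot = self._gen
            self._gen += 1  # readers entering from now on use a newer generation
            return self._cond.wait_for(
                lambda: not any(g <= snapshot for g in self._active), timeout
            )


def leaky_reader(srcu):
    # Hypothetical buggy reader: enters the read-side section and returns
    # without calling read_unlock(), i.e. it "exited the SRCU locked section
    # without releasing the lock".
    srcu.read_lock()


if __name__ == "__main__":
    srcu = ToySRCU()

    # A well-behaved reader does not hold up the grace period.
    token = srcu.read_lock()
    srcu.read_unlock(token)
    print("clean reader:", srcu.synchronize(timeout=0.1))

    # The leaky thread has already exited, so it would not appear in a task
    # dump, yet the grace period can never end.
    t = threading.Thread(target=leaky_reader, args=(srcu,))
    t.start()
    t.join()
    print("leaked reader:", srcu.synchronize(timeout=0.1))
```

Running it prints `clean reader: True` and then `leaked reader: False`. The second case mirrors the situation in the traces, at least in this simplified model. The reader that never unlocked is already gone, so no task appears to hold the lock. Even so, `synchronize()` cannot complete. In the kernel, that would leave the mark-destroying worker stuck, and with it every `inotify_release()` waiting on that worker. Tracking readers by generation is a design choice for this sketch. It keeps the state consistent even after a timed-out `synchronize()`, which the real primitive never has to handle.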
* Re: stuck in inotify_release

From: Jan Kara @ 2019-05-14 15:44 UTC
To: mathieu lacage; +Cc: jack, Olivier Chapelliere, linux-fsdevel

Hi!

On Tue 14-05-19 16:35:29, mathieu lacage wrote:
> We are going to set up a new Ubuntu 16.04 server, rebuild a vanilla 5.0
> kernel on it and run a fraction of our production workload on it. Is
> this OK for you? If so, I will let you know as soon as we observe the
> problem on this server again.

Yes, that should rule out any Ubuntu-specific problems. Thanks!

Honza

>
> Mathieu
>
> On Tue, May 14, 2019 at 15:22, Olivier Chapelliere
> <olivier.chapelliere@alcmeon.com> wrote:
>
> > ---------- Forwarded message ---------
> > From: Jan Kara <jack@suse.cz>
> > Date: Tue, May 14, 2019 at 11:25 AM
> > Subject: Re: stuck in inotify_release
> > To: Olivier Chapelliere <olivier.chapelliere@alcmeon.com>
> > Cc: Jan Kara <jack@suse.cz>, <linux-fsdevel@vger.kernel.org>
> > [...]
>
> --
> Mathieu Lacage <mathieu.lacage@alcmeon.com>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR