* [Cluster-devel] [PATCH 0/2] GFS2 locking patches
@ 2018-10-08 12:36 Mark Syms
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
  ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-08 12:36 UTC (permalink / raw)
To: cluster-devel.redhat.com

A pair of locking issues in GFS2 observed when running VM storage stress tests.

0001-GFS2-use-schedule-timeout-in-find-insert-glock.patch covers a case
where an application-level flock would wedge. The VM control plane makes
extensive use of flocks to control access to VM virtual disks and
databases, and we encountered several failed tests where the flocks did
not get acquired even when no one was holding them. Investigation
indicates that there is a race in find_insert_glock where schedule can
be called after the expected wakeup has already occurred. Replace
schedule with schedule_timeout and log when the timeout fires.

0002-GFS2-Flush-the-GFS2-delete-workqueue-before-stopping.patch covers a
case where umount would wedge unrecoverably. The completion of the
stress test involves the deletion of the test machines and virtual
disks, followed by the filesystem being unmounted on all hosts before
the hosts are returned to the lab pool. umount was found to wedge, and
this was traced to gfs2_log_reserve being called from the workqueue
flush but after the associated kthread processes had been stopped. Thus
there was nobody to handle the log reserve request and the code wedged.

Mark Syms (1):
  GFS2: use schedule timeout in find insert glock

Tim Smith (1):
  GFS2: Flush the GFS2 delete workqueue before stopping the kernel
    threads

 fs/gfs2/glock.c | 3 ++-
 fs/gfs2/super.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
@ 2018-10-08 12:36 ` Mark Syms
  2018-10-08 12:56   ` Steven Whitehouse
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
  [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
  2 siblings, 1 reply; 25+ messages in thread
From: Mark Syms @ 2018-10-08 12:36 UTC (permalink / raw)
To: cluster-devel.redhat.com

During a VM stress test we encountered a system lockup and kern.log contained

kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
kernel: [21389.462803] Call Trace:
kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71

On examination of the code it was determined that this code path is only
taken if the selected glock is marked as dead. The supposition therefore
is that by the time schedule was called the glock had already been
cleaned up and therefore nothing woke the schedule. Instead of calling
schedule, call schedule_timeout(HZ) so at least we get a chance to
re-evaluate.

On repeating the stress test, the printk message was seen once in the
logs across four servers, but there were no further occurrences nor any
stuck task log entries. This indicates that when the timeout occurred,
the code repeated the lookup and did not find the same glock entry; as
we hadn't been woken by then, without the timeout we would never have
been woken.

Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Tim Smith <tim.smith@citrix.com>
---
 fs/gfs2/glock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4614ee2..0a59a01 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
 	}
 	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
 		rcu_read_unlock();
-		schedule();
+		if (schedule_timeout(HZ) == 0)
+			printk(KERN_INFO "find_insert_glock schedule timed out\n");
 		goto again;
 	}
 out:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms @ 2018-10-08 12:56 ` Steven Whitehouse 2018-10-08 12:59 ` Mark Syms 0 siblings, 1 reply; 25+ messages in thread From: Steven Whitehouse @ 2018-10-08 12:56 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 08/10/18 13:36, Mark Syms wrote: > During a VM stress test we encountered a system lockup and kern.log contained > > kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds. > kernel: [21389.462749] Tainted: G O 4.4.0+10 #1 > kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000 > kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000 > kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418 > kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68 > kernel: [21389.462803] Call Trace: > kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80 > kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2] > kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2] > kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2] > kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2] > kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2] > kernel: [21389.462938] [<ffffffff811ba2fb>] ? 
cp_new_stat+0x10b/0x120 > kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2] > kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170 > kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71 > > On examination of the code it was determined that this code path > is only taken if the selected glock is marked as dead, the supposition > therefore is that by the time schedule as called the glock had been > cleaned up and therefore nothing woke the schedule. Instead of calling > schedule, call schedule_timeout(HZ) so at least we get a chance to > re-evaluate. > > On repeating the stress test, the printk message was seen once in the logs > across four servers but no further occurences nor were there any stuck task > log entries. This indicates that when the timeout occured the code repeated > the lookup and did not find the same glock entry but as we hadn't been > woken this means that we would never have been woken. > > Signed-off-by: Mark Syms <mark.syms@citrix.com> > Signed-off-by: Tim Smith <tim.smith@citrix.com> > --- > fs/gfs2/glock.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > index 4614ee2..0a59a01 100644 > --- a/fs/gfs2/glock.c > +++ b/fs/gfs2/glock.c > @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name, > } > if (gl && !lockref_get_not_dead(&gl->gl_lockref)) { > rcu_read_unlock(); > - schedule(); > + if (schedule_timeout(HZ) == 0) > + printk(KERN_INFO "find_insert_glock schedule timed out\n"); > goto again; > } > out: That is a bit odd. In fact that whole function looks odd. I wonder why it needs to wait in the first place. It should be a simple comparison here. If the glock is dead then nothing else should touch it, so we are safe to add a new one into the hash table. The wait is almost certainly a bug, Steve. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 12:56 ` Steven Whitehouse @ 2018-10-08 12:59 ` Mark Syms 2018-10-08 13:03 ` Steven Whitehouse 0 siblings, 1 reply; 25+ messages in thread From: Mark Syms @ 2018-10-08 12:59 UTC (permalink / raw) To: cluster-devel.redhat.com That sounds entirely reasonable so long as you are absolutely sure that nothing is ever going to mess with that glock, we erred on the side of more caution not knowing whether it would be guaranteed safe or not. Thanks, Mark -----Original Message----- From: Steven Whitehouse <swhiteho@redhat.com> Sent: 08 October 2018 13:56 To: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com Cc: Ross Lagerwall <ross.lagerwall@citrix.com>; Tim Smith <tim.smith@citrix.com> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Hi, On 08/10/18 13:36, Mark Syms wrote: > During a VM stress test we encountered a system lockup and kern.log > contained > > kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds. > kernel: [21389.462749] Tainted: G O 4.4.0+10 #1 > kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000 > kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 > ffff88005a509c00 ffff88019628c000 > kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 > fffffffffffffe58 ffff8801818da418 > kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 > ffff8801818da5c0 ffff88019628bd68 > kernel: [21389.462803] Call Trace: > kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80 > kernel: [21389.462877] [<ffffffffa0663624>] > find_insert_glock+0x4a4/0x530 [gfs2] > kernel: [21389.462891] [<ffffffffa0660c20>] ? 
> gfs2_holder_wake+0x20/0x20 [gfs2] > kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 > [gfs2] > kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2] > kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 > [gfs2] > kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120 > kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 > [gfs2] > kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170 > kernel: [21389.462948] [<ffffffff815a57ee>] > entry_SYSCALL_64_fastpath+0x12/0x71 > > On examination of the code it was determined that this code path is > only taken if the selected glock is marked as dead, the supposition > therefore is that by the time schedule as called the glock had been > cleaned up and therefore nothing woke the schedule. Instead of calling > schedule, call schedule_timeout(HZ) so at least we get a chance to > re-evaluate. > > On repeating the stress test, the printk message was seen once in the > logs across four servers but no further occurences nor were there any > stuck task log entries. This indicates that when the timeout occured > the code repeated the lookup and did not find the same glock entry but > as we hadn't been woken this means that we would never have been woken. > > Signed-off-by: Mark Syms <mark.syms@citrix.com> > Signed-off-by: Tim Smith <tim.smith@citrix.com> > --- > fs/gfs2/glock.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 4614ee2..0a59a01 > 100644 > --- a/fs/gfs2/glock.c > +++ b/fs/gfs2/glock.c > @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name, > } > if (gl && !lockref_get_not_dead(&gl->gl_lockref)) { > rcu_read_unlock(); > - schedule(); > + if (schedule_timeout(HZ) == 0) > + printk(KERN_INFO "find_insert_glock schedule timed out\n"); > goto again; > } > out: That is a bit odd. In fact that whole function looks odd. 
I wonder why it needs to wait in the first place. It should be a simple comparison here. If the glock is dead then nothing else should touch it, so we are safe to add a new one into the hash table. The wait is almost certainly a bug, Steve. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 12:59 ` Mark Syms @ 2018-10-08 13:03 ` Steven Whitehouse 2018-10-08 13:10 ` Tim Smith 2018-10-08 13:25 ` Bob Peterson 0 siblings, 2 replies; 25+ messages in thread From: Steven Whitehouse @ 2018-10-08 13:03 UTC (permalink / raw) To: cluster-devel.redhat.com On 08/10/18 13:59, Mark Syms wrote: > That sounds entirely reasonable so long as you are absolutely sure that nothing is ever going to mess with that glock, we erred on the side of more caution not knowing whether it would be guaranteed safe or not. > > Thanks, > > Mark We should have a look at the history to see how that wait got added. However the "dead" flag here means "don't touch this glock" and is there so that we can separate the marking dead from the actual removal from the list (which simplifies the locking during the scanning procedures) Steve. > -----Original Message----- > From: Steven Whitehouse <swhiteho@redhat.com> > Sent: 08 October 2018 13:56 > To: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com > Cc: Ross Lagerwall <ross.lagerwall@citrix.com>; Tim Smith <tim.smith@citrix.com> > Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock > > Hi, > > > On 08/10/18 13:36, Mark Syms wrote: >> During a VM stress test we encountered a system lockup and kern.log >> contained >> >> kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds. >> kernel: [21389.462749] Tainted: G O 4.4.0+10 #1 >> kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
>> kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000 >> kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 >> ffff88005a509c00 ffff88019628c000 >> kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 >> fffffffffffffe58 ffff8801818da418 >> kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 >> ffff8801818da5c0 ffff88019628bd68 >> kernel: [21389.462803] Call Trace: >> kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80 >> kernel: [21389.462877] [<ffffffffa0663624>] >> find_insert_glock+0x4a4/0x530 [gfs2] >> kernel: [21389.462891] [<ffffffffa0660c20>] ? >> gfs2_holder_wake+0x20/0x20 [gfs2] >> kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 >> [gfs2] >> kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2] >> kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 >> [gfs2] >> kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120 >> kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 >> [gfs2] >> kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170 >> kernel: [21389.462948] [<ffffffff815a57ee>] >> entry_SYSCALL_64_fastpath+0x12/0x71 >> >> On examination of the code it was determined that this code path is >> only taken if the selected glock is marked as dead, the supposition >> therefore is that by the time schedule as called the glock had been >> cleaned up and therefore nothing woke the schedule. Instead of calling >> schedule, call schedule_timeout(HZ) so at least we get a chance to >> re-evaluate. >> >> On repeating the stress test, the printk message was seen once in the >> logs across four servers but no further occurences nor were there any >> stuck task log entries. This indicates that when the timeout occured >> the code repeated the lookup and did not find the same glock entry but >> as we hadn't been woken this means that we would never have been woken. 
>> >> Signed-off-by: Mark Syms <mark.syms@citrix.com> >> Signed-off-by: Tim Smith <tim.smith@citrix.com> >> --- >> fs/gfs2/glock.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 4614ee2..0a59a01 >> 100644 >> --- a/fs/gfs2/glock.c >> +++ b/fs/gfs2/glock.c >> @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name, >> } >> if (gl && !lockref_get_not_dead(&gl->gl_lockref)) { >> rcu_read_unlock(); >> - schedule(); >> + if (schedule_timeout(HZ) == 0) >> + printk(KERN_INFO "find_insert_glock schedule timed out\n"); >> goto again; >> } >> out: > That is a bit odd. In fact that whole function looks odd. I wonder why it needs to wait in the first place. It should be a simple comparison here. If the glock is dead then nothing else should touch it, so we are safe to add a new one into the hash table. The wait is almost certainly a bug, > > Steve. > ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 13:03 ` Steven Whitehouse @ 2018-10-08 13:10 ` Tim Smith 2018-10-08 13:13 ` Steven Whitehouse 2018-10-08 13:25 ` Bob Peterson 1 sibling, 1 reply; 25+ messages in thread From: Tim Smith @ 2018-10-08 13:10 UTC (permalink / raw) To: cluster-devel.redhat.com On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: > On 08/10/18 13:59, Mark Syms wrote: > > That sounds entirely reasonable so long as you are absolutely sure that > > nothing is ever going to mess with that glock, we erred on the side of > > more caution not knowing whether it would be guaranteed safe or not. > > > > Thanks, > > > > Mark > > We should have a look at the history to see how that wait got added. > However the "dead" flag here means "don't touch this glock" and is there > so that we can separate the marking dead from the actual removal from > the list (which simplifies the locking during the scanning procedures) You beat me to it :-) I think there might be a bit of a problem inserting a new entry with the same name before the old entry has been fully destroyed (or at least removed), which would be why the schedule() is there. -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 13:10 ` Tim Smith @ 2018-10-08 13:13 ` Steven Whitehouse 2018-10-08 13:26 ` Tim Smith 0 siblings, 1 reply; 25+ messages in thread From: Steven Whitehouse @ 2018-10-08 13:13 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 08/10/18 14:10, Tim Smith wrote: > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: >> On 08/10/18 13:59, Mark Syms wrote: >>> That sounds entirely reasonable so long as you are absolutely sure that >>> nothing is ever going to mess with that glock, we erred on the side of >>> more caution not knowing whether it would be guaranteed safe or not. >>> >>> Thanks, >>> >>> Mark >> We should have a look at the history to see how that wait got added. >> However the "dead" flag here means "don't touch this glock" and is there >> so that we can separate the marking dead from the actual removal from >> the list (which simplifies the locking during the scanning procedures) > You beat me to it :-) > > I think there might be a bit of a problem inserting a new entry with the same > name before the old entry has been fully destroyed (or at least removed), > which would be why the schedule() is there. > If the old entry is marked dead, all future lookups should ignore it. We should only have a single non-dead entry at a time, but that doesn't seem like it should need us to wait for it. If we do discover that the wait is really required, then it sounds like as you mentioned above there is a lost wakeup, and that must presumably be on a code path that sets the dead flag and then fails to send a wake up later on. If we can drop the wait in the first place, that seems like a better plan, Steve. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 13:13 ` Steven Whitehouse @ 2018-10-08 13:26 ` Tim Smith 2018-10-09 8:13 ` Mark Syms 2018-10-09 12:34 ` Andreas Gruenbacher 0 siblings, 2 replies; 25+ messages in thread From: Tim Smith @ 2018-10-08 13:26 UTC (permalink / raw) To: cluster-devel.redhat.com On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: > Hi, > > On 08/10/18 14:10, Tim Smith wrote: > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: > >> On 08/10/18 13:59, Mark Syms wrote: > >>> That sounds entirely reasonable so long as you are absolutely sure that > >>> nothing is ever going to mess with that glock, we erred on the side of > >>> more caution not knowing whether it would be guaranteed safe or not. > >>> > >>> Thanks, > >>> > >>> Mark > >> > >> We should have a look at the history to see how that wait got added. > >> However the "dead" flag here means "don't touch this glock" and is there > >> so that we can separate the marking dead from the actual removal from > >> the list (which simplifies the locking during the scanning procedures) > > > > You beat me to it :-) > > > > I think there might be a bit of a problem inserting a new entry with the > > same name before the old entry has been fully destroyed (or at least > > removed), which would be why the schedule() is there. > > If the old entry is marked dead, all future lookups should ignore it. We > should only have a single non-dead entry at a time, but that doesn't > seem like it should need us to wait for it. On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah. > If we do discover that the wait is really required, then it sounds like > as you mentioned above there is a lost wakeup, and that must presumably > be on a code path that sets the dead flag and then fails to send a wake > up later on. 
If we can drop the wait in the first place, that seems like > a better plan, Ooooh, I wonder if these two lines: wake_up_glock(gl); call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); in gfs2_glock_free() are the wrong way round? -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 13:26 ` Tim Smith @ 2018-10-09 8:13 ` Mark Syms 2018-10-09 8:41 ` Steven Whitehouse 0 siblings, 1 reply; 25+ messages in thread From: Mark Syms @ 2018-10-09 8:13 UTC (permalink / raw) To: cluster-devel.redhat.com Having swapped the lines below around we still see the timeout on schedule fire, but only once in a fairly mega stress test. This is why we weren't worried about the timeout being HZ: the situation is hardly ever hit, as having to wait is rare and normally we are woken from schedule. Without a timeout on schedule we never wake up, so a rare occurrence of waiting a second really doesn't seem too bad. Mark. -----Original Message----- From: Tim Smith <tim.smith@citrix.com> Sent: 08 October 2018 14:27 To: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall@citrix.com> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: > Hi, > > On 08/10/18 14:10, Tim Smith wrote: > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: > >> On 08/10/18 13:59, Mark Syms wrote: > >>> That sounds entirely reasonable so long as you are absolutely sure > >>> that nothing is ever going to mess with that glock, we erred on > >>> the side of more caution not knowing whether it would be guaranteed safe or not. > >>> > >>> Thanks, > >>> > >>> Mark > >> > >> We should have a look at the history to see how that wait got added. 
> >> However the "dead" flag here means "don't touch this glock" and is > >> there so that we can separate the marking dead from the actual > >> removal from the list (which simplifies the locking during the > >> scanning procedures) > > > > You beat me to it :-) > > > > I think there might be a bit of a problem inserting a new entry with > > the same name before the old entry has been fully destroyed (or at > > least removed), which would be why the schedule() is there. > > If the old entry is marked dead, all future lookups should ignore it. > We should only have a single non-dead entry at a time, but that > doesn't seem like it should need us to wait for it. On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah. > If we do discover that the wait is really required, then it sounds > like as you mentioned above there is a lost wakeup, and that must > presumably be on a code path that sets the dead flag and then fails to > send a wake up later on. If we can drop the wait in the first place, > that seems like a better plan, Ooooh, I wonder if these two lines: wake_up_glock(gl); call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); in gfs2_glock_free() are the wrong way round? -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-09 8:13 ` Mark Syms @ 2018-10-09 8:41 ` Steven Whitehouse 2018-10-09 8:50 ` Mark Syms 0 siblings, 1 reply; 25+ messages in thread From: Steven Whitehouse @ 2018-10-09 8:41 UTC (permalink / raw) To: cluster-devel.redhat.com On 09/10/18 09:13, Mark Syms wrote: > Having swapped the line below around we still see the timeout on schedule fire, but only once in > a fairly mega stress test. This is why we weren't worried about the timeout being HZ, the situation > is hardly ever hit as having to wait is rare and normally we are woken from schedule and without > a timeout on schedule we never wake up so a rare occurrence of waiting a second really doesn't > seem too bad. > > Mark. We should still get to the bottom of why the wake up is missing though, since without that fix we won't know if there is something else wrong somewhere, Steve. > > -----Original Message----- > From: Tim Smith <tim.smith@citrix.com> > Sent: 08 October 2018 14:27 > To: Steven Whitehouse <swhiteho@redhat.com> > Cc: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall@citrix.com> > Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: >> Hi, >> >> On 08/10/18 14:10, Tim Smith wrote: >>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: >>>> On 08/10/18 13:59, Mark Syms wrote: >>>>> That sounds entirely reasonable so long as you are absolutely sure >>>>> that nothing is ever going to mess with that glock, we erred on >>>>> the side of more caution not knowing whether it would be guaranteed safe or not. >>>>> >>>>> Thanks, >>>>> >>>>> Mark >>>> We should have a look at the history to see how that wait got added. 
>>>> However the "dead" flag here means "don't touch this glock" and is >>>> there so that we can separate the marking dead from the actual >>>> removal from the list (which simplifies the locking during the >>>> scanning procedures) >>> You beat me to it :-) >>> >>> I think there might be a bit of a problem inserting a new entry with >>> the same name before the old entry has been fully destroyed (or at >>> least removed), which would be why the schedule() is there. >> If the old entry is marked dead, all future lookups should ignore it. >> We should only have a single non-dead entry at a time, but that >> doesn't seem like it should need us to wait for it. > On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah. > >> If we do discover that the wait is really required, then it sounds >> like as you mentioned above there is a lost wakeup, and that must >> presumably be on a code path that sets the dead flag and then fails to >> send a wake up later on. If we can drop the wait in the first place, >> that seems like a better plan, > Ooooh, I wonder if these two lines: > > wake_up_glock(gl); > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); > > in gfs2_glock_free() are the wrong way round? > > -- > Tim Smith <tim.smith@citrix.com> > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-09 8:41 ` Steven Whitehouse @ 2018-10-09 8:50 ` Mark Syms 0 siblings, 0 replies; 25+ messages in thread From: Mark Syms @ 2018-10-09 8:50 UTC (permalink / raw) To: cluster-devel.redhat.com We think we have, just making a build to test. Will follow up later. Mark. -----Original Message----- From: Steven Whitehouse <swhiteho@redhat.com> Sent: 09 October 2018 09:41 To: Mark Syms <Mark.Syms@citrix.com>; Tim Smith <tim.smith@citrix.com> Cc: cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall@citrix.com> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock On 09/10/18 09:13, Mark Syms wrote: > Having swapped the line below around we still see the timeout on > schedule fire, but only once in a fairly mega stress test. This is why > we weren't worried about the timeout being HZ, the situation is hardly > ever hit as having to wait is rare and normally we are woken from > schedule and without a timeout on schedule we never wake up so a rare occurrence of waiting a second really doesn't seem too bad. > > Mark. We should still get to the bottom of why the wake up is missing though, since without that fix we won't know if there is something else wrong somewhere, Steve. 
> > -----Original Message----- > From: Tim Smith <tim.smith@citrix.com> > Sent: 08 October 2018 14:27 > To: Steven Whitehouse <swhiteho@redhat.com> > Cc: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com; Ross > Lagerwall <ross.lagerwall@citrix.com> > Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in > find insert glock > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: >> Hi, >> >> On 08/10/18 14:10, Tim Smith wrote: >>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: >>>> On 08/10/18 13:59, Mark Syms wrote: >>>>> That sounds entirely reasonable so long as you are absolutely sure >>>>> that nothing is ever going to mess with that glock, we erred on >>>>> the side of more caution not knowing whether it would be guaranteed safe or not. >>>>> >>>>> Thanks, >>>>> >>>>> Mark >>>> We should have a look at the history to see how that wait got added. >>>> However the "dead" flag here means "don't touch this glock" and is >>>> there so that we can separate the marking dead from the actual >>>> removal from the list (which simplifies the locking during the >>>> scanning procedures) >>> You beat me to it :-) >>> >>> I think there might be a bit of a problem inserting a new entry with >>> the same name before the old entry has been fully destroyed (or at >>> least removed), which would be why the schedule() is there. >> If the old entry is marked dead, all future lookups should ignore it. >> We should only have a single non-dead entry at a time, but that >> doesn't seem like it should need us to wait for it. > On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah. > >> If we do discover that the wait is really required, then it sounds >> like as you mentioned above there is a lost wakeup, and that must >> presumably be on a code path that sets the dead flag and then fails >> to send a wake up later on. 
If we can drop the wait in the first >> place, that seems like a better plan, > Ooooh, I wonder if these two lines: > > wake_up_glock(gl); > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); > > in gfs2_glock_free() are the wrong way round? > > -- > Tim Smith <tim.smith@citrix.com> > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 13:26 ` Tim Smith 2018-10-09 8:13 ` Mark Syms @ 2018-10-09 12:34 ` Andreas Gruenbacher 2018-10-09 14:00 ` Tim Smith 1 sibling, 1 reply; 25+ messages in thread From: Andreas Gruenbacher @ 2018-10-09 12:34 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote: > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: > > Hi, > > > > On 08/10/18 14:10, Tim Smith wrote: > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: > > >> On 08/10/18 13:59, Mark Syms wrote: > > >>> That sounds entirely reasonable so long as you are absolutely sure that > > >>> nothing is ever going to mess with that glock, we erred on the side of > > >>> more caution not knowing whether it would be guaranteed safe or not. > > >>> > > >>> Thanks, > > >>> > > >>> Mark > > >> > > >> We should have a look at the history to see how that wait got added. > > >> However the "dead" flag here means "don't touch this glock" and is there > > >> so that we can separate the marking dead from the actual removal from > > >> the list (which simplifies the locking during the scanning procedures) > > > > > > You beat me to it :-) > > > > > > I think there might be a bit of a problem inserting a new entry with the > > > same name before the old entry has been fully destroyed (or at least > > > removed), which would be why the schedule() is there. > > > > If the old entry is marked dead, all future lookups should ignore it. We > > should only have a single non-dead entry at a time, but that doesn't > > seem like it should need us to wait for it. > > On the second call we do have the new glock to insert as arg2, so we could try > to swap them cleanly, yeah. 
> > > If we do discover that the wait is really required, then it sounds like > > as you mentioned above there is a lost wakeup, and that must presumably > > be on a code path that sets the dead flag and then fails to send a wake > > up later on. If we can drop the wait in the first place, that seems like > > a better plan, > > Ooooh, I wonder if these two lines: > > wake_up_glock(gl); > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); > > in gfs2_glock_free() are the wrong way round? No, what's important here is that the wake-up happens after the glock being freed has been removed from the rhashtable, and that's the case here. wake_up_glock actually accesses the glock, which may no longer be around after the call_rcu, so swapping the two lines would introduce a use-after-free bug. There must be another reason for the missed wake-up. I'll have to study the code some more. Thanks, Andreas ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-09 12:34 ` Andreas Gruenbacher @ 2018-10-09 14:00 ` Tim Smith 2018-10-09 14:47 ` Andreas Gruenbacher 0 siblings, 1 reply; 25+ messages in thread From: Tim Smith @ 2018-10-09 14:00 UTC (permalink / raw) To: cluster-devel.redhat.com On Tuesday, 9 October 2018 13:34:47 BST you wrote: > On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote: > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: > > > Hi, > > > > > > On 08/10/18 14:10, Tim Smith wrote: > > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: > > > >> On 08/10/18 13:59, Mark Syms wrote: > > > >>> That sounds entirely reasonable so long as you are absolutely sure > > > >>> that > > > >>> nothing is ever going to mess with that glock, we erred on the side > > > >>> of > > > >>> more caution not knowing whether it would be guaranteed safe or not. > > > >>> > > > >>> Thanks, > > > >>> > > > >>> Mark > > > >> > > > >> We should have a look at the history to see how that wait got added. > > > >> However the "dead" flag here means "don't touch this glock" and is > > > >> there > > > >> so that we can separate the marking dead from the actual removal from > > > >> the list (which simplifies the locking during the scanning > > > >> procedures) > > > > > > > > You beat me to it :-) > > > > > > > > I think there might be a bit of a problem inserting a new entry with > > > > the > > > > same name before the old entry has been fully destroyed (or at least > > > > removed), which would be why the schedule() is there. > > > > > > If the old entry is marked dead, all future lookups should ignore it. We > > > should only have a single non-dead entry at a time, but that doesn't > > > seem like it should need us to wait for it. > > > > On the second call we do have the new glock to insert as arg2, so we could > > try to swap them cleanly, yeah. 
> > > > > If we do discover that the wait is really required, then it sounds like > > > as you mentioned above there is a lost wakeup, and that must presumably > > > be on a code path that sets the dead flag and then fails to send a wake > > > up later on. If we can drop the wait in the first place, that seems like > > > a better plan, > > > > Ooooh, I wonder if these two lines: > > wake_up_glock(gl); > > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); > > > > in gfs2_glock_free() are the wrong way round? > > No, what's important here is that the wake-up happens after the glock > being freed has been removed from the rhashtable, and that's the case > here. wake_up_glock actually accesses the glock, which may no longer > be around after the call_rcu, so swapping the two lines would > introduce a use-after-free bug. Yes, realised that :-/ > There must be another reason for the missed wake-up. I'll have to > study the code some more. I think it needs to go into gfs2_glock_dealloc(), in such a way as to avoid that problem. Currently working out a patch to do that. > Thanks, > Andreas -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-09 14:00 ` Tim Smith @ 2018-10-09 14:47 ` Andreas Gruenbacher 2018-10-09 15:34 ` Tim Smith 0 siblings, 1 reply; 25+ messages in thread From: Andreas Gruenbacher @ 2018-10-09 14:47 UTC (permalink / raw) To: cluster-devel.redhat.com Mark and Tim, does the following patch fix the problem, perhaps? Thanks, Andreas --- fs/gfs2/glock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 4614ee25f621..71e7c380d4c4 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl) wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name); if (waitqueue_active(wq)) - __wake_up(wq, TASK_NORMAL, 1, &gl->gl_name); + __wake_up(wq, TASK_NORMAL, 0, &gl->gl_name); } static void gfs2_glock_dealloc(struct rcu_head *rcu) -- 2.17.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-09 14:47 ` Andreas Gruenbacher @ 2018-10-09 15:34 ` Tim Smith 0 siblings, 0 replies; 25+ messages in thread From: Tim Smith @ 2018-10-09 15:34 UTC (permalink / raw) To: cluster-devel.redhat.com On Tuesday, 9 October 2018 15:47:21 BST Andreas Gruenbacher wrote: > Mark and Tim, > > does the following patch fix the problem, perhaps? The assertion being that there are several waiters. Certainly possible. We'll give it a try. It takes ~12 hours to hit one instance of this so we've got plenty of thinking time between runs. > Thanks, > Andreas > > --- > fs/gfs2/glock.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > index 4614ee25f621..71e7c380d4c4 100644 > --- a/fs/gfs2/glock.c > +++ b/fs/gfs2/glock.c > @@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl) > wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name); > > if (waitqueue_active(wq)) > - __wake_up(wq, TASK_NORMAL, 1, &gl->gl_name); > + __wake_up(wq, TASK_NORMAL, 0, &gl->gl_name); > } > > static void gfs2_glock_dealloc(struct rcu_head *rcu) -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-08 13:03 ` Steven Whitehouse 2018-10-08 13:10 ` Tim Smith @ 2018-10-08 13:25 ` Bob Peterson 1 sibling, 0 replies; 25+ messages in thread From: Bob Peterson @ 2018-10-08 13:25 UTC (permalink / raw) To: cluster-devel.redhat.com ----- Original Message ----- > > > On 08/10/18 13:59, Mark Syms wrote: > > That sounds entirely reasonable so long as you are absolutely sure that > > nothing is ever going to mess with that glock, we erred on the side of > > more caution not knowing whether it would be guaranteed safe or not. > > > > Thanks, > > > > Mark > We should have a look at the history to see how that wait got added. > However the "dead" flag here means "don't touch this glock" and is there > so that we can separate the marking dead from the actual removal from > the list (which simplifies the locking during the scanning procedures) > > Steve. (snip) > > That is a bit odd. In fact that whole function looks odd. I wonder why it > > needs to wait in the first place. It should be a simple comparison here. > > If the glock is dead then nothing else should touch it, so we are safe to > > add a new one into the hash table. The wait is almost certainly a bug, > > > > Steve. Hi, Andreas and I both did a ton of work here trying to get this right, and it was all done for the problems we had with transitioning dinodes from unlinked to free, and how the glocks beneath them were competing by way of the in-core inodes. The glocks typically outlive the inodes, but we had tons of problems with inodes coming and going, their associated glocks coming and going, and being marked dead, and the rcus underneath them. The problem is that inodes are coming and going, some with I_FREE, or I_WILL_FREE. Glocks are also coming and going, often for the same block at the same time, but two different inodes for two different files, and both inode and iopen glocks. 
The problems stem from remote unlinks causing "delete work" while the inodes and glocks are both being allocated and freed in rapid succession. We encountered lots of hangs and use-after-free problems. My point is not that we shouldn't fix it, but merely that we need to be VERY careful here not to reintroduce one of the countless problems we fixed. Mark's original patch seems pretty low risk to me, but if we go that route, I'd like to see a smaller timeout; 1HZ seems like a very long time. Better still to see if there's a better fix that doesn't break anything. Bob Peterson ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads 2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms 2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms @ 2018-10-08 12:36 ` Mark Syms 2018-10-08 12:57 ` Steven Whitehouse 2018-10-08 20:10 ` Bob Peterson [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com> 2 siblings, 2 replies; 25+ messages in thread From: Mark Syms @ 2018-10-08 12:36 UTC (permalink / raw) To: cluster-devel.redhat.com From: Tim Smith <tim.smith@citrix.com> Flushing the workqueue can cause operations to happen which might call gfs2_log_reserve(), or get stuck waiting for locks taken by such operations. gfs2_log_reserve() can io_schedule(). If this happens, it will never wake because the only thing which can wake it is gfs2_logd() which was already stopped. This causes umount of a gfs2 filesystem to wedge permanently if, for example, the umount immediately follows a large delete operation. 
When this occurred, the following stack trace was obtained from the umount command [<ffffffff81087968>] flush_workqueue+0x1c8/0x520 [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2] [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2] [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100 [<ffffffff811b7ff7>] kill_block_super+0x27/0x70 [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2] [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70 [<ffffffff811b79b9>] deactivate_super+0x59/0x60 [<ffffffff811d2998>] cleanup_mnt+0x58/0x80 [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20 [<ffffffff8108c87d>] task_work_run+0x7d/0xa0 [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98 [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50 [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Tim Smith <tim.smith@citrix.com> Signed-off-by: Mark Syms <mark.syms@citrix.com> --- fs/gfs2/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index c212893..a971862 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -854,10 +854,10 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp) if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) return error; + flush_workqueue(gfs2_delete_workqueue); kthread_stop(sdp->sd_quotad_process); kthread_stop(sdp->sd_logd_process); - flush_workqueue(gfs2_delete_workqueue); gfs2_quota_sync(sdp->sd_vfs, 0); gfs2_statfs_sync(sdp->sd_vfs, 0); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads 2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms @ 2018-10-08 12:57 ` Steven Whitehouse 2018-10-08 20:10 ` Bob Peterson 1 sibling, 0 replies; 25+ messages in thread From: Steven Whitehouse @ 2018-10-08 12:57 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 08/10/18 13:36, Mark Syms wrote: > From: Tim Smith <tim.smith@citrix.com> > > Flushing the workqueue can cause operations to happen which might > call gfs2_log_reserve(), or get stuck waiting for locks taken by such > operations. gfs2_log_reserve() can io_schedule(). If this happens, it > will never wake because the only thing which can wake it is gfs2_logd() > which was already stopped. > > This causes umount of a gfs2 filesystem to wedge permanently if, for > example, the umount immediately follows a large delete operation. > > When this occured, the following stack trace was obtained from the > umount command > > [<ffffffff81087968>] flush_workqueue+0x1c8/0x520 > [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2] > [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2] > [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100 > [<ffffffff811b7ff7>] kill_block_super+0x27/0x70 > [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2] > [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70 > [<ffffffff811b79b9>] deactivate_super+0x59/0x60 > [<ffffffff811d2998>] cleanup_mnt+0x58/0x80 > [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20 > [<ffffffff8108c87d>] task_work_run+0x7d/0xa0 > [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98 > [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50 > [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f > [<ffffffffffffffff>] 0xffffffffffffffff Good spotting! Definitely need to fix this :-) Steve. 
> Signed-off-by: Tim Smith <tim.smith@citrix.com> > Signed-off-by: Mark Syms <mark.syms@citrix.com> > --- > fs/gfs2/super.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c > index c212893..a971862 100644 > --- a/fs/gfs2/super.c > +++ b/fs/gfs2/super.c > @@ -854,10 +854,10 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp) > if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) > return error; > > + flush_workqueue(gfs2_delete_workqueue); > kthread_stop(sdp->sd_quotad_process); > kthread_stop(sdp->sd_logd_process); > > - flush_workqueue(gfs2_delete_workqueue); > gfs2_quota_sync(sdp->sd_vfs, 0); > gfs2_statfs_sync(sdp->sd_vfs, 0); > ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads 2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms 2018-10-08 12:57 ` Steven Whitehouse @ 2018-10-08 20:10 ` Bob Peterson 2018-10-08 20:53 ` Mark Syms 1 sibling, 1 reply; 25+ messages in thread From: Bob Peterson @ 2018-10-08 20:10 UTC (permalink / raw) To: cluster-devel.redhat.com ----- Original Message ----- > From: Tim Smith <tim.smith@citrix.com> > > Flushing the workqueue can cause operations to happen which might > call gfs2_log_reserve(), or get stuck waiting for locks taken by such > operations. gfs2_log_reserve() can io_schedule(). If this happens, it > will never wake because the only thing which can wake it is gfs2_logd() > which was already stopped. > > This causes umount of a gfs2 filesystem to wedge permanently if, for > example, the umount immediately follows a large delete operation. > > When this occured, the following stack trace was obtained from the > umount command > > [<ffffffff81087968>] flush_workqueue+0x1c8/0x520 > [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2] > [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2] > [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100 > [<ffffffff811b7ff7>] kill_block_super+0x27/0x70 > [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2] > [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70 > [<ffffffff811b79b9>] deactivate_super+0x59/0x60 > [<ffffffff811d2998>] cleanup_mnt+0x58/0x80 > [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20 > [<ffffffff8108c87d>] task_work_run+0x7d/0xa0 > [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98 > [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50 > [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f > [<ffffffffffffffff>] 0xffffffffffffffff > > Signed-off-by: Tim Smith <tim.smith@citrix.com> > Signed-off-by: Mark Syms <mark.syms@citrix.com> > --- Hi Mark, Tim, and all, I 
pushed patch 2/2 upstream. For now I'll hold off on 1/2 but keep it on my queue, pending our investigation. https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=b7f5a2cd27b76e96fdc6d77b060dfdd877c9d0a9 Regards, Bob Peterson ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads 2018-10-08 20:10 ` Bob Peterson @ 2018-10-08 20:53 ` Mark Syms 0 siblings, 0 replies; 25+ messages in thread From: Mark Syms @ 2018-10-08 20:53 UTC (permalink / raw) To: cluster-devel.redhat.com Thanks Bob, We're having a look at the bit that Tim sent earlier to see if switching the call order makes the schedule_timeout never occur. As Steven said, it would be nice if we didn't have to worry about waiting for "dead" glocks to be cleaned up but that would require some considerable care to ensure that we don't just get the opposite race and get a newly created glock freed and deleted. Mark ________________________________ From: Bob Peterson <rpeterso@redhat.com> Sent: Monday, 8 October 2018 21:10 To: Mark Syms <Mark.Syms@citrix.com> CC: cluster-devel at redhat.com,Ross Lagerwall <ross.lagerwall@citrix.com>,Tim Smith <tim.smith@citrix.com> Subject: Re: [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads ----- Original Message ----- > From: Tim Smith <tim.smith@citrix.com> > > Flushing the workqueue can cause operations to happen which might > call gfs2_log_reserve(), or get stuck waiting for locks taken by such > operations. gfs2_log_reserve() can io_schedule(). If this happens, it > will never wake because the only thing which can wake it is gfs2_logd() > which was already stopped. > > This causes umount of a gfs2 filesystem to wedge permanently if, for > example, the umount immediately follows a large delete operation. 
> > When this occurred, the following stack trace was obtained from the > umount command > > [<ffffffff81087968>] flush_workqueue+0x1c8/0x520 > [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2] > [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2] > [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100 > [<ffffffff811b7ff7>] kill_block_super+0x27/0x70 > [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2] > [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70 > [<ffffffff811b79b9>] deactivate_super+0x59/0x60 > [<ffffffff811d2998>] cleanup_mnt+0x58/0x80 > [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20 > [<ffffffff8108c87d>] task_work_run+0x7d/0xa0 > [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98 > [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50 > [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f > [<ffffffffffffffff>] 0xffffffffffffffff > > Signed-off-by: Tim Smith <tim.smith@citrix.com> > Signed-off-by: Mark Syms <mark.syms@citrix.com> > --- Hi Mark, Tim, and all, I pushed patch 2/2 upstream. For now I'll hold off on 1/2 but keep it on my queue, pending our investigation. https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=b7f5a2cd27b76e96fdc6d77b060dfdd877c9d0a9 Regards, Bob Peterson ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>]
[parent not found: <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>]
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock [not found] ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com> @ 2018-10-09 15:08 ` Tim Smith 2018-10-10 8:22 ` Tim Smith 0 siblings, 1 reply; 25+ messages in thread From: Tim Smith @ 2018-10-09 15:08 UTC (permalink / raw) To: cluster-devel.redhat.com On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote: > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote: > > On Tuesday, 9 October 2018 13:34:47 BST you wrote: > > > On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote: > > > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: > > > > > Hi, > > > > > > > > > > On 08/10/18 14:10, Tim Smith wrote: > > > > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: > > > > > >> On 08/10/18 13:59, Mark Syms wrote: > > > > > >>> That sounds entirely reasonable so long as you are absolutely > > > > > >>> sure > > > > > >>> that > > > > > >>> nothing is ever going to mess with that glock, we erred on the > > > > > >>> side > > > > > >>> of > > > > > >>> more caution not knowing whether it would be guaranteed safe or > > > > > >>> not. > > > > > >>> > > > > > >>> Thanks, > > > > > >>> > > > > > >>> Mark > > > > > >> > > > > > >> We should have a look at the history to see how that wait got > > > > > >> added. 
> > > > > >> However the "dead" flag here means "don't touch this glock" and > > > > > >> is > > > > > >> there > > > > > >> so that we can separate the marking dead from the actual removal > > > > > >> from > > > > > >> the list (which simplifies the locking during the scanning > > > > > >> procedures) > > > > > > > > > > > > You beat me to it :-) > > > > > > > > > > > > I think there might be a bit of a problem inserting a new entry > > > > > > with > > > > > > the > > > > > > same name before the old entry has been fully destroyed (or at > > > > > > least > > > > > > removed), which would be why the schedule() is there. > > > > > > > > > > If the old entry is marked dead, all future lookups should ignore > > > > > it. We > > > > > should only have a single non-dead entry at a time, but that doesn't > > > > > seem like it should need us to wait for it. > > > > > > > > On the second call we do have the new glock to insert as arg2, so we > > > > could > > > > try to swap them cleanly, yeah. > > > > > > > > > If we do discover that the wait is really required, then it sounds > > > > > like > > > > > as you mentioned above there is a lost wakeup, and that must > > > > > presumably > > > > > be on a code path that sets the dead flag and then fails to send a > > > > > wake > > > > > up later on. If we can drop the wait in the first place, that seems > > > > > like > > > > > a better plan, > > > > > > > > Ooooh, I wonder if these two lines: > > > > wake_up_glock(gl); > > > > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); > > > > > > > > in gfs2_glock_free() are the wrong way round? > > > > > > No, what's important here is that the wake-up happens after the glock > > > being freed has been removed from the rhashtable, and that's the case > > > here. wake_up_glock actually accesses the glock, which may no longer > > > be around after the call_rcu, so swapping the two lines would > > > introduce a use-after-free bug. 
> > Yes, realised that :-/ > > > There must be another reason for the missed wake-up. I'll have to > > > study the code some more. > > I think it needs to go into gfs2_glock_dealloc(), in such a way as to > > avoid > > that problem. Currently working out a patch to do that. > That doesn't sound right: find_insert_glock is waiting for the glock > to be removed from the rhashtable. In gfs2_glock_free, we remove the > glock from the rhashtable and then we do the wake-up. Delaying the > wakeup further isn't going to help (but it might hide the problem). The only way we can get the problem we're seeing is if we get an effective order of T0: wake_up_glock() T1: prepare_to_wait() T1: schedule() so clearly there's a way for that to happen. Any other order and either schedule() doesn't sleep or it gets woken. The only way I can see at the moment to ensure that wake_up_glock() *cannot* get called until after prepare_to_wait() is to delay it until the read-side critical sections are done, and the first place that's got that property is the start of gfs2_glock_dealloc(), unless we want to add synchronize_rcu() to gfs2_glock_free() and I'm guessing there's a reason it's using call_rcu() instead. I'll keep thinking about it. > Instrumenting glock_wake_function might help though. > > Thanks, > Andreas -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-09 15:08 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith @ 2018-10-10 8:22 ` Tim Smith 2018-10-11 8:14 ` Mark Syms 0 siblings, 1 reply; 25+ messages in thread From: Tim Smith @ 2018-10-10 8:22 UTC (permalink / raw) To: cluster-devel.redhat.com On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote: > On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote: > > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote: > > > On Tuesday, 9 October 2018 13:34:47 BST you wrote: > > > > There must be another reason for the missed wake-up. I'll have to > > > > study the code some more. > > > > > > I think it needs to go into gfs2_glock_dealloc(), in such a way as to > > > avoid > > > that problem. Currently working out a patch to do that. > > > > That doesn't sound right: find_insert_glock is waiting for the glock > > to be removed from the rhashtable. In gfs2_glock_free, we remove the > > glock from the rhashtable and then we do the wake-up. Delaying the > > wakeup further isn't going to help (but it might hide the problem). > > The only way we can get the problem we're seeing is if we get an effective > order of > > T0: wake_up_glock() > T1: prepare_to_wait() > T1: schedule() > > so clearly there's a way for that to happen. Any other order and either > schedule() doesn't sleep or it gets woken. > > The only way I can see at the moment to ensure that wake_up_glock() *cannot* > get called until after prepare_to_wait() is to delay it until the read_side > critical sections are done, and the first place that's got that property is > the start of gfs2_glock_dealloc(), unless we want to add synchronise_rcu() > to gfs2_glock_free() and I'm guessing there's a reason it's using > call_rcu() instead. > > I'll keep thinking about it. 
OK, we have a result that this definitely *isn't* the right answer (still getting the wakeup happening). This lends more weight to the idea that there are multiple waiters, so we'll try your patch. -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-10 8:22 ` Tim Smith @ 2018-10-11 8:14 ` Mark Syms 2018-10-11 19:28 ` Mark Syms [not found] ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com> 0 siblings, 2 replies; 25+ messages in thread From: Mark Syms @ 2018-10-11 8:14 UTC (permalink / raw) To: cluster-devel.redhat.com And, sadly, the change to wait all also doesn't prevent the schedule_timeout from occurring. Something more subtle going on obviously. Mark. -----Original Message----- From: Tim Smith <tim.smith@citrix.com> Sent: 10 October 2018 09:23 To: cluster-devel@redhat.com Cc: Andreas Gruenbacher <agruenba@redhat.com>; Ross Lagerwall <ross.lagerwall@citrix.com>; Mark Syms <Mark.Syms@citrix.com> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote: > On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote: > > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote: > > > On Tuesday, 9 October 2018 13:34:47 BST you wrote: > > > > There must be another reason for the missed wake-up. I'll have > > > > to study the code some more. > > > > > > I think it needs to go into gfs2_glock_dealloc(), in such a way as > > > to avoid that problem. Currently working out a patch to do that. > > > > That doesn't sound right: find_insert_glock is waiting for the glock > > to be removed from the rhashtable. In gfs2_glock_free, we remove the > > glock from the rhashtable and then we do the wake-up. Delaying the > > wakeup further isn't going to help (but it might hide the problem). > > The only way we can get the problem we're seeing is if we get an > effective order of > > T0: wake_up_glock() > T1: prepare_to_wait() > T1: schedule() > > so clearly there's a way for that to happen. Any other order and > either > schedule() doesn't sleep or it gets woken. 
> > The only way I can see at the moment to ensure that wake_up_glock() > *cannot* get called until after prepare_to_wait() is to delay it until > the read-side critical sections are done, and the first place that's > got that property is the start of gfs2_glock_dealloc(), unless we want > to add synchronize_rcu() to gfs2_glock_free() and I'm guessing there's > a reason it's using > call_rcu() instead. > > I'll keep thinking about it. OK, we have a result that this definitely *isn't* the right answer (still getting the wakeup happening). This lends more weight to the idea that there are multiple waiters, so we'll try your patch. -- Tim Smith <tim.smith@citrix.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock 2018-10-11 8:14 ` Mark Syms @ 2018-10-11 19:28 ` Mark Syms 0 siblings, 0 replies; 25+ messages in thread From: Mark Syms @ 2018-10-11 19:28 UTC (permalink / raw) To: cluster-devel.redhat.com Also, prepare_to_wait is called in non-exclusive mode, so switching the nr_exclusive parameter on the wakeup would have no effect: non-exclusive waiters are always woken anyway. So, there is some other reason why we decide to yield but never get woken. Mark. ________________________________ From: Mark Syms <Mark.Syms@citrix.com> Sent: Thursday, 11 October 2018 09:14 To: Tim Smith <tim.smith@citrix.com>,cluster-devel at redhat.com CC: Andreas Gruenbacher <agruenba@redhat.com>,Ross Lagerwall <ross.lagerwall@citrix.com> Subject: RE: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock And, sadly, the change to wake all waiters also doesn't prevent the schedule_timeout from occurring. Something more subtle is going on, obviously. Mark. -----Original Message----- From: Tim Smith <tim.smith@citrix.com> Sent: 10 October 2018 09:23 To: cluster-devel@redhat.com Cc: Andreas Gruenbacher <agruenba@redhat.com>; Ross Lagerwall <ross.lagerwall@citrix.com>; Mark Syms <Mark.Syms@citrix.com> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote: > On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote: > > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote: > > > On Tuesday, 9 October 2018 13:34:47 BST you wrote: > > > > There must be another reason for the missed wake-up. I'll have > > > > to study the code some more. > > > > > > I think it needs to go into gfs2_glock_dealloc(), in such a way as > > > to avoid that problem. Currently working out a patch to do that. 
> > That doesn't sound right: find_insert_glock is waiting for the glock
> > to be removed from the rhashtable. In gfs2_glock_free, we remove the
> > glock from the rhashtable and then we do the wake-up. Delaying the
> > wakeup further isn't going to help (but it might hide the problem).
>
> The only way we can get the problem we're seeing is if we get an
> effective order of
>
> T0: wake_up_glock()
> T1: prepare_to_wait()
> T1: schedule()
>
> so clearly there's a way for that to happen. Any other order and either
> schedule() doesn't sleep or it gets woken.
>
> The only way I can see at the moment to ensure that wake_up_glock()
> *cannot* get called until after prepare_to_wait() is to delay it until
> the read-side critical sections are done, and the first place that's
> got that property is the start of gfs2_glock_dealloc(), unless we want
> to add synchronize_rcu() to gfs2_glock_free() and I'm guessing there's
> a reason it's using call_rcu() instead.
>
> I'll keep thinking about it.

OK, we have a result that this definitely *isn't* the right answer (still
getting the wakeup happening). This lends more weight to the idea that
there are multiple waiters, so we'll try your patch.

--
Tim Smith <tim.smith@citrix.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20181011/ef6af022/attachment.htm>

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>]
* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  [not found] ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2019-03-06 16:14 ` Andreas Gruenbacher
  0 siblings, 0 replies; 25+ messages in thread
From: Andreas Gruenbacher @ 2019-03-06 16:14 UTC (permalink / raw)
To: cluster-devel.redhat.com

Mark,

I think I found the problem. Let me post the fix in a separate email.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2019-03-06 16:14 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
2018-10-08 12:56 ` Steven Whitehouse
2018-10-08 12:59 ` Mark Syms
2018-10-08 13:03 ` Steven Whitehouse
2018-10-08 13:10 ` Tim Smith
2018-10-08 13:13 ` Steven Whitehouse
2018-10-08 13:26 ` Tim Smith
2018-10-09  8:13 ` Mark Syms
2018-10-09  8:41 ` Steven Whitehouse
2018-10-09  8:50 ` Mark Syms
2018-10-09 12:34 ` Andreas Gruenbacher
2018-10-09 14:00 ` Tim Smith
2018-10-09 14:47 ` Andreas Gruenbacher
2018-10-09 15:34 ` Tim Smith
2018-10-08 13:25 ` Bob Peterson
2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
2018-10-08 12:57 ` Steven Whitehouse
2018-10-08 20:10 ` Bob Peterson
2018-10-08 20:53 ` Mark Syms
[not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
[not found] ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>
2018-10-09 15:08 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith
2018-10-10  8:22 ` Tim Smith
2018-10-11  8:14 ` Mark Syms
2018-10-11 19:28 ` Mark Syms
[not found] ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
2019-03-06 16:14 ` Andreas Gruenbacher