* [Cluster-devel] [PATCH 0/2] GFS2 locking patches
@ 2018-10-08 12:36 Mark Syms
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-08 12:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

A pair of locking issues observed in GFS2 when running VM storage stress
tests.

0001-GFS2-use-schedule-timeout-in-find-insert-glock.patch covers a case
where an application-level flock would wedge. The VM control plane makes
extensive use of flocks to control access to VM virtual disks and databases,
and we encountered several failed tests where the flocks did not get acquired
even when no one was holding them. Investigation indicates that there is a
race in find_insert_glock where schedule can be called after the glock being
waited on has already been cleaned up, so the expected wake-up never arrives.
Replace schedule with schedule_timeout and log when the timeout fires.
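
To illustrate the general shape of the fix (this is not the GFS2 code; the
names my_wq, my_cond and wait_for_my_cond are made up), a bounded wait loop
of this kind looks roughly like:

	#include <linux/wait.h>
	#include <linux/sched.h>
	#include <linux/printk.h>

	static DECLARE_WAIT_QUEUE_HEAD(my_wq);
	static bool my_cond;

	static void wait_for_my_cond(void)
	{
		DEFINE_WAIT(wait);

		for (;;) {
			prepare_to_wait(&my_wq, &wait, TASK_UNINTERRUPTIBLE);
			if (my_cond)
				break;
			/* Bounded sleep: if the wake-up was already issued
			 * (and therefore lost), we still get to re-check the
			 * condition after HZ jiffies instead of sleeping
			 * forever. */
			if (schedule_timeout(HZ) == 0)
				pr_info("wait_for_my_cond: timed out, retrying\n");
		}
		finish_wait(&my_wq, &wait);
	}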

0002-GFS2-Flush-the-GFS2-delete-workqueue-before-stopping.patch covers a
case where umount would wedge unrecoverably. The completion of the stress
test involves the deletion of the test machines and virtual disks, followed
by the filesystem being unmounted on all hosts before the hosts are returned
to the lab pool. umount was found to wedge, and this has been traced to
gfs2_log_reserve being called from work run during flush_workqueue, after the
associated kthreads had already been stopped. With nothing left to service
the log reserve request, the code wedged.
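
In outline (a simplified paraphrase of the change in the patch below), the
fix is to flush the delete workqueue while gfs2_logd is still running:

	/* In gfs2_make_fs_ro(): flush work that may still need the log
	 * daemon before the daemon threads are stopped. */
	flush_workqueue(gfs2_delete_workqueue);
	kthread_stop(sdp->sd_quotad_process);
	kthread_stop(sdp->sd_logd_process);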

Mark Syms (1):
  GFS2: use schedule timeout in find insert glock

Tim Smith (1):
  GFS2: Flush the GFS2 delete workqueue before stopping the kernel
    threads

 fs/gfs2/glock.c | 3 ++-
 fs/gfs2/super.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
@ 2018-10-08 12:36 ` Mark Syms
  2018-10-08 12:56   ` Steven Whitehouse
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
       [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
  2 siblings, 1 reply; 25+ messages in thread
From: Mark Syms @ 2018-10-08 12:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

During a VM stress test we encountered a system lockup and kern.log contained

kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
kernel: [21389.462803] Call Trace:
kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71

On examination of the code it was determined that this code path
is only taken if the selected glock is marked as dead. The supposition,
therefore, is that by the time schedule was called the glock had already
been cleaned up, so nothing was left to wake the sleeping task. Instead of
calling schedule, call schedule_timeout(HZ) so that we at least get a chance
to re-evaluate.

On repeating the stress test, the printk message was seen once in the logs
across four servers, but there were no further occurrences and no stuck-task
log entries. This indicates that when the timeout occurred, the repeated
lookup no longer found the same glock entry; since no wake-up had arrived,
without the timeout the task would have slept forever.

Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Tim Smith <tim.smith@citrix.com>
---
 fs/gfs2/glock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4614ee2..0a59a01 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
 	}
 	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
 		rcu_read_unlock();
-		schedule();
+		if (schedule_timeout(HZ) == 0)
+			printk(KERN_INFO "find_insert_glock schedule timed out\n");
 		goto again;
 	}
 out:
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
  2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
@ 2018-10-08 12:36 ` Mark Syms
  2018-10-08 12:57   ` Steven Whitehouse
  2018-10-08 20:10   ` Bob Peterson
       [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
  2 siblings, 2 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-08 12:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Tim Smith <tim.smith@citrix.com>

Flushing the workqueue can cause operations to happen which might
call gfs2_log_reserve(), or get stuck waiting for locks taken by such
operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
will never wake because the only thing which can wake it is gfs2_logd(),
which has already been stopped.

This causes umount of a gfs2 filesystem to wedge permanently if, for
example, the umount immediately follows a large delete operation.

When this occurred, the following stack trace was obtained from the
umount command:

[<ffffffff81087968>] flush_workqueue+0x1c8/0x520
[<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
[<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
[<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b7ff7>] kill_block_super+0x27/0x70
[<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
[<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b79b9>] deactivate_super+0x59/0x60
[<ffffffff811d2998>] cleanup_mnt+0x58/0x80
[<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c87d>] task_work_run+0x7d/0xa0
[<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Tim Smith <tim.smith@citrix.com>
Signed-off-by: Mark Syms <mark.syms@citrix.com>
---
 fs/gfs2/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index c212893..a971862 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -854,10 +854,10 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
 		return error;
 
+	flush_workqueue(gfs2_delete_workqueue);
 	kthread_stop(sdp->sd_quotad_process);
 	kthread_stop(sdp->sd_logd_process);
 
-	flush_workqueue(gfs2_delete_workqueue);
 	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
@ 2018-10-08 12:56   ` Steven Whitehouse
  2018-10-08 12:59     ` Mark Syms
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Whitehouse @ 2018-10-08 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 08/10/18 13:36, Mark Syms wrote:
> During a VM stress test we encountered a system lockup and kern.log contained
>
> kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
> kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
> kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
> kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 ffff88005a509c00 ffff88019628c000
> kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 fffffffffffffe58 ffff8801818da418
> kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 ffff8801818da5c0 ffff88019628bd68
> kernel: [21389.462803] Call Trace:
> kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
> kernel: [21389.462877] [<ffffffffa0663624>] find_insert_glock+0x4a4/0x530 [gfs2]
> kernel: [21389.462891] [<ffffffffa0660c20>] ? gfs2_holder_wake+0x20/0x20 [gfs2]
> kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 [gfs2]
> kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
> kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 [gfs2]
> kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
> kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 [gfs2]
> kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
> kernel: [21389.462948] [<ffffffff815a57ee>] entry_SYSCALL_64_fastpath+0x12/0x71
>
> On examination of the code it was determined that this code path
> is only taken if the selected glock is marked as dead, the supposition
> therefore is that by the time schedule as called the glock had been
> cleaned up and therefore nothing woke the schedule. Instead of calling
> schedule, call schedule_timeout(HZ) so at least we get a chance to
> re-evaluate.
>
> On repeating the stress test, the printk message was seen once in the logs
> across four servers but no further occurences nor were there any stuck task
> log entries. This indicates that when the timeout occured the code repeated
> the lookup and did not find the same glock entry but as we hadn't been
> woken this means that we would never have been woken.
>
> Signed-off-by: Mark Syms <mark.syms@citrix.com>
> Signed-off-by: Tim Smith <tim.smith@citrix.com>
> ---
>   fs/gfs2/glock.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4614ee2..0a59a01 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
>   	}
>   	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
>   		rcu_read_unlock();
> -		schedule();
> +		if (schedule_timeout(HZ) == 0)
> +			printk(KERN_INFO "find_insert_glock schedule timed out\n");
>   		goto again;
>   	}
>   out:

That is a bit odd. In fact that whole function looks odd. I wonder why 
it needs to wait in the first place. It should be a simple comparison 
here. If the glock is dead then nothing else should touch it, so we are 
safe to add a new one into the hash table. The wait is almost certainly 
a bug,

Steve.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
@ 2018-10-08 12:57   ` Steven Whitehouse
  2018-10-08 20:10   ` Bob Peterson
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Whitehouse @ 2018-10-08 12:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 08/10/18 13:36, Mark Syms wrote:
> From: Tim Smith <tim.smith@citrix.com>
>
> Flushing the workqueue can cause operations to happen which might
> call gfs2_log_reserve(), or get stuck waiting for locks taken by such
> operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
> will never wake because the only thing which can wake it is gfs2_logd()
> which was already stopped.
>
> This causes umount of a gfs2 filesystem to wedge permanently if, for
> example, the umount immediately follows a large delete operation.
>
> When this occured, the following stack trace was obtained from the
> umount command
>
> [<ffffffff81087968>] flush_workqueue+0x1c8/0x520
> [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
> [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
> [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
> [<ffffffff811b7ff7>] kill_block_super+0x27/0x70
> [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
> [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
> [<ffffffff811b79b9>] deactivate_super+0x59/0x60
> [<ffffffff811d2998>] cleanup_mnt+0x58/0x80
> [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
> [<ffffffff8108c87d>] task_work_run+0x7d/0xa0
> [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
> [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
> [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
> [<ffffffffffffffff>] 0xffffffffffffffff
Good spotting! Definitely need to fix this :-)

Steve.

> Signed-off-by: Tim Smith <tim.smith@citrix.com>
> Signed-off-by: Mark Syms <mark.syms@citrix.com>
> ---
>   fs/gfs2/super.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index c212893..a971862 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -854,10 +854,10 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
>   	if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
>   		return error;
>   
> +	flush_workqueue(gfs2_delete_workqueue);
>   	kthread_stop(sdp->sd_quotad_process);
>   	kthread_stop(sdp->sd_logd_process);
>   
> -	flush_workqueue(gfs2_delete_workqueue);
>   	gfs2_quota_sync(sdp->sd_vfs, 0);
>   	gfs2_statfs_sync(sdp->sd_vfs, 0);
>   



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 12:56   ` Steven Whitehouse
@ 2018-10-08 12:59     ` Mark Syms
  2018-10-08 13:03       ` Steven Whitehouse
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Syms @ 2018-10-08 12:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

That sounds entirely reasonable so long as you are absolutely sure that nothing is ever going to mess with that glock. We erred on the side of caution, not knowing whether it would be guaranteed safe or not.

Thanks,

	Mark

-----Original Message-----
From: Steven Whitehouse <swhiteho@redhat.com> 
Sent: 08 October 2018 13:56
To: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>; Tim Smith <tim.smith@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

Hi,


On 08/10/18 13:36, Mark Syms wrote:
> During a VM stress test we encountered a system lockup and kern.log 
> contained
>
> kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
> kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
> kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
> kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00 
> ffff88005a509c00 ffff88019628c000
> kernel: [21389.462795] ffffc90040226000 ffff88019628bd80 
> fffffffffffffe58 ffff8801818da418
> kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4 
> ffff8801818da5c0 ffff88019628bd68
> kernel: [21389.462803] Call Trace:
> kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
> kernel: [21389.462877] [<ffffffffa0663624>] 
> find_insert_glock+0x4a4/0x530 [gfs2]
> kernel: [21389.462891] [<ffffffffa0660c20>] ? 
> gfs2_holder_wake+0x20/0x20 [gfs2]
> kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330 
> [gfs2]
> kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
> kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0 
> [gfs2]
> kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
> kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0 
> [gfs2]
> kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
> kernel: [21389.462948] [<ffffffff815a57ee>] 
> entry_SYSCALL_64_fastpath+0x12/0x71
>
> On examination of the code it was determined that this code path is 
> only taken if the selected glock is marked as dead, the supposition 
> therefore is that by the time schedule as called the glock had been 
> cleaned up and therefore nothing woke the schedule. Instead of calling 
> schedule, call schedule_timeout(HZ) so at least we get a chance to 
> re-evaluate.
>
> On repeating the stress test, the printk message was seen once in the 
> logs across four servers but no further occurences nor were there any 
> stuck task log entries. This indicates that when the timeout occured 
> the code repeated the lookup and did not find the same glock entry but 
> as we hadn't been woken this means that we would never have been woken.
>
> Signed-off-by: Mark Syms <mark.syms@citrix.com>
> Signed-off-by: Tim Smith <tim.smith@citrix.com>
> ---
>   fs/gfs2/glock.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 4614ee2..0a59a01 
> 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
>   	}
>   	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
>   		rcu_read_unlock();
> -		schedule();
> +		if (schedule_timeout(HZ) == 0)
> +			printk(KERN_INFO "find_insert_glock schedule timed out\n");
>   		goto again;
>   	}
>   out:

That is a bit odd. In fact that whole function looks odd. I wonder why it needs to wait in the first place. It should be a simple comparison here. If the glock is dead then nothing else should touch it, so we are safe to add a new one into the hash table. The wait is almost certainly a bug,

Steve.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 12:59     ` Mark Syms
@ 2018-10-08 13:03       ` Steven Whitehouse
  2018-10-08 13:10         ` Tim Smith
  2018-10-08 13:25         ` Bob Peterson
  0 siblings, 2 replies; 25+ messages in thread
From: Steven Whitehouse @ 2018-10-08 13:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com



On 08/10/18 13:59, Mark Syms wrote:
> That sounds entirely reasonable so long as you are absolutely sure that nothing is ever going to mess with that glock, we erred on the side of more caution not knowing whether it would be guaranteed safe or not.
>
> Thanks,
>
> 	Mark
We should have a look at the history to see how that wait got added. 
However the "dead" flag here means "don't touch this glock" and is there 
so that we can separate the marking dead from the actual removal from 
the list (which simplifies the locking during the scanning procedures)

Steve.

> -----Original Message-----
> From: Steven Whitehouse <swhiteho@redhat.com>
> Sent: 08 October 2018 13:56
> To: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com
> Cc: Ross Lagerwall <ross.lagerwall@citrix.com>; Tim Smith <tim.smith@citrix.com>
> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
>
> Hi,
>
>
> On 08/10/18 13:36, Mark Syms wrote:
>> During a VM stress test we encountered a system lockup and kern.log
>> contained
>>
>> kernel: [21389.462707] INFO: task python:15480 blocked for more than 120 seconds.
>> kernel: [21389.462749] Tainted: G O 4.4.0+10 #1
>> kernel: [21389.462763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kernel: [21389.462783] python D ffff88019628bc90 0 15480 1 0x00000000
>> kernel: [21389.462790] ffff88019628bc90 ffff880198f11c00
>> ffff88005a509c00 ffff88019628c000
>> kernel: [21389.462795] ffffc90040226000 ffff88019628bd80
>> fffffffffffffe58 ffff8801818da418
>> kernel: [21389.462799] ffff88019628bca8 ffffffff815a1cd4
>> ffff8801818da5c0 ffff88019628bd68
>> kernel: [21389.462803] Call Trace:
>> kernel: [21389.462815] [<ffffffff815a1cd4>] schedule+0x64/0x80
>> kernel: [21389.462877] [<ffffffffa0663624>]
>> find_insert_glock+0x4a4/0x530 [gfs2]
>> kernel: [21389.462891] [<ffffffffa0660c20>] ?
>> gfs2_holder_wake+0x20/0x20 [gfs2]
>> kernel: [21389.462903] [<ffffffffa06639ed>] gfs2_glock_get+0x3d/0x330
>> [gfs2]
>> kernel: [21389.462928] [<ffffffffa066cff2>] do_flock+0xf2/0x210 [gfs2]
>> kernel: [21389.462933] [<ffffffffa0671ad0>] ? gfs2_getattr+0xe0/0xf0
>> [gfs2]
>> kernel: [21389.462938] [<ffffffff811ba2fb>] ? cp_new_stat+0x10b/0x120
>> kernel: [21389.462943] [<ffffffffa066d188>] gfs2_flock+0x78/0xa0
>> [gfs2]
>> kernel: [21389.462946] [<ffffffff812021e9>] SyS_flock+0x129/0x170
>> kernel: [21389.462948] [<ffffffff815a57ee>]
>> entry_SYSCALL_64_fastpath+0x12/0x71
>>
>> On examination of the code it was determined that this code path is
>> only taken if the selected glock is marked as dead, the supposition
>> therefore is that by the time schedule as called the glock had been
>> cleaned up and therefore nothing woke the schedule. Instead of calling
>> schedule, call schedule_timeout(HZ) so at least we get a chance to
>> re-evaluate.
>>
>> On repeating the stress test, the printk message was seen once in the
>> logs across four servers but no further occurences nor were there any
>> stuck task log entries. This indicates that when the timeout occured
>> the code repeated the lookup and did not find the same glock entry but
>> as we hadn't been woken this means that we would never have been woken.
>>
>> Signed-off-by: Mark Syms <mark.syms@citrix.com>
>> Signed-off-by: Tim Smith <tim.smith@citrix.com>
>> ---
>>    fs/gfs2/glock.c | 3 ++-
>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 4614ee2..0a59a01
>> 100644
>> --- a/fs/gfs2/glock.c
>> +++ b/fs/gfs2/glock.c
>> @@ -758,7 +758,8 @@ static struct gfs2_glock *find_insert_glock(struct lm_lockname *name,
>>    	}
>>    	if (gl && !lockref_get_not_dead(&gl->gl_lockref)) {
>>    		rcu_read_unlock();
>> -		schedule();
>> +		if (schedule_timeout(HZ) == 0)
>> +			printk(KERN_INFO "find_insert_glock schedule timed out\n");
>>    		goto again;
>>    	}
>>    out:
> That is a bit odd. In fact that whole function looks odd. I wonder why it needs to wait in the first place. It should be a simple comparison here. If the glock is dead then nothing else should touch it, so we are safe to add a new one into the hash table. The wait is almost certainly a bug,
>
> Steve.
>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 13:03       ` Steven Whitehouse
@ 2018-10-08 13:10         ` Tim Smith
  2018-10-08 13:13           ` Steven Whitehouse
  2018-10-08 13:25         ` Bob Peterson
  1 sibling, 1 reply; 25+ messages in thread
From: Tim Smith @ 2018-10-08 13:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> On 08/10/18 13:59, Mark Syms wrote:
> > That sounds entirely reasonable so long as you are absolutely sure that
> > nothing is ever going to mess with that glock, we erred on the side of
> > more caution not knowing whether it would be guaranteed safe or not.
> > 
> > Thanks,
> > 
> > 	Mark
> 
> We should have a look at the history to see how that wait got added.
> However the "dead" flag here means "don't touch this glock" and is there
> so that we can separate the marking dead from the actual removal from
> the list (which simplifies the locking during the scanning procedures)

You beat me to it :-)

I think there might be a bit of a problem inserting a new entry with the same 
name before the old entry has been fully destroyed (or at least removed), 
which would be why the schedule() is there.

-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 13:10         ` Tim Smith
@ 2018-10-08 13:13           ` Steven Whitehouse
  2018-10-08 13:26             ` Tim Smith
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Whitehouse @ 2018-10-08 13:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 08/10/18 14:10, Tim Smith wrote:
> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
>> On 08/10/18 13:59, Mark Syms wrote:
>>> That sounds entirely reasonable so long as you are absolutely sure that
>>> nothing is ever going to mess with that glock, we erred on the side of
>>> more caution not knowing whether it would be guaranteed safe or not.
>>>
>>> Thanks,
>>>
>>> 	Mark
>> We should have a look at the history to see how that wait got added.
>> However the "dead" flag here means "don't touch this glock" and is there
>> so that we can separate the marking dead from the actual removal from
>> the list (which simplifies the locking during the scanning procedures)
> You beat me to it :-)
>
> I think there might be a bit of a problem inserting a new entry with the same
> name before the old entry has been fully destroyed (or at least removed),
> which would be why the schedule() is there.
>
If the old entry is marked dead, all future lookups should ignore it. We 
should only have a single non-dead entry at a time, but that doesn't 
seem like it should need us to wait for it.

If we do discover that the wait is really required, then it sounds like, 
as you mentioned above, there is a lost wakeup, and that must presumably 
be on a code path that sets the dead flag and then fails to send a wake-up 
later on. If we can drop the wait in the first place, that seems like 
a better plan,

Steve.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 13:03       ` Steven Whitehouse
  2018-10-08 13:10         ` Tim Smith
@ 2018-10-08 13:25         ` Bob Peterson
  1 sibling, 0 replies; 25+ messages in thread
From: Bob Peterson @ 2018-10-08 13:25 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> 
> 
> On 08/10/18 13:59, Mark Syms wrote:
> > That sounds entirely reasonable so long as you are absolutely sure that
> > nothing is ever going to mess with that glock, we erred on the side of
> > more caution not knowing whether it would be guaranteed safe or not.
> >
> > Thanks,
> >
> > 	Mark
> We should have a look at the history to see how that wait got added.
> However the "dead" flag here means "don't touch this glock" and is there
> so that we can separate the marking dead from the actual removal from
> the list (which simplifies the locking during the scanning procedures)
> 
> Steve.
(snip)
> > That is a bit odd. In fact that whole function looks odd. I wonder why it
> > needs to wait in the first place. It should be a simple comparison here.
> > If the glock is dead then nothing else should touch it, so we are safe to
> > add a new one into the hash table. The wait is almost certainly a bug,
> >
> > Steve.

Hi,

Andreas and I both did a ton of work here trying to get this right, and it
was all done for the problems we had with transitioning dinodes from
unlinked to free, and how the glocks beneath them were competing by way
of the in-core inodes. The glocks typically outlive the inodes, but we
had tons of problems with inodes coming and going, their associated glocks
coming and going, and being marked dead, and the rcus underneath them.

The problem is that inodes are coming and going, some with I_FREE, or
I_WILL_FREE. Glocks are also coming and going, often for the same block
at the same time, but two different inodes for two different files, and
both inode and iopen glocks.

The problems stem from remote unlinks causing "delete work" while the
inodes and glocks are both being allocated and freed in rapid succession.

We encountered lots of hangs and use-after-free problems. My point is not
that we shouldn't fix it, but merely that we need to be VERY careful here
not to reintroduce one of the countless problems we fixed. Mark's original
patch seems pretty low risk to me, but if we go that route, I'd like to
see a smaller timeout; 1HZ seems like a very long time. Better still to
see if there's a better fix that doesn't break anything.

Bob Peterson



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 13:13           ` Steven Whitehouse
@ 2018-10-08 13:26             ` Tim Smith
  2018-10-09  8:13               ` Mark Syms
  2018-10-09 12:34               ` Andreas Gruenbacher
  0 siblings, 2 replies; 25+ messages in thread
From: Tim Smith @ 2018-10-08 13:26 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> Hi,
> 
> On 08/10/18 14:10, Tim Smith wrote:
> > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> >> On 08/10/18 13:59, Mark Syms wrote:
> >>> That sounds entirely reasonable so long as you are absolutely sure that
> >>> nothing is ever going to mess with that glock, we erred on the side of
> >>> more caution not knowing whether it would be guaranteed safe or not.
> >>> 
> >>> Thanks,
> >>> 
> >>> 	Mark
> >> 
> >> We should have a look at the history to see how that wait got added.
> >> However the "dead" flag here means "don't touch this glock" and is there
> >> so that we can separate the marking dead from the actual removal from
> >> the list (which simplifies the locking during the scanning procedures)
> > 
> > You beat me to it :-)
> > 
> > I think there might be a bit of a problem inserting a new entry with the
> > same name before the old entry has been fully destroyed (or at least
> > removed), which would be why the schedule() is there.
> 
> If the old entry is marked dead, all future lookups should ignore it. We
> should only have a single non-dead entry at a time, but that doesn't
> seem like it should need us to wait for it.

On the second call we do have the new glock to insert as arg2, so we could try 
to swap them cleanly, yeah.

> If we do discover that the wait is really required, then it sounds like
> as you mentioned above there is a lost wakeup, and that must presumably
> be on a code path that sets the dead flag and then fails to send a wake
> up later on. If we can drop the wait in the first place, that seems like
> a better plan,

Ooooh, I wonder if these two lines:

	wake_up_glock(gl);
	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);

in gfs2_glock_free() are the wrong way round?

-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
  2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
  2018-10-08 12:57   ` Steven Whitehouse
@ 2018-10-08 20:10   ` Bob Peterson
  2018-10-08 20:53     ` Mark Syms
  1 sibling, 1 reply; 25+ messages in thread
From: Bob Peterson @ 2018-10-08 20:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> From: Tim Smith <tim.smith@citrix.com>
> 
> Flushing the workqueue can cause operations to happen which might
> call gfs2_log_reserve(), or get stuck waiting for locks taken by such
> operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
> will never wake because the only thing which can wake it is gfs2_logd()
> which was already stopped.
> 
> This causes umount of a gfs2 filesystem to wedge permanently if, for
> example, the umount immediately follows a large delete operation.
> 
> When this occured, the following stack trace was obtained from the
> umount command
> 
> [<ffffffff81087968>] flush_workqueue+0x1c8/0x520
> [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
> [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
> [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
> [<ffffffff811b7ff7>] kill_block_super+0x27/0x70
> [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
> [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
> [<ffffffff811b79b9>] deactivate_super+0x59/0x60
> [<ffffffff811d2998>] cleanup_mnt+0x58/0x80
> [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
> [<ffffffff8108c87d>] task_work_run+0x7d/0xa0
> [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
> [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
> [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Signed-off-by: Tim Smith <tim.smith@citrix.com>
> Signed-off-by: Mark Syms <mark.syms@citrix.com>
> ---
Hi Mark, Tim, and all,

I pushed patch 2/2 upstream. For now I'll hold off on 1/2 but keep it
on my queue, pending our investigation.
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=b7f5a2cd27b76e96fdc6d77b060dfdd877c9d0a9

Regards,

Bob Peterson



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
  2018-10-08 20:10   ` Bob Peterson
@ 2018-10-08 20:53     ` Mark Syms
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-08 20:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Thanks Bob,

We're having a look at the bit that Tim sent earlier to see if switching the call order makes the schedule_timeout never occur.

As Steven said, it would be nice if we didn't have to worry about waiting for "dead" glocks to be cleaned up but that would require some considerable care to ensure that we don't just get the opposite race and get a newly created glock freed and deleted.

Mark
________________________________
From: Bob Peterson <rpeterso@redhat.com>
Sent: Monday, 8 October 2018 21:10
To: Mark Syms <Mark.Syms@citrix.com>
CC: cluster-devel at redhat.com,Ross Lagerwall <ross.lagerwall@citrix.com>,Tim Smith <tim.smith@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads


----- Original Message -----
> From: Tim Smith <tim.smith@citrix.com>
>
> Flushing the workqueue can cause operations to happen which might
> call gfs2_log_reserve(), or get stuck waiting for locks taken by such
> operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
> will never wake because the only thing which can wake it is gfs2_logd()
> which was already stopped.
>
> This causes umount of a gfs2 filesystem to wedge permanently if, for
> example, the umount immediately follows a large delete operation.
>
> When this occured, the following stack trace was obtained from the
> umount command
>
> [<ffffffff81087968>] flush_workqueue+0x1c8/0x520
> [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
> [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
> [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
> [<ffffffff811b7ff7>] kill_block_super+0x27/0x70
> [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
> [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
> [<ffffffff811b79b9>] deactivate_super+0x59/0x60
> [<ffffffff811d2998>] cleanup_mnt+0x58/0x80
> [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
> [<ffffffff8108c87d>] task_work_run+0x7d/0xa0
> [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
> [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
> [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Tim Smith <tim.smith@citrix.com>
> Signed-off-by: Mark Syms <mark.syms@citrix.com>
> ---
Hi Mark, Tim, and all,

I pushed patch 2/2 upstream. For now I'll hold off on 1/2 but keep it
on my queue, pending our investigation.
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=b7f5a2cd27b76e96fdc6d77b060dfdd877c9d0a9

Regards,

Bob Peterson

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 13:26             ` Tim Smith
@ 2018-10-09  8:13               ` Mark Syms
  2018-10-09  8:41                 ` Steven Whitehouse
  2018-10-09 12:34               ` Andreas Gruenbacher
  1 sibling, 1 reply; 25+ messages in thread
From: Mark Syms @ 2018-10-09  8:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Having swapped the lines below around, we still see the timeout on schedule fire, but only once in
a fairly mega stress test. This is why we weren't worried about the timeout being HZ: the situation
is hardly ever hit, as having to wait is rare and normally we are woken from schedule. Without a
timeout on schedule we would never wake up, so a rare occurrence of waiting a second really doesn't
seem too bad.
Mark.

-----Original Message-----
From: Tim Smith <tim.smith@citrix.com> 
Sent: 08 October 2018 14:27
To: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> Hi,
> 
> On 08/10/18 14:10, Tim Smith wrote:
> > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> >> On 08/10/18 13:59, Mark Syms wrote:
> >>> That sounds entirely reasonable so long as you are absolutely sure 
> >>> that nothing is ever going to mess with that glock, we erred on 
> >>> the side of more caution not knowing whether it would be guaranteed safe or not.
> >>> 
> >>> Thanks,
> >>> 
> >>> 	Mark
> >> 
> >> We should have a look at the history to see how that wait got added.
> >> However the "dead" flag here means "don't touch this glock" and is 
> >> there so that we can separate the marking dead from the actual 
> >> removal from the list (which simplifies the locking during the 
> >> scanning procedures)
> > 
> > You beat me to it :-)
> > 
> > I think there might be a bit of a problem inserting a new entry with 
> > the same name before the old entry has been fully destroyed (or at 
> > least removed), which would be why the schedule() is there.
> 
> If the old entry is marked dead, all future lookups should ignore it. 
> We should only have a single non-dead entry at a time, but that 
> doesn't seem like it should need us to wait for it.

On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah.

> If we do discover that the wait is really required, then it sounds 
> like as you mentioned above there is a lost wakeup, and that must 
> presumably be on a code path that sets the dead flag and then fails to 
> send a wake up later on. If we can drop the wait in the first place, 
> that seems like a better plan,

Ooooh, I wonder if these two lines:

	wake_up_glock(gl);
	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);

in gfs2_glock_free() are the wrong way round?

--
Tim Smith <tim.smith@citrix.com>





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-09  8:13               ` Mark Syms
@ 2018-10-09  8:41                 ` Steven Whitehouse
  2018-10-09  8:50                   ` Mark Syms
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Whitehouse @ 2018-10-09  8:41 UTC (permalink / raw)
  To: cluster-devel.redhat.com



On 09/10/18 09:13, Mark Syms wrote:
> Having swapped the line below around we still see the timeout on schedule fire, but only once in
> a fairly mega stress test. This is why we weren't worried about the timeout being HZ, the situation
> is hardly ever hit as having to wait is rare and normally we are woken from schedule and without
> a timeout on schedule we never wake up so a rare occurrence of waiting a second really doesn't
> seem too bad.
>
> Mark.
We should still get to the bottom of why the wake up is missing though, 
since without that fix we won't know if there is something else wrong 
somewhere,

Steve.

>
> -----Original Message-----
> From: Tim Smith <tim.smith@citrix.com>
> Sent: 08 October 2018 14:27
> To: Steven Whitehouse <swhiteho@redhat.com>
> Cc: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall@citrix.com>
> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
>> Hi,
>>
>> On 08/10/18 14:10, Tim Smith wrote:
>>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
>>>> On 08/10/18 13:59, Mark Syms wrote:
>>>>> That sounds entirely reasonable so long as you are absolutely sure
>>>>> that nothing is ever going to mess with that glock, we erred on
>>>>> the side of more caution not knowing whether it would be guaranteed safe or not.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> 	Mark
>>>> We should have a look at the history to see how that wait got added.
>>>> However the "dead" flag here means "don't touch this glock" and is
>>>> there so that we can separate the marking dead from the actual
>>>> removal from the list (which simplifies the locking during the
>>>> scanning procedures)
>>> You beat me to it :-)
>>>
>>> I think there might be a bit of a problem inserting a new entry with
>>> the same name before the old entry has been fully destroyed (or at
>>> least removed), which would be why the schedule() is there.
>> If the old entry is marked dead, all future lookups should ignore it.
>> We should only have a single non-dead entry at a time, but that
>> doesn't seem like it should need us to wait for it.
> On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah.
>
>> If we do discover that the wait is really required, then it sounds
>> like as you mentioned above there is a lost wakeup, and that must
>> presumably be on a code path that sets the dead flag and then fails to
>> send a wake up later on. If we can drop the wait in the first place,
>> that seems like a better plan,
> Ooooh, I wonder if these two lines:
>
> 	wake_up_glock(gl);
> 	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?
>
> --
> Tim Smith <tim.smith@citrix.com>
>
>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-09  8:41                 ` Steven Whitehouse
@ 2018-10-09  8:50                   ` Mark Syms
  0 siblings, 0 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-09  8:50 UTC (permalink / raw)
  To: cluster-devel.redhat.com

We think we have, just making a build to test.

Will follow up later.

Mark.

-----Original Message-----
From: Steven Whitehouse <swhiteho@redhat.com> 
Sent: 09 October 2018 09:41
To: Mark Syms <Mark.Syms@citrix.com>; Tim Smith <tim.smith@citrix.com>
Cc: cluster-devel at redhat.com; Ross Lagerwall <ross.lagerwall@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock



On 09/10/18 09:13, Mark Syms wrote:
> Having swapped the line below around we still see the timeout on 
> schedule fire, but only once in a fairly mega stress test. This is why 
> we weren't worried about the timeout being HZ, the situation is hardly 
> ever hit as having to wait is rare and normally we are woken from 
> schedule and without a timeout on schedule we never wake up so a rare occurrence of waiting a second really doesn't seem too bad.
>
> Mark.
We should still get to the bottom of why the wake up is missing though, since without that fix we won't know if there is something else wrong somewhere,

Steve.

>
> -----Original Message-----
> From: Tim Smith <tim.smith@citrix.com>
> Sent: 08 October 2018 14:27
> To: Steven Whitehouse <swhiteho@redhat.com>
> Cc: Mark Syms <Mark.Syms@citrix.com>; cluster-devel at redhat.com; Ross 
> Lagerwall <ross.lagerwall@citrix.com>
> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in 
> find insert glock
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
>> Hi,
>>
>> On 08/10/18 14:10, Tim Smith wrote:
>>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
>>>> On 08/10/18 13:59, Mark Syms wrote:
>>>>> That sounds entirely reasonable so long as you are absolutely sure 
>>>>> that nothing is ever going to mess with that glock, we erred on 
>>>>> the side of more caution not knowing whether it would be guaranteed safe or not.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> 	Mark
>>>> We should have a look at the history to see how that wait got added.
>>>> However the "dead" flag here means "don't touch this glock" and is 
>>>> there so that we can separate the marking dead from the actual 
>>>> removal from the list (which simplifies the locking during the 
>>>> scanning procedures)
>>> You beat me to it :-)
>>>
>>> I think there might be a bit of a problem inserting a new entry with 
>>> the same name before the old entry has been fully destroyed (or at 
>>> least removed), which would be why the schedule() is there.
>> If the old entry is marked dead, all future lookups should ignore it.
>> We should only have a single non-dead entry at a time, but that 
>> doesn't seem like it should need us to wait for it.
> On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah.
>
>> If we do discover that the wait is really required, then it sounds 
>> like as you mentioned above there is a lost wakeup, and that must 
>> presumably be on a code path that sets the dead flag and then fails 
>> to send a wake up later on. If we can drop the wait in the first 
>> place, that seems like a better plan,
> Ooooh, I wonder if these two lines:
>
> 	wake_up_glock(gl);
> 	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?
>
> --
> Tim Smith <tim.smith@citrix.com>
>
>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-08 13:26             ` Tim Smith
  2018-10-09  8:13               ` Mark Syms
@ 2018-10-09 12:34               ` Andreas Gruenbacher
  2018-10-09 14:00                 ` Tim Smith
  1 sibling, 1 reply; 25+ messages in thread
From: Andreas Gruenbacher @ 2018-10-09 12:34 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote:
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > Hi,
> >
> > On 08/10/18 14:10, Tim Smith wrote:
> > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > >> On 08/10/18 13:59, Mark Syms wrote:
> > >>> That sounds entirely reasonable so long as you are absolutely sure that
> > >>> nothing is ever going to mess with that glock, we erred on the side of
> > >>> more caution not knowing whether it would be guaranteed safe or not.
> > >>>
> > >>> Thanks,
> > >>>
> > >>>   Mark
> > >>
> > >> We should have a look at the history to see how that wait got added.
> > >> However the "dead" flag here means "don't touch this glock" and is there
> > >> so that we can separate the marking dead from the actual removal from
> > >> the list (which simplifies the locking during the scanning procedures)
> > >
> > > You beat me to it :-)
> > >
> > > I think there might be a bit of a problem inserting a new entry with the
> > > same name before the old entry has been fully destroyed (or at least
> > > removed), which would be why the schedule() is there.
> >
> > If the old entry is marked dead, all future lookups should ignore it. We
> > should only have a single non-dead entry at a time, but that doesn't
> > seem like it should need us to wait for it.
>
> On the second call we do have the new glock to insert as arg2, so we could try
> to swap them cleanly, yeah.
>
> > If we do discover that the wait is really required, then it sounds like
> > as you mentioned above there is a lost wakeup, and that must presumably
> > be on a code path that sets the dead flag and then fails to send a wake
> > up later on. If we can drop the wait in the first place, that seems like
> > a better plan,
>
> Ooooh, I wonder if these two lines:
>
>         wake_up_glock(gl);
>         call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?

No, what's important here is that the wake-up happens after the glock
being freed has been removed from the rhashtable, and that's the case
here. wake_up_glock actually accesses the glock, which may no longer
be around after the call_rcu, so swapping the two lines would
introduce a use-after-free bug.
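
Put as a rough sketch (paraphrased, not the verbatim glock.c source), the
ordering that gfs2_glock_free() relies on is:

	/* Remove first, so a retried lookup no longer finds this glock. */
	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
	/* Wake any waiters while gl is still valid memory. */
	wake_up_glock(gl);
	/* Only now hand the glock over to RCU for freeing. */
	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);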

There must be another reason for the missed wake-up. I'll have to
study the code some more.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-09 12:34               ` Andreas Gruenbacher
@ 2018-10-09 14:00                 ` Tim Smith
  2018-10-09 14:47                   ` Andreas Gruenbacher
  0 siblings, 1 reply; 25+ messages in thread
From: Tim Smith @ 2018-10-09 14:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote:
> > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > On 08/10/18 14:10, Tim Smith wrote:
> > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > > >> On 08/10/18 13:59, Mark Syms wrote:
> > > >>> That sounds entirely reasonable so long as you are absolutely sure
> > > >>> that
> > > >>> nothing is ever going to mess with that glock, we erred on the side
> > > >>> of
> > > >>> more caution not knowing whether it would be guaranteed safe or not.
> > > >>> 
> > > >>> Thanks,
> > > >>> 
> > > >>>   Mark
> > > >> 
> > > >> We should have a look at the history to see how that wait got added.
> > > >> However the "dead" flag here means "don't touch this glock" and is
> > > >> there
> > > >> so that we can separate the marking dead from the actual removal from
> > > >> the list (which simplifies the locking during the scanning
> > > >> procedures)
> > > > 
> > > > You beat me to it :-)
> > > > 
> > > > I think there might be a bit of a problem inserting a new entry with
> > > > the
> > > > same name before the old entry has been fully destroyed (or at least
> > > > removed), which would be why the schedule() is there.
> > > 
> > > If the old entry is marked dead, all future lookups should ignore it. We
> > > should only have a single non-dead entry at a time, but that doesn't
> > > seem like it should need us to wait for it.
> > 
> > On the second call we do have the new glock to insert as arg2, so we could
> > try to swap them cleanly, yeah.
> > 
> > > If we do discover that the wait is really required, then it sounds like
> > > as you mentioned above there is a lost wakeup, and that must presumably
> > > be on a code path that sets the dead flag and then fails to send a wake
> > > up later on. If we can drop the wait in the first place, that seems like
> > > a better plan,
> > 
> > Ooooh, I wonder if these two lines:
> >         wake_up_glock(gl);
> >         call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
> > 
> > in gfs2_glock_free() are the wrong way round?
> 
> No, what's important here is that the wake-up happens after the glock
> being freed has been removed from the rhashtable, and that's the case
> here. wake_up_glock actually accesses the glock, which may no longer
> be around after the call_rcu, so swapping the two lines would
> introduce a use-after-free bug.

Yes, realised that :-/

> There must be another reason for the missed wake-up. I'll have to
> study the code some more.

I think it needs to go into gfs2_glock_dealloc(), in such a way as to avoid 
that problem. Currently working out a patch to do that.

> Thanks,
> Andreas

-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-09 14:00                 ` Tim Smith
@ 2018-10-09 14:47                   ` Andreas Gruenbacher
  2018-10-09 15:34                     ` Tim Smith
  0 siblings, 1 reply; 25+ messages in thread
From: Andreas Gruenbacher @ 2018-10-09 14:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Mark and Tim,

does the following patch fix the problem, perhaps?

Thanks,
Andreas

---
 fs/gfs2/glock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4614ee25f621..71e7c380d4c4 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl)
 	wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name);
 
 	if (waitqueue_active(wq))
-		__wake_up(wq, TASK_NORMAL, 1, &gl->gl_name);
+		__wake_up(wq, TASK_NORMAL, 0, &gl->gl_name);
 }
 
 static void gfs2_glock_dealloc(struct rcu_head *rcu)
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
       [not found]   ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>
@ 2018-10-09 15:08     ` Tim Smith
  2018-10-10  8:22       ` Tim Smith
  0 siblings, 1 reply; 25+ messages in thread
From: Tim Smith @ 2018-10-09 15:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote:
> > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote:
> > > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > > > > Hi,
> > > > > 
> > > > > On 08/10/18 14:10, Tim Smith wrote:
> > > > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > > > > >> On 08/10/18 13:59, Mark Syms wrote:
> > > > > >>> That sounds entirely reasonable so long as you are absolutely
> > > > > >>> sure
> > > > > >>> that
> > > > > >>> nothing is ever going to mess with that glock, we erred on the
> > > > > >>> side
> > > > > >>> of
> > > > > >>> more caution not knowing whether it would be guaranteed safe or
> > > > > >>> not.
> > > > > >>> 
> > > > > >>> Thanks,
> > > > > >>> 
> > > > > >>>   Mark
> > > > > >> 
> > > > > >> We should have a look at the history to see how that wait got
> > > > > >> added.
> > > > > >> However the "dead" flag here means "don't touch this glock" and
> > > > > >> is
> > > > > >> there
> > > > > >> so that we can separate the marking dead from the actual removal
> > > > > >> from
> > > > > >> the list (which simplifies the locking during the scanning
> > > > > >> procedures)
> > > > > > 
> > > > > > You beat me to it :-)
> > > > > > 
> > > > > > I think there might be a bit of a problem inserting a new entry
> > > > > > with
> > > > > > the
> > > > > > same name before the old entry has been fully destroyed (or at
> > > > > > least
> > > > > > removed), which would be why the schedule() is there.
> > > > > 
> > > > > If the old entry is marked dead, all future lookups should ignore
> > > > > it. We
> > > > > should only have a single non-dead entry at a time, but that doesn't
> > > > > seem like it should need us to wait for it.
> > > > 
> > > > On the second call we do have the new glock to insert as arg2, so we
> > > > could
> > > > try to swap them cleanly, yeah.
> > > > 
> > > > > If we do discover that the wait is really required, then it sounds
> > > > > like
> > > > > as you mentioned above there is a lost wakeup, and that must
> > > > > presumably
> > > > > be on a code path that sets the dead flag and then fails to send a
> > > > > wake
> > > > > up later on. If we can drop the wait in the first place, that seems
> > > > > like
> > > > > a better plan,
> > > > 
> > > > Ooooh, I wonder if these two lines:
> > > >         wake_up_glock(gl);
> > > >         call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
> > > > 
> > > > in gfs2_glock_free() are the wrong way round?
> > > 
> > > No, what's important here is that the wake-up happens after the glock
> > > being freed has been removed from the rhashtable, and that's the case
> > > here. wake_up_glock actually accesses the glock, which may no longer
> > > be around after the call_rcu, so swapping the two lines would
> > > introduce a use-after-free bug.
> > 
> > Yes, realised that :-/
> > 
> > > There must be another reason for the missed wake-up. I'll have to
> > > study the code some more.
> > 
> > I think it needs to go into gfs2_glock_dealloc(), in such a way as to
> > avoid
> > that problem. Currently working out a patch to do that.
> 
> That doesn't sound right: find_insert_glock is waiting for the glock
> to be removed from the rhashtable. In gfs2_glock_free, we remove the
> glock from the rhashtable and then we do the wake-up. Delaying the
> wakeup further isn't going to help (but it might hide the problem).

The only way we can get the problem we're seeing is if we get an effective 
order of

T0: wake_up_glock()
T1: prepare_to_wait()
T1: schedule()

so clearly there's a way for that to happen. Any other order and either 
schedule() doesn't sleep or it gets woken.
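
A simplified sketch of the two paths in question, with names following fs/gfs2/glock.c but the bodies trimmed to the ordering that matters (the rhashtable and lockref helpers are paraphrased from memory, so treat this as a sketch of the pattern, not the actual source):

	/* T1, waiter: find_insert_glock(), simplified */
	prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
	gl = rhashtable_lookup_fast(&gl_hash_table, name, ht_parms);
	if (gl && !lockref_get_not_dead(&gl->gl_lockref))
		schedule();	/* relies on a later wake_up_glock() */

	/* T0, waker: gfs2_glock_free(), simplified */
	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
	wake_up_glock(gl);
	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);

For the ordering above to wedge, the waiter's lookup has to still observe the dying glock even though the waker has already removed it and issued its wake-up, leaving schedule() with nothing left to wake it.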

The only way I can see at the moment to ensure that wake_up_glock() *cannot* 
get called until after prepare_to_wait() is to delay it until the read-side 
critical sections are done. The first place that's got that property is 
the start of gfs2_glock_dealloc(), unless we want to add synchronize_rcu() to 
gfs2_glock_free(), and I'm guessing there's a reason it's using call_rcu() 
instead.
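
The trade-off being weighed here, as a sketch (synchronize_rcu() and call_rcu() are the real RCU primitives; their placement in gfs2_glock_free() below is hypothetical):

	/* blocking variant: wait here until all pre-existing RCU readers
	 * finish, then it is safe to wake and tear down synchronously */
	synchronize_rcu();
	wake_up_glock(gl);
	gfs2_glock_dealloc(&gl->gl_rcu);

	/* current, non-blocking variant: defer teardown to an RCU callback */
	call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);

synchronize_rcu() can sleep for a full grace period on this path, which is presumably why the glock code prefers call_rcu().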

I'll keep thinking about it.

> Instrumenting glock_wake_function might help though.
> 
> Thanks,
> Andreas


-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-09 14:47                   ` Andreas Gruenbacher
@ 2018-10-09 15:34                     ` Tim Smith
  0 siblings, 0 replies; 25+ messages in thread
From: Tim Smith @ 2018-10-09 15:34 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday, 9 October 2018 15:47:21 BST Andreas Gruenbacher wrote:
> Mark and Tim,
> 
> does the following patch fix the problem, perhaps?

The assertion being that there are several waiters. Certainly possible.

We'll give it a try. It takes ~12 hours to hit one instance of this so we've 
got plenty of thinking time between runs.

> Thanks,
> Andreas
> 
> ---
>  fs/gfs2/glock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4614ee25f621..71e7c380d4c4 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl)
>  	wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name);
> 
>  	if (waitqueue_active(wq))
> -		__wake_up(wq, TASK_NORMAL, 1, &gl->gl_name);
> +		__wake_up(wq, TASK_NORMAL, 0, &gl->gl_name);
>  }
> 
>  static void gfs2_glock_dealloc(struct rcu_head *rcu)


-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-09 15:08     ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith
@ 2018-10-10  8:22       ` Tim Smith
  2018-10-11  8:14         ` Mark Syms
  0 siblings, 1 reply; 25+ messages in thread
From: Tim Smith @ 2018-10-10  8:22 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote:
> On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote:
> > > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > > There must be another reason for the missed wake-up. I'll have to
> > > > study the code some more.
> > > 
> > > I think it needs to go into gfs2_glock_dealloc(), in such a way as to
> > > avoid
> > > that problem. Currently working out a patch to do that.
> > 
> > That doesn't sound right: find_insert_glock is waiting for the glock
> > to be removed from the rhashtable. In gfs2_glock_free, we remove the
> > glock from the rhashtable and then we do the wake-up. Delaying the
> > wakeup further isn't going to help (but it might hide the problem).
> 
> The only way we can get the problem we're seeing is if we get an effective
> order of
> 
> T0: wake_up_glock()
> T1: prepare_to_wait()
> T1: schedule()
> 
> so clearly there's a way for that to happen. Any other order and either
> schedule() doesn't sleep or it gets woken.
> 
> The only way I can see at the moment to ensure that wake_up_glock() *cannot*
> get called until after prepare_to_wait() is to delay it until the read-side
> critical sections are done, and the first place that's got that property is
> the start of gfs2_glock_dealloc(), unless we want to add synchronize_rcu()
> to gfs2_glock_free() and I'm guessing there's a reason it's using
> call_rcu() instead.
> 
> I'll keep thinking about it.

OK, we have a result that this definitely *isn't* the right answer (we are 
still seeing the schedule_timeout fire, i.e. the wakeup is still being 
missed). This lends more weight to the idea that there are multiple waiters, 
so we'll try your patch.

-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-10  8:22       ` Tim Smith
@ 2018-10-11  8:14         ` Mark Syms
  2018-10-11 19:28           ` Mark Syms
       [not found]           ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-11  8:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

And, sadly, the change to wake all waiters also doesn't prevent the schedule_timeout from occurring. Something more subtle is going on, obviously.

	Mark.

-----Original Message-----
From: Tim Smith <tim.smith@citrix.com> 
Sent: 10 October 2018 09:23
To: cluster-devel@redhat.com
Cc: Andreas Gruenbacher <agruenba@redhat.com>; Ross Lagerwall <ross.lagerwall@citrix.com>; Mark Syms <Mark.Syms@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote:
> On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote:
> > > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > > There must be another reason for the missed wake-up. I'll have 
> > > > to study the code some more.
> > > 
> > > I think it needs to go into gfs2_glock_dealloc(), in such a way as 
> > > to avoid that problem. Currently working out a patch to do that.
> > 
> > That doesn't sound right: find_insert_glock is waiting for the glock 
> > to be removed from the rhashtable. In gfs2_glock_free, we remove the 
> > glock from the rhashtable and then we do the wake-up. Delaying the 
> > wakeup further isn't going to help (but it might hide the problem).
> 
> The only way we can get the problem we're seeing is if we get an 
> effective order of
> 
> T0: wake_up_glock()
> T1: prepare_to_wait()
> T1: schedule()
> 
> so clearly there's a way for that to happen. Any other order and 
> either
> schedule() doesn't sleep or it gets woken.
> 
> The only way I can see at the moment to ensure that wake_up_glock() 
> *cannot* get called until after prepare_to_wait() is to delay it until 
> the read-side critical sections are done, and the first place that's 
> got that property is the start of gfs2_glock_dealloc(), unless we want 
> to add synchronize_rcu() to gfs2_glock_free() and I'm guessing there's 
> a reason it's using
> call_rcu() instead.
> 
> I'll keep thinking about it.

OK, we have a result that this definitely *isn't* the right answer (still getting the wakeup happening). This lends more weight to the idea that there are multiple waiters, so we'll try your patch.

--
Tim Smith <tim.smith@citrix.com>





^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
  2018-10-11  8:14         ` Mark Syms
@ 2018-10-11 19:28           ` Mark Syms
       [not found]           ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Mark Syms @ 2018-10-11 19:28 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Also, the prepare_to_wait() call registers the waiter in non-exclusive mode, so switching the parameter on the wakeup would have no effect: non-exclusive waiters are always woken anyway.

So, there is some other reason why we decide to yield but never get woken.
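
For reference, a trimmed sketch of the generic wake-up walk (following the pattern of __wake_up_common() in kernel/sched/wait.c from memory, not a verbatim copy), showing why the nr_exclusive argument cannot matter for this waiter:

	list_for_each_entry_safe(curr, next, &wq_head->head, entry) {
		unsigned flags = curr->flags;
		int ret = curr->func(curr, mode, wake_flags, key);	/* glock_wake_function here */

		if (ret < 0)
			break;
		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;	/* the limit only counts exclusive entries */
	}

find_insert_glock() queues its entry with init_wait()/prepare_to_wait(), which leave WQ_FLAG_EXCLUSIVE clear, so that entry is woken regardless of what nr_exclusive is.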

Mark.
________________________________
From: Mark Syms <Mark.Syms@citrix.com>
Sent: Thursday, 11 October 2018 09:14
To: Tim Smith <tim.smith@citrix.com>, cluster-devel@redhat.com
CC: Andreas Gruenbacher <agruenba@redhat.com>,Ross Lagerwall <ross.lagerwall@citrix.com>
Subject: RE: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock


And, sadly, the change to wake all waiters also doesn't prevent the schedule_timeout from occurring. Something more subtle is going on, obviously.

        Mark.

-----Original Message-----
From: Tim Smith <tim.smith@citrix.com>
Sent: 10 October 2018 09:23
To: cluster-devel@redhat.com
Cc: Andreas Gruenbacher <agruenba@redhat.com>; Ross Lagerwall <ross.lagerwall@citrix.com>; Mark Syms <Mark.Syms@citrix.com>
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

On Tuesday, 9 October 2018 16:08:10 BST Tim Smith wrote:
> On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> > On Tue, 9 Oct 2018 at 14:46, Tim Smith <tim.smith@citrix.com> wrote:
> > > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > > There must be another reason for the missed wake-up. I'll have
> > > > to study the code some more.
> > >
> > > I think it needs to go into gfs2_glock_dealloc(), in such a way as
> > > to avoid that problem. Currently working out a patch to do that.
> >
> > That doesn't sound right: find_insert_glock is waiting for the glock
> > to be removed from the rhashtable. In gfs2_glock_free, we remove the
> > glock from the rhashtable and then we do the wake-up. Delaying the
> > wakeup further isn't going to help (but it might hide the problem).
>
> The only way we can get the problem we're seeing is if we get an
> effective order of
>
> T0: wake_up_glock()
> T1: prepare_to_wait()
> T1: schedule()
>
> so clearly there's a way for that to happen. Any other order and
> either
> schedule() doesn't sleep or it gets woken.
>
> The only way I can see at the moment to ensure that wake_up_glock()
> *cannot* get called until after prepare_to_wait() is to delay it until
> the read-side critical sections are done, and the first place that's
> got that property is the start of gfs2_glock_dealloc(), unless we want
> to add synchronize_rcu() to gfs2_glock_free() and I'm guessing there's
> a reason it's using
> call_rcu() instead.
>
> I'll keep thinking about it.

OK, we have a result that this definitely *isn't* the right answer (still getting the wakeup happening). This lends more weight to the idea that there are multiple waiters, so we'll try your patch.

--
Tim Smith <tim.smith@citrix.com>



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
       [not found]           ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2019-03-06 16:14             ` Andreas Gruenbacher
  0 siblings, 0 replies; 25+ messages in thread
From: Andreas Gruenbacher @ 2019-03-06 16:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Mark,

I think I found the problem. Let me post the fix in a separate email.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-03-06 16:14 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
2018-10-08 12:56   ` Steven Whitehouse
2018-10-08 12:59     ` Mark Syms
2018-10-08 13:03       ` Steven Whitehouse
2018-10-08 13:10         ` Tim Smith
2018-10-08 13:13           ` Steven Whitehouse
2018-10-08 13:26             ` Tim Smith
2018-10-09  8:13               ` Mark Syms
2018-10-09  8:41                 ` Steven Whitehouse
2018-10-09  8:50                   ` Mark Syms
2018-10-09 12:34               ` Andreas Gruenbacher
2018-10-09 14:00                 ` Tim Smith
2018-10-09 14:47                   ` Andreas Gruenbacher
2018-10-09 15:34                     ` Tim Smith
2018-10-08 13:25         ` Bob Peterson
2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
2018-10-08 12:57   ` Steven Whitehouse
2018-10-08 20:10   ` Bob Peterson
2018-10-08 20:53     ` Mark Syms
     [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
     [not found]   ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>
2018-10-09 15:08     ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith
2018-10-10  8:22       ` Tim Smith
2018-10-11  8:14         ` Mark Syms
2018-10-11 19:28           ` Mark Syms
     [not found]           ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
2019-03-06 16:14             ` Andreas Gruenbacher
