* [Cluster-devel] [PATCH] gfs2: Fix missed wakeups in find_insert_glock
From: Andreas Gruenbacher @ 2019-03-06 16:14 UTC
  To: cluster-devel.redhat.com

Mark Syms has reported seeing tasks that are stuck waiting in
find_insert_glock.  It turns out that struct lm_lockname contains four padding
bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
hashing the glock name.  As a result, we can end up waking up the wrong
waitqueue, and the waiting tasks will be stuck.

Use offsetofend instead of sizeof to get the struct size without the padding.
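
For reference, a rough sketch of the layout involved (field sizes assume
a 64-bit kernel; see fs/gfs2/incore.h for the authoritative definition):

	struct lm_lockname {
		struct gfs2_sbd *ln_sbd;	/* 8 bytes */
		u64 ln_number;			/* 8 bytes */
		unsigned int ln_type;		/* 4 bytes */
		/* + 4 bytes of tail padding for u64 alignment */
	};

	sizeof(struct lm_lockname)                == 24	/* 6 u32 words */
	offsetofend(struct lm_lockname, ln_type)  == 20	/* 5 u32 words */

With sizeof, jhash2 hashes six 32-bit words, the last of which is struct
padding with undefined contents, so two names that are equal field by
field can still hash to different waitqueues; with offsetofend, the
padding word is excluded.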

Reported-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/glock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f66773c71bcde..01611f363cd59 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -107,7 +107,8 @@ static int glock_wake_function(wait_queue_entry_t *wait, unsigned int mode,
 
 static wait_queue_head_t *glock_waitqueue(struct lm_lockname *name)
 {
-	u32 hash = jhash2((u32 *)name, sizeof(*name) / 4, 0);
+	unsigned int size = offsetofend(struct lm_lockname, ln_type);
+	u32 hash = jhash2((u32 *)name, size / 4, 0);
 
 	return glock_wait_table + hash_32(hash, GLOCK_WAIT_TABLE_BITS);
 }
-- 
2.20.1

* [Cluster-devel] [PATCH] gfs2: Fix missed wakeups in find_insert_glock
From: Edwin Török @ 2019-03-07 15:54 UTC
  To: cluster-devel.redhat.com

On 06/03/2019 16:14, Andreas Gruenbacher wrote:
> Mark Syms has reported seeing tasks that are stuck waiting in
> find_insert_glock.  It turns out that struct lm_lockname contains four padding
> bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
> hashing the glock name.  As a result, we can end up waking up the wrong
> waitqueue, and the waiting tasks will be stuck.
> 
> Use offsetofend instead of sizeof to get the struct size without the padding.
> 
> Reported-by: Mark Syms <mark.syms@citrix.com>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Thank you for your patch; initial testing looks promising when replacing the schedule_timeout patch with this one (no deadlocks/stuck tasks found so far).
Will let you know when our testing completes.

> ---
>  fs/gfs2/glock.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index f66773c71bcde..01611f363cd59 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -107,7 +107,8 @@ static int glock_wake_function(wait_queue_entry_t *wait, unsigned int mode,
>  
>  static wait_queue_head_t *glock_waitqueue(struct lm_lockname *name)
>  {
> -	u32 hash = jhash2((u32 *)name, sizeof(*name) / 4, 0);
> +	unsigned int size = offsetofend(struct lm_lockname, ln_type);

This is the same as ht_parms.key_len, which I assume was the intention.
Should we just use ht_parms.key_len for the size instead of computing it
again, so that the two values can't diverge in the future?
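
An untested sketch of what I mean (assuming ht_parms is still defined
above glock_waitqueue in fs/gfs2/glock.c):

	static wait_queue_head_t *glock_waitqueue(struct lm_lockname *name)
	{
		/* ht_parms.key_len is offsetofend(struct lm_lockname,
		 * ln_type), so the rhashtable lookup and the waitqueue
		 * hash always agree on how many bytes of the name are
		 * significant. */
		u32 hash = jhash2((u32 *)name, ht_parms.key_len / 4, 0);

		return glock_wait_table + hash_32(hash, GLOCK_WAIT_TABLE_BITS);
	}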

> +	u32 hash = jhash2((u32 *)name, size / 4, 0);
>  
>  	return glock_wait_table + hash_32(hash, GLOCK_WAIT_TABLE_BITS);
>  }
> 

Best regards,
--Edwin

* [Cluster-devel] [PATCH] gfs2: Fix missed wakeups in find_insert_glock
From: Andreas Gruenbacher @ 2019-03-07 16:23 UTC
  To: cluster-devel.redhat.com

Hi Edwin,

On Thu, 7 Mar 2019 at 16:55, Edwin Török <edvin.torok@citrix.com> wrote:
> On 06/03/2019 16:14, Andreas Gruenbacher wrote:
> > Mark Syms has reported seeing tasks that are stuck waiting in
> > find_insert_glock.  It turns out that struct lm_lockname contains four padding
> > bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
> > hashing the glock name.  As a result, we can end up waking up the wrong
> > waitqueue, and the waiting tasks will be stuck.
> >
> > Use offsetofend instead of sizeof to get the struct size without the padding.
> >
> > Reported-by: Mark Syms <mark.syms@citrix.com>
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
>
> Thank you for your patch; initial testing looks promising when replacing the schedule_timeout patch with this one (no deadlocks/stuck tasks found so far).
> Will let you know when our testing completes.

Okay, great, thanks. I hope we can still get this into the 5.1 kernel.

> > ---
> >  fs/gfs2/glock.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> > index f66773c71bcde..01611f363cd59 100644
> > --- a/fs/gfs2/glock.c
> > +++ b/fs/gfs2/glock.c
> > @@ -107,7 +107,8 @@ static int glock_wake_function(wait_queue_entry_t *wait, unsigned int mode,
> >
> >  static wait_queue_head_t *glock_waitqueue(struct lm_lockname *name)
> >  {
> > -     u32 hash = jhash2((u32 *)name, sizeof(*name) / 4, 0);
> > +     unsigned int size = offsetofend(struct lm_lockname, ln_type);
>
> This is the same as ht_parms.key_len, which I assume was the intention.
> Should we just use ht_parms.key_len for the size instead of computing it
> again, so that the two values can't diverge in the future?

We can do that, yes.

> > +     u32 hash = jhash2((u32 *)name, size / 4, 0);
> >
> >       return glock_wait_table + hash_32(hash, GLOCK_WAIT_TABLE_BITS);
> >  }

Thanks,
Andreas

* [Cluster-devel] [PATCH] gfs2: Fix missed wakeups in find_insert_glock
From: Edwin Török @ 2019-03-11 15:06 UTC
  To: cluster-devel.redhat.com

On 07/03/2019 16:23, Andreas Gruenbacher wrote:
> Hi Edwin,
> 
> On Thu, 7 Mar 2019 at 16:55, Edwin Török <edvin.torok@citrix.com> wrote:
>> On 06/03/2019 16:14, Andreas Gruenbacher wrote:
>>> Mark Syms has reported seeing tasks that are stuck waiting in
>>> find_insert_glock.  It turns out that struct lm_lockname contains four padding
>>> bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
>>> hashing the glock name.  As a result, we can end up waking up the wrong
>>> waitqueue, and the waiting tasks will be stuck.
>>>
>>> Use offsetofend instead of sizeof to get the struct size without the padding.
>>>
>>> Reported-by: Mark Syms <mark.syms@citrix.com>
>>> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
>>
>> Thank you for your patch; initial testing looks promising when replacing the schedule_timeout patch with this one (no deadlocks/stuck tasks found so far).
>> Will let you know when our testing completes.
> 
> Okay, great, thanks. I hope we can still get this into the 5.1 kernel.


Testing of this patch itself looked good (no new issues); however, GFS2 still deadlocks on 4.19 even with this patch, see:
https://www.redhat.com/archives/cluster-devel/2019-March/msg00000.html
[the iomap deadlocks seem to be rarer now, though]

Thanks,
--Edwin
