All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tim Smith <tim.smith@citrix.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
Date: Tue, 9 Oct 2018 15:00:25 +0100	[thread overview]
Message-ID: <2495633.7Q8Fd300OR@dhcp-3-135.uk.xensource.com> (raw)
In-Reply-To: <CAHc6FU6+t-5WdjcApv7rZmTZ5Zg_hxNgmCtNUC3b_m0HrUT-_A@mail.gmail.com>

On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> On Mon, 8 Oct 2018 at 15:27, Tim Smith <tim.smith@citrix.com> wrote:
> > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > On 08/10/18 14:10, Tim Smith wrote:
> > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > > >> On 08/10/18 13:59, Mark Syms wrote:
> > > >>> That sounds entirely reasonable so long as you are absolutely sure
> > > >>> that
> > > >>> nothing is ever going to mess with that glock, we erred on the side
> > > >>> of
> > > >>> more caution not knowing whether it would be guaranteed safe or not.
> > > >>> 
> > > >>> Thanks,
> > > >>> 
> > > >>>   Mark
> > > >> 
> > > >> We should have a look at the history to see how that wait got added.
> > > >> However the "dead" flag here means "don't touch this glock" and is
> > > >> there
> > > >> so that we can separate the marking dead from the actual removal from
> > > >> the list (which simplifies the locking during the scanning
> > > >> procedures)
> > > > 
> > > > You beat me to it :-)
> > > > 
> > > > I think there might be a bit of a problem inserting a new entry with
> > > > the
> > > > same name before the old entry has been fully destroyed (or at least
> > > > removed), which would be why the schedule() is there.
> > > 
> > > If the old entry is marked dead, all future lookups should ignore it. We
> > > should only have a single non-dead entry at a time, but that doesn't
> > > seem like it should need us to wait for it.
> > 
> > On the second call we do have the new glock to insert as arg2, so we could
> > try to swap them cleanly, yeah.
> > 
> > > If we do discover that the wait is really required, then it sounds like
> > > as you mentioned above there is a lost wakeup, and that must presumably
> > > be on a code path that sets the dead flag and then fails to send a wake
> > > up later on. If we can drop the wait in the first place, that seems like
> > > a better plan,
> > 
> > Ooooh, I wonder if these two lines:
> >         wake_up_glock(gl);
> >         call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
> > 
> > in gfs2_glock_free() are the wrong way round?
> 
> No, what's important here is that the wake-up happens after the glock
> being freed has been removed from the rhashtable, and that's the case
> here. wake_up_glock actually accesses the glock, which may no longer
> be around after the call_rcu, so swapping the two lines would
> introduce a use-after-free bug.

Yes, realised that :-/

> There must be another reason for the missed wake-up. I'll have to
> study the code some more.

I think it needs to go into gfs2_glock_dealloc(), in such a way as to avoid 
that problem. Currently working out a patch to do that.

> Thanks,
> Andreas

-- 
Tim Smith <tim.smith@citrix.com>




  reply	other threads:[~2018-10-09 14:00 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-08 12:36 [Cluster-devel] [PATCH 0/2] GFS2 locking patches Mark Syms
2018-10-08 12:36 ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Mark Syms
2018-10-08 12:56   ` Steven Whitehouse
2018-10-08 12:59     ` Mark Syms
2018-10-08 13:03       ` Steven Whitehouse
2018-10-08 13:10         ` Tim Smith
2018-10-08 13:13           ` Steven Whitehouse
2018-10-08 13:26             ` Tim Smith
2018-10-09  8:13               ` Mark Syms
2018-10-09  8:41                 ` Steven Whitehouse
2018-10-09  8:50                   ` Mark Syms
2018-10-09 12:34               ` Andreas Gruenbacher
2018-10-09 14:00                 ` Tim Smith [this message]
2018-10-09 14:47                   ` Andreas Gruenbacher
2018-10-09 15:34                     ` Tim Smith
2018-10-08 13:25         ` Bob Peterson
2018-10-08 12:36 ` [Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads Mark Syms
2018-10-08 12:57   ` Steven Whitehouse
2018-10-08 20:10   ` Bob Peterson
2018-10-08 20:53     ` Mark Syms
     [not found] ` <1603192.QCZcUHDx0E@dhcp-3-135.uk.xensource.com>
     [not found]   ` <CAHc6FU7AyDQ=yEj9WtN+aqtf7jfWs0TGh=Og0fQFVHyYMKHacA@mail.gmail.com>
2018-10-09 15:08     ` [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock Tim Smith
2018-10-10  8:22       ` Tim Smith
2018-10-11  8:14         ` Mark Syms
2018-10-11 19:28           ` Mark Syms
     [not found]           ` <5bbfa461.1c69fb81.1242f.59a7SMTPIN_ADDED_BROKEN@mx.google.com>
2019-03-06 16:14             ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2495633.7Q8Fd300OR@dhcp-3-135.uk.xensource.com \
    --to=tim.smith@citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.