From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark Syms Date: Tue, 9 Oct 2018 08:50:15 +0000 Subject: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock In-Reply-To: References: <1539002191-40831-1-git-send-email-mark.syms@citrix.com> <2889867.YGlmsYasr8@dhcp-3-135.uk.xensource.com> <2d468838-ea79-9c89-ae02-89b18f4bda37@redhat.com> <21999012.kCAuXqbhYe@dhcp-3-135.uk.xensource.com> <8c41a29508574143827309528c392350@AMSPEX02CL02.citrite.net> Message-ID: <46277514ff8c4cb3b46a29c914a4fcc3@AMSPEX02CL02.citrite.net> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit We think we have, just making a build to test. Will follow up later. Mark. -----Original Message----- From: Steven Whitehouse Sent: 09 October 2018 09:41 To: Mark Syms ; Tim Smith Cc: cluster-devel at redhat.com; Ross Lagerwall Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock On 09/10/18 09:13, Mark Syms wrote: > Having swapped the line below around we still see the timeout on > schedule fire, but only once in a fairly mega stress test. This is why > we weren't worried about the timeout being HZ, the situation is hardly > ever hit as having to wait is rare and normally we are woken from > schedule and without a timeout on schedule we never wake up so a rare occurrence of waiting a second really doesn't seem too bad. > > Mark. We should still get to the bottom of why the wake up is missing though, since without that fix we won't know if there is something else wrong somewhere, Steve. > > -----Original Message----- > From: Tim Smith > Sent: 08 October 2018 14:27 > To: Steven Whitehouse > Cc: Mark Syms ; cluster-devel at redhat.com; Ross > Lagerwall > Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in > find insert glock > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote: >> Hi, >> >> On 08/10/18 14:10, Tim Smith wrote: >>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote: >>>> On 08/10/18 13:59, Mark Syms wrote: >>>>> That sounds entirely reasonable so long as you are absolutely sure >>>>> that nothing is ever going to mess with that glock, we erred on >>>>> the side of more caution not knowing whether it would be guaranteed safe or not. >>>>> >>>>> Thanks, >>>>> >>>>> Mark >>>> We should have a look at the history to see how that wait got added. >>>> However the "dead" flag here means "don't touch this glock" and is >>>> there so that we can separate the marking dead from the actual >>>> removal from the list (which simplifies the locking during the >>>> scanning procedures) >>> You beat me to it :-) >>> >>> I think there might be a bit of a problem inserting a new entry with >>> the same name before the old entry has been fully destroyed (or at >>> least removed), which would be why the schedule() is there. >> If the old entry is marked dead, all future lookups should ignore it. >> We should only have a single non-dead entry at a time, but that >> doesn't seem like it should need us to wait for it. > On the second call we do have the new glock to insert as arg2, so we could try to swap them cleanly, yeah. > >> If we do discover that the wait is really required, then it sounds >> like as you mentioned above there is a lost wakeup, and that must >> presumably be on a code path that sets the dead flag and then fails >> to send a wake up later on. If we can drop the wait in the first >> place, that seems like a better plan, > Ooooh, I wonder if these two lines: > > wake_up_glock(gl); > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc); > > in gfs2_glock_free() are the wrong way round? > > -- > Tim Smith > >