From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Whitehouse
Date: Mon, 8 Oct 2018 14:13:10 +0100
Subject: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock
In-Reply-To: <2889867.YGlmsYasr8@dhcp-3-135.uk.xensource.com>
References: <1539002191-40831-1-git-send-email-mark.syms@citrix.com>
 <35e18368-cb90-421b-3998-949d00535000@redhat.com>
 <2889867.YGlmsYasr8@dhcp-3-135.uk.xensource.com>
Message-ID: <2d468838-ea79-9c89-ae02-89b18f4bda37@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

On 08/10/18 14:10, Tim Smith wrote:
> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
>> On 08/10/18 13:59, Mark Syms wrote:
>>> That sounds entirely reasonable so long as you are absolutely sure that
>>> nothing is ever going to mess with that glock, we erred on the side of
>>> more caution not knowing whether it would be guaranteed safe or not.
>>>
>>> Thanks,
>>>
>>> Mark
>> We should have a look at the history to see how that wait got added.
>> However the "dead" flag here means "don't touch this glock" and is there
>> so that we can separate the marking dead from the actual removal from
>> the list (which simplifies the locking during the scanning procedures)
> You beat me to it :-)
>
> I think there might be a bit of a problem inserting a new entry with the same
> name before the old entry has been fully destroyed (or at least removed),
> which would be why the schedule() is there.
>
If the old entry is marked dead, all future lookups should ignore it. We
should only have a single non-dead entry at a time, but that doesn't
seem like it should need us to wait for it.

If we do discover that the wait is really required, then it sounds like
as you mentioned above there is a lost wakeup, and that must presumably
be on a code path that sets the dead flag and then fails to send a wake
up later on.
If we can drop the wait in the first place, that seems like a better plan,

Steve.
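
To illustrate the point about dead entries, here is a minimal user-space sketch in C. It is not the GFS2 code: struct entry, struct table and the table_* helpers are hypothetical names made up for this example, the table is a plain singly linked list, and it is single-threaded with no locking, so it only shows the lookup rule and not the kernel's actual concurrency handling. Marking dead and unlinking are separate steps, lookups skip dead entries, and a new entry with the same name can be inserted without waiting for the dead one to be removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a glock table entry (not struct gfs2_glock). */
struct entry {
    const char *name;
    bool dead;              /* "don't touch this entry" */
    struct entry *next;
};

/* Hypothetical table: a plain list, no locking (assumption for brevity). */
struct table {
    struct entry *head;
};

/* Lookups ignore dead entries, so a dead entry never satisfies a search. */
static struct entry *table_lookup(struct table *t, const char *name)
{
    for (struct entry *e = t->head; e; e = e->next)
        if (!e->dead && strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Insert without waiting for a dead entry of the same name to go away. */
static void table_insert(struct table *t, struct entry *e)
{
    /* Invariant from the discussion: at most one non-dead entry per name. */
    assert(table_lookup(t, e->name) == NULL);
    e->dead = false;
    e->next = t->head;
    t->head = e;
}

/* Step 1: mark the entry dead; it stays on the list for now. */
static void table_mark_dead(struct entry *e)
{
    e->dead = true;
}

/* Step 2, done later: unlink every dead entry; returns how many went. */
static size_t table_sweep(struct table *t)
{
    size_t removed = 0;
    struct entry **pp = &t->head;

    while (*pp) {
        if ((*pp)->dead) {
            *pp = (*pp)->next;   /* unlink, keep pp where it is */
            removed++;
        } else {
            pp = &(*pp)->next;
        }
    }
    return removed;
}
```

In this sketch an old entry can be marked dead, a replacement with the same name inserted straight away, and the dead one swept out afterwards, with table_lookup() returning only the live entry throughout. Whether the real insert path can rely on the same reasoning depends on the locking and on the history of that wait, which the sketch does not model.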