From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andreas Gruenbacher Date: Wed, 22 Sep 2021 15:54:49 +0200 Subject: [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed In-Reply-To: <7c83e1ac-5ec6-b008-51d0-11d978ec642f@redhat.com> References: <20210916191003.105866-1-rpeterso@redhat.com> <20210916191003.105866-7-rpeterso@redhat.com> <7c83e1ac-5ec6-b008-51d0-11d978ec642f@redhat.com> Message-ID: List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Wed, Sep 22, 2021 at 2:47 PM Bob Peterson wrote: > On 9/22/21 6:57 AM, Andreas Gruenbacher wrote: > > On Thu, Sep 16, 2021 at 9:11 PM Bob Peterson wrote: > >> Before this patch, when a glock was locked, the very first holder on the > >> queue would unlock the lockref and call the go_lock glops function (if > >> one exists), unless GL_SKIP was specified. When we introduced the new > >> node-scope concept, we allowed multiple holders to lock glocks in EX mode > >> and share the lock, but node-scope introduced a new problem: if the > >> first holder has GL_SKIP and the next one does NOT, since it is not the > >> first holder on the queue, the go_lock op was not called. > > > > We use go_lock to (re)validate inodes (for inode glocks) and to read > > in bitmaps (for resource group glocks). I can see how calling go_lock > > was originally tied to the first lock holder, but GL_SKIP already > > broke the simple model that the first holder will call go_lock. The > > go_lock_needed callback only makes things worse yet again, > > unfortunately. > > In what way does go_lock_needed make things worse? It adds an indirection that papers over the fact that the existing abstraction (first holder calls go_lock) doesn't make sense. > > How about we introduce a new GLF_REVALIDATE flag that indicates that > > go_lock needs to be called? The flag would be set when instantiating a > > new glock and when dequeuing the last holder, and cleared in go_lock > > (and in gfs2_inode_refresh for GL_SKIP holders). I'm not sure if > > That was my original design, and it makes the most sense. I named the > flag GLF_GO_LOCK_SKIPPED, but essentially the same thing. Unfortunately, > I ran into all kinds of problems implementing it. In those patches, > first holders would either call glops->go_lock() or set > GLF_GO_LOCK_SKIPPED. Once the go_lock function was complete, it cleared > GLF_GO_LOCK_SKIPPED, and called wake_up_bit. Secondary holders did > wait_on_bit and waited for the other process's go_lock to complete. Just set the flag when we know the glock needs revalidation. There are two possible points in time for doing that: either when we're locking the first holder, or when the glock is new / the last holder is dequeued. Then, we can handle clearing the flag and races among multiple go_lock instances in the go_lock handlers. > But I had tons of problems getting this to work properly. Processes > would hang and deadlock for seemingly no reason. Finally I got > frustrated and sought other solutions. > > I'm willing to try to resurrect that patch set and try again. Maybe you > can help me figure out what I'm doing wrong and why it's not working. > > Bob Peterson > > > GLF_REVALIDATE can fully replace GIF_INVALID as well, but it looks > > like it at first glance. > > > > Thanks, > > Andreas > Thanks, Andreas