[Cluster-devel] [GFS2 PATCH] GFS2: Rework rgrp glock congestion functions for intra-node (v2)

From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Rework rgrp glock congestion functions for intra-node (v2)
Date: Fri, 3 Nov 2017 11:20:43 +0000	[thread overview]
Message-ID: <a86ba3fd-53b4-9124-6698-56aa72e5c322@redhat.com> (raw)
In-Reply-To: <1431187349.26855260.1509642407386.JavaMail.zimbra@redhat.com>

Hi,

On 02/11/17 17:06, Bob Peterson wrote:
> Addressing Steve's other questions:
>
> ----- Original Message -----
> | > We can predict whether a rgrp glock is the victim of intranode
> | > congestion based on set of rules. Rules Enforced by this patch:
> | >
> | > 1. If the current process has a multi-block reservation in the
> | >     rgrp, it needs to use it regardless of the congestion. The
> | >     congestion algorithm should have prevented the reservation
> | >     in the first place.
> | > 2. If some process has the rgrp gl_lockref spin_lock locked,
> | >     they are preparing to use it for a reservation. So we take
> | >     this as a clear sign of impending contention.
> | There can be all kinds of reasons why this lock has been taken. It
> | should have very low contention on it, so that I don't think this is
> | useful overall.
>
> I'll try some experiments without this check.
>
> | > 3. If the rgrp currently has a glock holder, we know we need to
> | >     wait before we can lock it, regardless of whether the holder
> | >     represents an actual holder or a waiter.
> | That is probably wrong. We ought to be able to to multiple allocations
> | in an rgrp if the node is holding the rgrp lock, without having to
> | constantly queue and requeue locks. In other words, if we hold the rgrp
> | in excl mode, then a useful extension would be to allow multiple users
> | to perform allocations in parallel. So the exclusive rgrp glock becomes
> | locally sharable - that is the real issue here I think.
>
> That sounds like a really good enhancement, but out of the scope of
> this patch.
Yes, thats true. However the patch as it stands does a lot of different 
things and those things need to be separated out and looked at 
individually. Which ones really make the performance difference in this 
case?

Doing so many things in a single patch means that it is very tricky if a 
regression is noticed later, since rolling back is either all or 
nothing. These things definitely need splitting up into smaller changes.

>   
> | > 4. If the rgrp currently has a multi-block reservation, and we
> | >     already know it's not ours, then intranode contention is
> | >     likely.
> | What do you mean by "not ours"? In general each node only sees the
> | reservations which are taken out on the node itself.
>
> What I mean is: is the reservation associated with the current inode?
> If the current inode for which we're allocating blocks has a current
> reservation associated with it (gfs2_rs_active(rs) where rs is ip->i_res)
> then we've obliged to use it. If not, we go out searching for a new
> reservation in the rgrp. If the rgrp has a reservation queued from
> a different inode (so its rgrp->rd_rstree is not empty) then another
> process on this node has queued a reservation to this rgrp for the
> purposes of a different file.
The current allocation algorithm tries to put related files (i.e. those 
with the same parent directory) near each other. That is not an 
unreasonable plan, and if there were multiple inodes with the same 
parent directory being written to at the same time, it is not a good 
plan to separate them too far. This is really just a workaround for the 
bigger issue of not being able to have multiple local processes using 
the same rgrp at the same time. I think that needs looking at again,

Steve.