All of lore.kernel.org
* [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node
@ 2018-01-10 20:42 Bob Peterson
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 06 November 2017, I posted this six-part patch set for rgrp glock congestion
and got no response. I'm resending it here to give people one last
opportunity to review it before I push it to the for-next branch. It has
been rebased, but otherwise unchanged.
---
Original patch set description:

On 02 November, I posted a patch for this, but Steve Whitehouse pointed out
a check that didn't look right. I tested it and found it to be unnecessary,
so I removed it.

Steve also pointed out that the patch tried to do too much. He asked which
part of the patch actually made the performance gains.

Based on Steve's suggestions, I've broken the changes into six patches for this
patchset. The first five are all refactoring to make the code simpler and
easier to follow. Many of the complex special-case checks were consolidated
from a variety of locations into more centralized functions. The actual
performance gains are accomplished by the very last patch, "GFS2: Add checks
for intra-node congestion". So strictly speaking, the last patch is the only
patch we really need. But without the others, today's kludgy code becomes
even more kludgy and harder to read.

In doing all this, I did find a minor problem with the previous version's
logic, and fixed it.

I retested iozone performance and fragmentation. The results are pretty much
the same as I documented for the previous patch.

Bob Peterson (6):
  GFS2: Simplify gfs2_rgrp_used_recently
  GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
  GFS2: Refactor function fast_to_acquire
  GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested
  GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases
  GFS2: Add checks for intra-node congestion

 fs/gfs2/rgrp.c | 176 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 134 insertions(+), 42 deletions(-)

-- 
2.14.3



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Cluster-devel] [GFS2 PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
@ 2018-01-10 20:42 ` Bob Peterson
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested Bob Peterson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch simplifies function gfs2_rgrp_used_recently. It now uses
ktime functions to do its calculations. Also, it is called with the
rgrp rather than the reservation, and with the constant HZ rather than
the hard-coded value 1000.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index e8aba6fa1472..ea95d7c70e23 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1893,21 +1893,18 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 }
 
 /**
- * gfs2_rgrp_used_recently
- * @rs: The block reservation with the rgrp to test
+ * gfs2_rgrp_used_recently - check if a rgrp has been used recently
+ * @rgd: The rgrp to test
  * @msecs: The time limit in milliseconds
  *
  * Returns: True if the rgrp glock has been used within the time limit
  */
-static bool gfs2_rgrp_used_recently(const struct gfs2_blkreserv *rs,
-				    u64 msecs)
+static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
+					   u64 msecs)
 {
-	u64 tdiff;
-
-	tdiff = ktime_to_ns(ktime_sub(ktime_get_real(),
-                            rs->rs_rbm.rgd->rd_gl->gl_dstamp));
-
-	return tdiff > (msecs * 1000 * 1000);
+	return (ktime_before(ktime_get_real(),
+			     ktime_add(rgd->rd_gl->gl_dstamp,
+				       ms_to_ktime(msecs))));
 }
 
 static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
@@ -2007,7 +2004,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 				    !fast_to_acquire(rs->rs_rbm.rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
-				    gfs2_rgrp_used_recently(rs, 1000) &&
+				    gfs2_rgrp_used_recently(rs->rs_rbm.rgd, HZ) &&
 				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
 					goto next_rgrp;
 			}
-- 
2.14.3




* [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
@ 2018-01-10 20:42 ` Bob Peterson
  2018-01-12 10:45   ` Steven Whitehouse
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire Bob Peterson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function gfs2_rgrp_congested is called in two places: once before
we acquire the rgrp glock, and once after. The first time, it checks
to see if the rgrp was used recently, because the usage statistics
are only valid in the short term. For the second call, that check was
not done, but should have been, for the same reason.

This patch simply moves function gfs2_rgrp_used_recently to before
function gfs2_rgrp_congested and calls it from there, so the check
applies equally to both callers.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index ea95d7c70e23..960fbc2d39d7 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1819,6 +1819,21 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 	return;
 }
 
+/**
+ * gfs2_rgrp_used_recently - check if a rgrp has been used recently
+ * @rgd: The rgrp to test
+ * @msecs: The time limit in milliseconds
+ *
+ * Returns: True if the rgrp glock has been used within the time limit
+ */
+static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
+					   u64 msecs)
+{
+	return (ktime_before(ktime_get_real(),
+			     ktime_add(rgd->rd_gl->gl_dstamp,
+				       ms_to_ktime(msecs))));
+}
+
 /**
  * gfs2_rgrp_congested - Use stats to figure out whether an rgrp is congested
  * @rgd: The rgrp in question
@@ -1858,6 +1873,11 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	u64 var;
 	int cpu, nonzero = 0;
 
+	/* If it hasn't been used recently we can't judge the statistics, so
+	   assume it's not congested. */
+	if (!gfs2_rgrp_used_recently(rgd, HZ))
+		return false;
+
 	preempt_disable();
 	for_each_present_cpu(cpu) {
 		st = &per_cpu_ptr(sdp->sd_lkstats, cpu)->lkstats[LM_TYPE_RGRP];
@@ -1892,21 +1912,6 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	return ((srttb_diff < 0) && (sqr_diff > var));
 }
 
-/**
- * gfs2_rgrp_used_recently - check if a rgrp has been used recently
- * @rgd: The rgrp to test
- * @msecs: The time limit in milliseconds
- *
- * Returns: True if the rgrp glock has been used within the time limit
- */
-static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
-					   u64 msecs)
-{
-	return (ktime_before(ktime_get_real(),
-			     ktime_add(rgd->rd_gl->gl_dstamp,
-				       ms_to_ktime(msecs))));
-}
-
 static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
 {
 	const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
@@ -2004,7 +2009,6 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 				    !fast_to_acquire(rs->rs_rbm.rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
-				    gfs2_rgrp_used_recently(rs->rs_rbm.rgd, HZ) &&
 				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
 					goto next_rgrp;
 			}
-- 
2.14.3




* [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested Bob Peterson
@ 2018-01-10 20:42 ` Bob Peterson
  2018-01-12 10:51   ` Steven Whitehouse
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested Bob Peterson
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function fast_to_acquire determines if a rgrp ought to be fast to
acquire, but the logic was cryptic. This patch makes the logic more
straightforward:

1. If the rgrp is one of our "preferred" rgrps, consider it fast
   to acquire. This is how each node tries to confine its allocations
   to a set of rgrps unique from the other nodes, to reduce contention.
2. If the rgrp glock is unlocked, consider it slow, because it will
   take some time to lock it, either for the DLM or for the glock
   state machine.
3. If there are glock holders, consider it slow because we'd have to
   wait for another process to dequeue before we get our turn.
4. If the glock is being demoted, consider it slow.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 960fbc2d39d7..fa72e3e6ce4e 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1938,20 +1938,32 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
 /**
  * fast_to_acquire - determine if a resource group will be fast to acquire
  *
- * If this is one of our preferred rgrps, it should be quicker to acquire,
- * because we tried to set ourselves up as dlm lock master.
+ * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
+ *    because we tried to set ourselves up as dlm lock master.
+ * 2. If the glock is unlocked, consider it slow because it will take time for
+ *    the dlm and the glock state machine to transition it to a locked state.
+ * 3. If there are glock holder records queued to the glock, consider it slow
+ *    because this process will need to be queued up behind another process:
+ *    Our request can't be enqueued until the other is dequeued.
+ * 4. If the rgrp glock is being demoted, consider it slow because the glock
+ *    will need to be demoted, possibly used on another node, then promoted,
+ *    all of which will take time.
+ *
  */
-static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
+static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
 {
 	struct gfs2_glock *gl = rgd->rd_gl;
 
-	if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
-	    !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
-	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
-		return 1;
 	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
-		return 1;
-	return 0;
+		return true;
+	if (gl->gl_state == LM_ST_UNLOCKED)
+		return false;
+	if (!list_empty(&gl->gl_holders))
+		return false;
+	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
+	    test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return false;
+	return true;
 }
 
 /**
-- 
2.14.3




* [Cluster-devel] [GFS2 PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (2 preceding siblings ...)
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire Bob Peterson
@ 2018-01-10 20:42 ` Bob Peterson
  2018-01-12 10:55   ` Steven Whitehouse
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases Bob Peterson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch simply moves function fast_to_acquire to before function
gfs2_rgrp_congested and checks it inside that function rather than
before calling it. As with function gfs2_rgrp_used_recently,
fast_to_acquire was previously checked only before the first call to
gfs2_rgrp_congested, not the second, although it applies equally to both.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 68 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index fa72e3e6ce4e..9d5f35b01c1d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1819,6 +1819,37 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 	return;
 }
 
+/**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
+ *    because we tried to set ourselves up as dlm lock master.
+ * 2. If the glock is unlocked, consider it slow because it will take time for
+ *    the dlm and the glock state machine to transition it to a locked state.
+ * 3. If there are glock holder records queued to the glock, consider it slow
+ *    because this process will need to be queued up behind another process:
+ *    Our request can't be enqueued until the other is dequeued.
+ * 4. If the rgrp glock is being demoted, consider it slow because the glock
+ *    will need to be demoted, possibly used on another node, then promoted,
+ *    all of which will take time.
+ *
+ */
+static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+		return true;
+	if (gl->gl_state == LM_ST_UNLOCKED)
+		return false;
+	if (!list_empty(&gl->gl_holders))
+		return false;
+	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
+	    test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return false;
+	return true;
+}
+
 /**
  * gfs2_rgrp_used_recently - check if a rgrp has been used recently
  * @rgd: The rgrp to test
@@ -1873,6 +1904,9 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	u64 var;
 	int cpu, nonzero = 0;
 
+	if (loops == 0 && !fast_to_acquire(rgd))
+		return true;
+
 	/* If it hasn't been used recently we can't judge the statistics, so
 	   assume it's not congested. */
 	if (!gfs2_rgrp_used_recently(rgd, HZ))
@@ -1935,37 +1969,6 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
 	return false;
 }
 
-/**
- * fast_to_acquire - determine if a resource group will be fast to acquire
- *
- * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
- *    because we tried to set ourselves up as dlm lock master.
- * 2. If the glock is unlocked, consider it slow because it will take time for
- *    the dlm and the glock state machine to transition it to a locked state.
- * 3. If there are glock holder records queued to the glock, consider it slow
- *    because this process will need to be queued up behind another process:
- *    Our request can't be enqueued until the other is dequeued.
- * 4. If the rgrp glock is being demoted, consider it slow because the glock
- *    will need to be demoted, possibly used on another node, then promoted,
- *    all of which will take time.
- *
- */
-static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
-{
-	struct gfs2_glock *gl = rgd->rd_gl;
-
-	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
-		return true;
-	if (gl->gl_state == LM_ST_UNLOCKED)
-		return false;
-	if (!list_empty(&gl->gl_holders))
-		return false;
-	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
-	    test_bit(GLF_DEMOTE, &gl->gl_flags))
-		return false;
-	return true;
-}
-
 /**
  * gfs2_inplace_reserve - Reserve space in the filesystem
  * @ip: the inode to reserve space for
@@ -2017,9 +2020,6 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
-				if (loops == 0 &&
-				    !fast_to_acquire(rs->rs_rbm.rgd))
-					goto next_rgrp;
 				if ((loops < 2) &&
 				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
 					goto next_rgrp;
-- 
2.14.3




* [Cluster-devel] [GFS2 PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (3 preceding siblings ...)
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested Bob Peterson
@ 2018-01-10 20:42 ` Bob Peterson
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 6/6] GFS2: Add checks for intra-node congestion Bob Peterson
  2018-01-11 15:52 ` [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Abhijith Das
  6 siblings, 0 replies; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_inplace_reserve did a bunch of
pre-checks based on loop counters and whether or not we have a
pre-existing reservation we must use. Then it called function
gfs2_rgrp_congested. But it only makes sense to check the rgrp
glock statistics if we're using a real locking protocol like dlm.
For lock_nolock, this is just a waste of time and can give false
positives.

This patch splits function gfs2_rgrp_congested into its two cases:
inter-node and intra-node congestion. For intra-node congestion,
the logic is just a simplification of what gfs2_inplace_reserve had
already done. The checks for inter-node congestion are moved to
a separate function, gfs2_rgrp_congested_dlm.

So this basically stubs in a framework for doing more in-depth
checks for intra-node congestion, in a later patch.

Function fast_to_acquire is also moved to be closer to function
gfs2_rgrp_congested.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 128 ++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 85 insertions(+), 43 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 9d5f35b01c1d..641bb4a8cf5b 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1819,37 +1819,6 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 	return;
 }
 
-/**
- * fast_to_acquire - determine if a resource group will be fast to acquire
- *
- * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
- *    because we tried to set ourselves up as dlm lock master.
- * 2. If the glock is unlocked, consider it slow because it will take time for
- *    the dlm and the glock state machine to transition it to a locked state.
- * 3. If there are glock holder records queued to the glock, consider it slow
- *    because this process will need to be queued up behind another process:
- *    Our request can't be enqueued until the other is dequeued.
- * 4. If the rgrp glock is being demoted, consider it slow because the glock
- *    will need to be demoted, possibly used on another node, then promoted,
- *    all of which will take time.
- *
- */
-static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
-{
-	struct gfs2_glock *gl = rgd->rd_gl;
-
-	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
-		return true;
-	if (gl->gl_state == LM_ST_UNLOCKED)
-		return false;
-	if (!list_empty(&gl->gl_holders))
-		return false;
-	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
-	    test_bit(GLF_DEMOTE, &gl->gl_flags))
-		return false;
-	return true;
-}
-
 /**
  * gfs2_rgrp_used_recently - check if a rgrp has been used recently
  * @rgd: The rgrp to test
@@ -1866,7 +1835,7 @@ static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
 }
 
 /**
- * gfs2_rgrp_congested - Use stats to figure out whether an rgrp is congested
+ * gfs2_rgrp_congested_dlm - Use stats to figure out if an rgrp is congested
  * @rgd: The rgrp in question
  * @loops: An indication of how picky we can be (0=very, 1=less so)
  *
@@ -1892,7 +1861,7 @@ static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
  * Returns: A boolean verdict on the congestion status
  */
 
-static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
+static bool gfs2_rgrp_congested_dlm(const struct gfs2_rgrpd *rgd, int loops)
 {
 	const struct gfs2_glock *gl = rgd->rd_gl;
 	const struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
@@ -1904,9 +1873,6 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	u64 var;
 	int cpu, nonzero = 0;
 
-	if (loops == 0 && !fast_to_acquire(rgd))
-		return true;
-
 	/* If it hasn't been used recently we can't judge the statistics, so
 	   assume it's not congested. */
 	if (!gfs2_rgrp_used_recently(rgd, HZ))
@@ -1946,6 +1912,86 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	return ((srttb_diff < 0) && (sqr_diff > var));
 }
 
+/**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
+ *    because we tried to set ourselves up as dlm lock master.
+ * 2. If the glock is unlocked, consider it slow because it will take time for
+ *    the dlm and the glock state machine to transition it to a locked state.
+ * 3. If there are glock holder records queued to the glock, consider it slow
+ *    because this process will need to be queued up behind another process:
+ *    Our request can't be enqueued until the other is dequeued.
+ * 4. If the rgrp glock is being demoted, consider it slow because the glock
+ *    will need to be demoted, possibly used on another node, then promoted,
+ *    all of which will take time.
+ *
+ */
+static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+		return true;
+	if (gl->gl_state == LM_ST_UNLOCKED)
+		return false;
+	if (!list_empty(&gl->gl_holders))
+		return false;
+	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
+	    test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return false;
+	return true;
+}
+
+/**
+ * gfs2_rgrp_congested - decide whether a rgrp glock is congested
+ * @rs: The reservation in question
+ * @loops: An indication of how picky we can be (0=very, 1=less so)
+ *
+ * There are two kinds of congestion: inter-node and intra-node.
+ *
+ * Inter-node congestion is where multiple nodes all want to allocate blocks
+ * inside the same rgrp, which means they need to trade the rgrp glock back
+ * and forth, which is a slow process. To mitigate this, we use glock
+ * statistics to predict whether the glock is historically fast to acquire.
+ *
+ * Intra-node congestion is where you have multiple processes on the same
+ * node, all trying to allocate blocks in the same rgrp. There's nothing wrong
+ * with doing so, but each process needs to wait for the other to release the
+ * rgrp glock before it may proceed.
+ *
+ * If we're not using inter-node locking (dlm) it doesn't make sense to check
+ * the glock statistics. Instead, we do some simple checks based on how
+ * desperate we are to get blocks (the number of loops).
+ *
+ * We know the number of loops we've been around, so we know how desperate we
+ * are to find something. On first loop, call it congested if anyone else has
+ * a block reservation. On second loop, call it congested if it's not fast to
+ * acquire.
+ */
+static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
+{
+	const struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
+
+	/* We already have a reservation, we need to use it regardless */
+	if (gfs2_rs_active(rs))
+		return false;
+
+	/* If we've rejected all the rgrps a few times, we can no longer worry
+	   about whether the rgrp is congested. Fill in blocks where we can. */
+	if (loops >= 2)
+		return false;
+
+	if (loops == 0 && !fast_to_acquire(rgd))
+		return true;
+
+	/* Check for inter-node congestion */
+	if (rgd->rd_sbd->sd_lockstruct.ls_ops->lm_lock) /* lock_dlm */
+		return gfs2_rgrp_congested_dlm(rgd, loops);
+
+	return false;
+}
+
 static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
 {
 	const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
@@ -2019,18 +2065,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
-			if (!gfs2_rs_active(rs)) {
-				if ((loops < 2) &&
-				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
-					goto next_rgrp;
-			}
+			if (gfs2_rgrp_congested(rs, loops))
+				goto next_rgrp;
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &rs->rs_rgd_gh);
 			if (unlikely(error))
 				return error;
-			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+			if (gfs2_rgrp_congested(rs, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rbm.rgd);
-- 
2.14.3




* [Cluster-devel] [GFS2 PATCH 6/6] GFS2: Add checks for intra-node congestion
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (4 preceding siblings ...)
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases Bob Peterson
@ 2018-01-10 20:42 ` Bob Peterson
  2018-01-12 11:11   ` Steven Whitehouse
  2018-01-11 15:52 ` [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Abhijith Das
  6 siblings, 1 reply; 15+ messages in thread
From: Bob Peterson @ 2018-01-10 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds logic to function gfs2_rgrp_congested to check for
intra-node congestion by calling new function other_rgrp_users,
which checks whether there are other glock holders (done only before
we lock it ourselves) and whether there are other block reservations.
If there is a block reservation, we know it is for a different
inode: if it were ours, we would have been forced to use it and
we would not be searching for a new one.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 641bb4a8cf5b..6db1bf4816ff 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1943,10 +1943,37 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
 	return true;
 }
 
+/**
+ * other_rgrp_users - figure out if this rgrp has other users
+ * @rgd: The resource group
+ * @locked: true if we already hold the glock
+ *
+ * We're trying to figure out if the given rgrp has anybody competing for
+ * its free space. If other processes have enqueued its glock, there's a
+ * good chance of competition.
+ *
+ * If there are multi-block reservations for this rgrp, there's a good
+ * chance another process will lock the rgrp for block allocations soon.
+ *
+ * If we already hold the glock, we no longer care if there are holders
+ * because that's now a given (rgrp glocks are never shared).
+ */
+static inline bool other_rgrp_users(const struct gfs2_rgrpd *rgd, bool locked)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (!locked && !list_empty(&gl->gl_holders))
+		return true;
+	if (!RB_EMPTY_ROOT(&rgd->rd_rstree))
+		return true;
+	return false;
+}
+
 /**
  * gfs2_rgrp_congested - decide whether a rgrp glock is congested
  * @rs: The reservation in question
  * @loops: An indication of how picky we can be (0=very, 1=less so)
+ * @locked: Indicates if checks are before or after we've enqueued the glock.
  *
  * There are two kinds of congestion: inter-node and intra-node.
  *
@@ -1960,6 +1987,11 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
  * with doing so, but each process needs to wait for the other to release the
  * rgrp glock before it may proceed.
  *
+ * Both kinds of congestion can hurt performance, but it's faster to check
+ * intra-node, so we do that first. After all, why bother to check if we can
+ * get the glock quickly from DLM if other processes have also used that
+ * same reasoning.
+ *
  * If we're not using inter-node locking (dlm) it doesn't make sense to check
  * the glock statistics. Instead, we do some simple checks based on how
  * desperate we are to get blocks (the number of loops).
@@ -1969,7 +2001,8 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
  * a block reservation. On second loop, call it congested if it's not fast to
  * acquire.
  */
-static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
+static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops,
+				bool locked)
 {
 	const struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
 
@@ -1982,6 +2015,10 @@ static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
 	if (loops >= 2)
 		return false;
 
+	/* Check for intra-node congestion */
+	if (loops == 0 && other_rgrp_users(rgd, locked))
+		return true;
+
 	if (loops == 0 && !fast_to_acquire(rgd))
 		return true;
 
@@ -2065,14 +2102,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
-			if (gfs2_rgrp_congested(rs, loops))
+			if (gfs2_rgrp_congested(rs, loops, false))
 				goto next_rgrp;
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &rs->rs_rgd_gh);
 			if (unlikely(error))
 				return error;
-			if (gfs2_rgrp_congested(rs, loops))
+			if (gfs2_rgrp_congested(rs, loops, true))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rbm.rgd);
-- 
2.14.3




* [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node
  2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (5 preceding siblings ...)
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 6/6] GFS2: Add checks for intra-node congestion Bob Peterson
@ 2018-01-11 15:52 ` Abhijith Das
  6 siblings, 0 replies; 15+ messages in thread
From: Abhijith Das @ 2018-01-11 15:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Looks good to me. ACK.

Cheers!
--Abhi

----- Original Message -----
> From: "Bob Peterson" <rpeterso@redhat.com>
> To: "cluster-devel" <cluster-devel@redhat.com>
> Sent: Wednesday, January 10, 2018 2:42:21 PM
> Subject: [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock	congestion functions for intra-node
> 
> On 06 November 2017, I posted this six-part patch set for rgrp glock
> congestion
> and got no response. I'm resending it here to give people one last
> opportunity to review it before I push it to the for-next branch. It has
> been rebased, but otherwise unchanged.
> ---
> Original patch set description:
> 
> On 02 November, I posted a patch for this, but Steve Whitehouse pointed out
> a check that didn't look right. I tested it and found it to be unnecessary,
> so I removed it.
> 
> Steve also pointed out that the patch tried to do too much. He asked which
> part of the patch actually made the performance gains.
> 
> Based on Steve's suggestions, I've broken the changes into six patches for
> this
> patchset. The first five are all refactoring to make the code simpler and
> easier to follow. Many of the complex special-case checks were consolidated
> from a variety of locations into more centralized functions. The actual
> performance gains are accomplished by the very last patch, "GFS2: Add checks
> for intra-node congestion". So strictly speaking, the last patch is the only
> patch we really need. But without the others, today's kludgy code becomes
> even more kludgy and harder to read.
> 
> In doing all this, I did find a minor problem with the previous version's
> logic, and fixed it.
> 
> I retested iozone performance and fragmentation. It is pretty much the same
> as I documented for the previous patch:
> 
> Bob Peterson (6):
>   GFS2: Simplify gfs2_rgrp_used_recently
>   GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
>   GFS2: Refactor function fast_to_acquire
>   GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested
>   GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases
>   GFS2: Add checks for intra-node congestion
> 
>  fs/gfs2/rgrp.c | 176
>  +++++++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 134 insertions(+), 42 deletions(-)
> 
> --
> 2.14.3
> 
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested Bob Peterson
@ 2018-01-12 10:45   ` Steven Whitehouse
  2018-01-15 15:48     ` Bob Peterson
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Whitehouse @ 2018-01-12 10:45 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 10/01/18 20:42, Bob Peterson wrote:
> Function gfs2_rgrp_congested is called in two places: once before
> we acquire the rgrp glock, and once after. The first time, it checks
> to see if the rgrp was used recently because the usage statistics
> are only valid in the short term. For the second call, that check was
> not done, but should have been, for the same reason.
>
> This patch simply moves function gfs2_rgrp_used_recently before
> function gfs2_rgrp_congested, and it's now called from there so
> it applies equally to both checks.
I don't think this makes sense. The purpose of the 
gfs2_rgrp_used_recently test is that when we don't hold the glock we 
might not have up to date stats to make decisions from, so we check that 
we've held the lock recently enough that we can use historical 
statistics. Once we hold the lock however, then we can be sure that the 
stats are up to date, as the process of acquiring the lock will bring 
them up to date, so why repeat the test when we already have the lock?

Steve.

> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>   fs/gfs2/rgrp.c | 36 ++++++++++++++++++++----------------
>   1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index ea95d7c70e23..960fbc2d39d7 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1819,6 +1819,21 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
>   	return;
>   }
>   
> +/**
> + * gfs2_rgrp_used_recently - check if a rgrp has been used recently
> + * @rgd: The rgrp to test
> + * @msecs: The time limit in milliseconds
> + *
> + * Returns: True if the rgrp glock has been used within the time limit
> + */
> +static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
> +					   u64 msecs)
> +{
> +	return (ktime_before(ktime_get_real(),
> +			     ktime_add(rgd->rd_gl->gl_dstamp,
> +				       ms_to_ktime(msecs))));
> +}
> +
>   /**
>    * gfs2_rgrp_congested - Use stats to figure out whether an rgrp is congested
>    * @rgd: The rgrp in question
> @@ -1858,6 +1873,11 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
>   	u64 var;
>   	int cpu, nonzero = 0;
>   
> +	/* If it hasn't been used recently we can't judge the statistics, so
> +	   assume it's not congested. */
> +	if (!gfs2_rgrp_used_recently(rgd, HZ))
> +		return false;
> +
>   	preempt_disable();
>   	for_each_present_cpu(cpu) {
>   		st = &per_cpu_ptr(sdp->sd_lkstats, cpu)->lkstats[LM_TYPE_RGRP];
> @@ -1892,21 +1912,6 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
>   	return ((srttb_diff < 0) && (sqr_diff > var));
>   }
>   
> -/**
> - * gfs2_rgrp_used_recently - check if a rgrp has been used recently
> - * @rgd: The rgrp to test
> - * @msecs: The time limit in milliseconds
> - *
> - * Returns: True if the rgrp glock has been used within the time limit
> - */
> -static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
> -					   u64 msecs)
> -{
> -	return (ktime_before(ktime_get_real(),
> -			     ktime_add(rgd->rd_gl->gl_dstamp,
> -				       ms_to_ktime(msecs))));
> -}
> -
>   static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
>   {
>   	const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> @@ -2004,7 +2009,6 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   				    !fast_to_acquire(rs->rs_rbm.rgd))
>   					goto next_rgrp;
>   				if ((loops < 2) &&
> -				    gfs2_rgrp_used_recently(rs->rs_rbm.rgd, HZ) &&
>   				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
>   					goto next_rgrp;
>   			}



^ permalink raw reply	[flat|nested] 15+ messages in thread
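The freshness test at the heart of this patch is easy to model outside the kernel. The sketch below is an illustrative user-space reduction of gfs2_rgrp_used_recently(): plain millisecond counters stand in for the ktime values, and the struct name and fields are stand-ins, not the real gfs2_rgrpd or glock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a rgrp, reduced to the one field the
 * used-recently test looks at: the last time its glock changed state. */
struct model_rgrp {
	uint64_t glock_dstamp_ms;	/* stands in for gl_dstamp */
};

/* True if the rgrp glock was used within the last @limit_ms milliseconds,
 * i.e. its per-glock statistics are still fresh enough to trust.
 * Mirrors: ktime_before(now, gl_dstamp + msecs). */
static bool rgrp_used_recently(const struct model_rgrp *rgd,
			       uint64_t now_ms, uint64_t limit_ms)
{
	return now_ms < rgd->glock_dstamp_ms + limit_ms;
}
```

If the glock has sat idle past the limit, the congestion statistics are stale and the caller treats the rgrp as not congested rather than judging it on old data.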

* [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire Bob Peterson
@ 2018-01-12 10:51   ` Steven Whitehouse
  2018-01-15 16:00     ` Bob Peterson
  2018-01-15 16:24     ` Bob Peterson
  0 siblings, 2 replies; 15+ messages in thread
From: Steven Whitehouse @ 2018-01-12 10:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 10/01/18 20:42, Bob Peterson wrote:
> Function fast_to_acquire determines if a rgrp ought to be fast to
> acquire, but the logic was cryptic. This patch makes the logic more
> straightforward:
>
> 1. If the rgrp is one of our "preferred" rgrps, consider it fast
>     to acquire. This is how each node tries to confine its allocations
>     to a set of rgrps unique from the other nodes, to reduce contention.
> 2. If the rgrp glock is unlocked, consider it slow, because it will
>     take some time to lock it, either for the DLM or for the glock
>     state machine.
> 3. If there are glock holders, consider it slow because we'd have to
>     wait for another process to dequeue before we get our turn.
> 4. If the glock is being demoted, consider it slow.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
This appears to mean that for preferred rgrps we will no longer take 
into account any of the other conditions. I'm not sure that is right, 
since we want to avoid rgrps that are in use by other nodes, or are 
being demoted, even if they are preferred. The location of the DLM lock 
master is not really that critical, the slow thing is the I/O that is 
done under the glocks,

Steve.

> ---
>   fs/gfs2/rgrp.c | 30 +++++++++++++++++++++---------
>   1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 960fbc2d39d7..fa72e3e6ce4e 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1938,20 +1938,32 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
>   /**
>    * fast_to_acquire - determine if a resource group will be fast to acquire
>    *
> - * If this is one of our preferred rgrps, it should be quicker to acquire,
> - * because we tried to set ourselves up as dlm lock master.
> + * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
> + *    because we tried to set ourselves up as dlm lock master.
> + * 2. If the glock is unlocked, consider it slow because it will take time for
> + *    the dlm and the glock state machine to transition it to a locked state.
> + * 3. If there are glock holder records queued to the glock, consider it slow
> + *    because this process will need to be queued up behind another process:
> + *    Our request can't be enqueued until the other is dequeued.
> + * 4. If the rgrp glock is being demoted, consider it slow because the glock
> + *    will need to be demoted, possibly used on another node, then promoted,
> + *    all of which will take time.
> + *
>    */
> -static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
> +static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
>   {
>   	struct gfs2_glock *gl = rgd->rd_gl;
>   
> -	if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
> -	    !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
> -	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
> -		return 1;
>   	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
> -		return 1;
> -	return 0;
> +		return true;
> +	if (gl->gl_state == LM_ST_UNLOCKED)
> +		return false;
> +	if (!list_empty(&gl->gl_holders))
> +		return false;
> +	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
> +	    test_bit(GLF_DEMOTE, &gl->gl_flags))
> +		return false;
> +	return true;
>   }
>   
>   /**



^ permalink raw reply	[flat|nested] 15+ messages in thread
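The decision ladder this patch documents can be reduced to a small user-space model, which also makes Steve's objection concrete: a "preferred" rgrp short-circuits every later check, including the demotion test. The struct fields below are stand-ins for the real glock state, not the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags mirroring what fast_to_acquire() inspects. */
struct model_glock {
	bool unlocked;		/* gl_state == LM_ST_UNLOCKED */
	bool has_holders;	/* !list_empty(&gl->gl_holders) */
	bool being_demoted;	/* GLF_DEMOTE or GLF_DEMOTE_IN_PROGRESS */
};

struct model_rgrp {
	bool preferred;		/* GFS2_RDF_PREFERRED */
	struct model_glock gl;
};

static bool fast_to_acquire(const struct model_rgrp *rgd)
{
	if (rgd->preferred)		/* 1. preferred rgrps win outright */
		return true;
	if (rgd->gl.unlocked)		/* 2. unlocked: dlm/glock transition */
		return false;
	if (rgd->gl.has_holders)	/* 3. must queue behind other holders */
		return false;
	if (rgd->gl.being_demoted)	/* 4. demotion in progress */
		return false;
	return true;
}
```

Note that a preferred rgrp is reported fast even while being demoted, which is the behaviour the follow-up replies debate.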

* [Cluster-devel] [GFS2 PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested Bob Peterson
@ 2018-01-12 10:55   ` Steven Whitehouse
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Whitehouse @ 2018-01-12 10:55 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 10/01/18 20:42, Bob Peterson wrote:
> This patch simply moves function fast_to_acquire to before function
> gfs2_rgrp_congested, and checks it inside the function rather than
> before calling it. As with function gfs2_rgrp_used_recently, function
> fast_to_acquire was only called the first time gfs2_rgrp_congested
> was used, but not the second, and it applies equally to both calls.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
This definitely doesn't make any sense to move. The point here is that 
we want to know if the rgrp glock is likely to be fast to acquire. Once 
we have the glock we don't care any more.... so why check it again after 
the fact? It is just too late by then,

Steve.

> ---
>   fs/gfs2/rgrp.c | 68 +++++++++++++++++++++++++++++-----------------------------
>   1 file changed, 34 insertions(+), 34 deletions(-)
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index fa72e3e6ce4e..9d5f35b01c1d 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1819,6 +1819,37 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
>   	return;
>   }
>   
> +/**
> + * fast_to_acquire - determine if a resource group will be fast to acquire
> + *
> + * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
> + *    because we tried to set ourselves up as dlm lock master.
> + * 2. If the glock is unlocked, consider it slow because it will take time for
> + *    the dlm and the glock state machine to transition it to a locked state.
> + * 3. If there are glock holder records queued to the glock, consider it slow
> + *    because this process will need to be queued up behind another process:
> + *    Our request can't be enqueued until the other is dequeued.
> + * 4. If the rgrp glock is being demoted, consider it slow because the glock
> + *    will need to be demoted, possibly used on another node, then promoted,
> + *    all of which will take time.
> + *
> + */
> +static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
> +{
> +	struct gfs2_glock *gl = rgd->rd_gl;
> +
> +	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
> +		return true;
> +	if (gl->gl_state == LM_ST_UNLOCKED)
> +		return false;
> +	if (!list_empty(&gl->gl_holders))
> +		return false;
> +	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
> +	    test_bit(GLF_DEMOTE, &gl->gl_flags))
> +		return false;
> +	return true;
> +}
> +
>   /**
>    * gfs2_rgrp_used_recently - check if a rgrp has been used recently
>    * @rgd: The rgrp to test
> @@ -1873,6 +1904,9 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
>   	u64 var;
>   	int cpu, nonzero = 0;
>   
> +	if (loops == 0 && !fast_to_acquire(rgd))
> +		return true;
> +
>   	/* If it hasn't been used recently we can't judge the statistics, so
>   	   assume it's not congested. */
>   	if (!gfs2_rgrp_used_recently(rgd, HZ))
> @@ -1935,37 +1969,6 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
>   	return false;
>   }
>   
> -/**
> - * fast_to_acquire - determine if a resource group will be fast to acquire
> - *
> - * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
> - *    because we tried to set ourselves up as dlm lock master.
> - * 2. If the glock is unlocked, consider it slow because it will take time for
> - *    the dlm and the glock state machine to transition it to a locked state.
> - * 3. If there are glock holder records queued to the glock, consider it slow
> - *    because this process will need to be queued up behind another process:
> - *    Our request can't be enqueued until the other is dequeued.
> - * 4. If the rgrp glock is being demoted, consider it slow because the glock
> - *    will need to be demoted, possibly used on another node, then promoted,
> - *    all of which will take time.
> - *
> - */
> -static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
> -{
> -	struct gfs2_glock *gl = rgd->rd_gl;
> -
> -	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
> -		return true;
> -	if (gl->gl_state == LM_ST_UNLOCKED)
> -		return false;
> -	if (!list_empty(&gl->gl_holders))
> -		return false;
> -	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
> -	    test_bit(GLF_DEMOTE, &gl->gl_flags))
> -		return false;
> -	return true;
> -}
> -
>   /**
>    * gfs2_inplace_reserve - Reserve space in the filesystem
>    * @ip: the inode to reserve space for
> @@ -2017,9 +2020,6 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			if (skip && skip--)
>   				goto next_rgrp;
>   			if (!gfs2_rs_active(rs)) {
> -				if (loops == 0 &&
> -				    !fast_to_acquire(rs->rs_rbm.rgd))
> -					goto next_rgrp;
>   				if ((loops < 2) &&
>   				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
>   					goto next_rgrp;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Cluster-devel] [GFS2 PATCH 6/6] GFS2: Add checks for intra-node congestion
  2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 6/6] GFS2: Add checks for intra-node congestion Bob Peterson
@ 2018-01-12 11:11   ` Steven Whitehouse
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Whitehouse @ 2018-01-12 11:11 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 10/01/18 20:42, Bob Peterson wrote:
> This patch adds logic to function gfs2_rgrp_congested to check for
> intra-node congestion by calling new function other_rgrp_users
> which checks whether there are other glock holders (done only before
> we lock it ourselves) and whether there are other block reservations.
> If there is a block reservation, we know it is for a different
> inode: if it was for ours, we would have been forced to use it and
> we would not be searching for a new one.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
This is the wrong solution I think. We should not be choosing other 
rgrps for allocation due to local contention. That is just likely to 
land up creating more fragmentation by spreading things out across the 
disk. The issue is that we can't currently separate the locking into dlm 
(exclusive) and local (shared) but important data structures spinlock 
protected, so that we land up with lock contention on the rgrp and 
everything serialized via the glock. At the moment the glock state 
machine only supports everything (dlm & local) shared or everything 
exclusive.

If we want to resolve this issue, then we need to deal with that issue 
first and allow multiple local users of the same rgrp to co-exist with 
minimal lock contention. That requires the glock state machine to 
support that particular lock mode first though,

Steve.

> ---
>   fs/gfs2/rgrp.c | 43 ++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 641bb4a8cf5b..6db1bf4816ff 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1943,10 +1943,37 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
>   	return true;
>   }
>   
> +/**
> + * other_rgrp_users - figure out if this rgrp has other users
> + * @rgd: The resource group
> + * @locked: true if we've already held the glock
> + *
> + * We're trying to figure out if the given rgrp has anybody competing for
> + * its free space. If other processes have enqueued its glock, there's a
> + * good chance of competition.
> + *
> + * If there are multi-block reservations for this rgrp, there's a good
> + * chance another process will lock the rgrp for block allocations soon.
> + *
> + * If we've already held the glock, we no longer care if there are holders
> + * because that's now a given (rgrp glocks are never shared).
> + */
> +static inline bool other_rgrp_users(const struct gfs2_rgrpd *rgd, bool locked)
> +{
> +	struct gfs2_glock *gl = rgd->rd_gl;
> +
> +	if (!locked && !list_empty(&gl->gl_holders))
> +		return true;
> +	if (!RB_EMPTY_ROOT(&rgd->rd_rstree))
> +		return true;
> +	return false;
> +}
> +
>   /**
>    * gfs2_rgrp_congested - decide whether a rgrp glock is congested
>    * @rs: The reservation in question
>    * @loops: An indication of how picky we can be (0=very, 1=less so)
> + * @locked: Indicates if checks are before or after we've enqueued the glock.
>    *
>    * There are two kinds of congestion: inter-node and intra-node.
>    *
> @@ -1960,6 +1987,11 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
>    * with doing so, but each process needs to wait for the other to release the
>    * rgrp glock before it may proceed.
>    *
> + * Both kinds of congestion can hurt performance, but it's faster to check
> + * intra-node, so we do that first. After all, why bother to check if we can
> + * get the glock quickly from DLM if other processes have also used that
> + * same reasoning.
> + *
>    * If we're not using inter-node locking (dlm) it doesn't make sense to check
>    * the glock statistics. Instead, we do some simple checks based on how
>    * desperate we are to get blocks (the number of loops).
> @@ -1969,7 +2001,8 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
>    * a block reservation. On second loop, call it congested if it's not fast to
>    * acquire.
>    */
> -static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
> +static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops,
> +				bool locked)
>   {
>   	const struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
>   
> @@ -1982,6 +2015,10 @@ static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
>   	if (loops >= 2)
>   		return false;
>   
> +	/* Check for intra-node congestion */
> +	if (loops == 0 && other_rgrp_users(rgd, locked))
> +		return true;
> +
>   	if (loops == 0 && !fast_to_acquire(rgd))
>   		return true;
>   
> @@ -2065,14 +2102,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			rg_locked = 0;
>   			if (skip && skip--)
>   				goto next_rgrp;
> -			if (gfs2_rgrp_congested(rs, loops))
> +			if (gfs2_rgrp_congested(rs, loops, false))
>   				goto next_rgrp;
>   			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
>   						   LM_ST_EXCLUSIVE, flags,
>   						   &rs->rs_rgd_gh);
>   			if (unlikely(error))
>   				return error;
> -			if (gfs2_rgrp_congested(rs, loops))
> +			if (gfs2_rgrp_congested(rs, loops, true))
>   				goto skip_rgrp;
>   			if (sdp->sd_args.ar_rgrplvb) {
>   				error = update_rgrp_lvb(rs->rs_rbm.rgd);



^ permalink raw reply	[flat|nested] 15+ messages in thread
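The intra-node check that patch 6 adds can likewise be modelled in user space. The sketch below mirrors other_rgrp_users(): glock holders only matter before we hold the glock ourselves, while outstanding multi-block reservations always count. The boolean fields are stand-ins for the real holder-list and reservation-rbtree tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two signals of intra-node competition. */
struct model_rgrp {
	bool glock_has_holders;	/* !list_empty(&gl->gl_holders) */
	bool has_reservations;	/* !RB_EMPTY_ROOT(&rgd->rd_rstree) */
};

static bool other_rgrp_users(const struct model_rgrp *rgd, bool locked)
{
	/* Once we hold the glock ourselves, a non-empty holder list is a
	 * given (rgrp glocks are never shared), so skip that check. */
	if (!locked && rgd->glock_has_holders)
		return true;
	/* Any remaining reservation must belong to another inode: had it
	 * been ours, we would have been forced to use it already. */
	return rgd->has_reservations;
}
```

This is the check Steve argues against: skipping a rgrp on purely local contention can spread allocations across the disk and worsen fragmentation.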

* [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
  2018-01-12 10:45   ` Steven Whitehouse
@ 2018-01-15 15:48     ` Bob Peterson
  0 siblings, 0 replies; 15+ messages in thread
From: Bob Peterson @ 2018-01-15 15:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

----- Original Message -----
| Hi,
| 
| 
| On 10/01/18 20:42, Bob Peterson wrote:
| > Function gfs2_rgrp_congested is called in two places: once before
| > we acquire the rgrp glock, and once after. The first time, it checks
| > to see if the rgrp was used recently because the usage statistics
| > are only valid in short-term. For the second call, that check was
| > not done, but should have been, for the same reason.
| >
| > This patch simply moves function gfs2_rgrp_used_recently before
| > function gfs2_rgrp_congested, and it's now called from there so
| > it applies equally to both checks.
| I don't think this makes sense. The purpose of the
| gfs2_rgrp_used_recently test is that when we don't hold the glock we
| might not have up to date stats to make decisions from, so we check that
| we've held the lock recently enough that we can use historical
| statistics. Once we hold the lock however, then we can be sure that the
| stats are up to date, as the process of acquiring the lock will bring
| them up to date, so why repeat the test when we already have the lock?
| 
| Steve.

Hm. I'll admit it's been a long time since I revisited this patch set,
but you've got a valid point. This makes no sense. I'll pull this patch.

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire
  2018-01-12 10:51   ` Steven Whitehouse
@ 2018-01-15 16:00     ` Bob Peterson
  2018-01-15 16:24     ` Bob Peterson
  1 sibling, 0 replies; 15+ messages in thread
From: Bob Peterson @ 2018-01-15 16:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

----- Original Message -----
| Hi,
| 
| 
| On 10/01/18 20:42, Bob Peterson wrote:
| > Function fast_to_acquire determines if a rgrp ought to be fast to
| > acquire, but the logic was cryptic. This patch makes the logic more
| > straightforward:
| >
| > 1. If the rgrp is one of our "preferred" rgrps, consider it fast
| >     to acquire. This is how each node tries to confine its allocations
| >     to a set of rgrps unique from the other nodes, to reduce contention.
| > 2. If the rgrp glock is unlocked, consider it slow, because it will
| >     take some time to lock it, either for the DLM or for the glock
| >     state machine.
| > 3. If there are glock holders, consider it slow because we'd have to
| >     wait for another process to dequeue before we get our turn.
| > 4. If the glock is being demoted, consider it slow.
| >
| > Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| This appears to mean that for preferred rgrps we will no longer take
| into account any of the other conditions. I'm not sure that is right,
| since we want to avoid rgrps that are in use by other nodes, or are
| being demoted, even if they are preferred. The location of the DLM lock
| master is not really that critical, the slow thing is the I/O that is
| done under the glocks,
| 
| Steve.

My logic in doing this was that I didn't want any one node to become a
resource hog and render the "preferred rgrp" concept useless because, over
time, it would slowly assume control over more and more rgrp glocks.

I envisioned cases in which a gfs2 fs, after having been mounted for
several weeks, might no longer give out block reservations fairly between
nodes: a persistent writer might migrate most (or all) of the rgrp
glocks to itself to the detriment of the others, and being "preferred"
might become a meaningless afterthought.

Still, for some use cases you might have a "work horse" node that does
the majority of writing, and other nodes that are more consumers,
and in that case, maybe it does make more sense to check "preferred" last.

Thoughts?

Regards,

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire
  2018-01-12 10:51   ` Steven Whitehouse
  2018-01-15 16:00     ` Bob Peterson
@ 2018-01-15 16:24     ` Bob Peterson
  1 sibling, 0 replies; 15+ messages in thread
From: Bob Peterson @ 2018-01-15 16:24 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Looking at this again: Today's fast_to_acquire function works like this:
(1) If the rgrp glock is locked, has no holders and is not being
demoted, it is considered fast to acquire.
(2) If the rgrp glock is "preferred," it is STILL considered fast to
acquire.

So with today's code, even if a "preferred" glock was being demoted,
for example, it would still consider it fast to acquire.

So the replacement patch's logic is not that different from what
we have today; it's just that the "preferred rgrps" case is now
checked first.

So is today's logic wrong then too?

Bob

----- Original Message -----
| Hi,
| 
| 
| On 10/01/18 20:42, Bob Peterson wrote:
| > Function fast_to_acquire determines if a rgrp ought to be fast to
| > acquire, but the logic was cryptic. This patch makes the logic more
| > straightforward:
| >
| > 1. If the rgrp is one of our "preferred" rgrps, consider it fast
| >     to acquire. This is how each node tries to confine its allocations
| >     to a set of rgrps unique from the other nodes, to reduce contention.
| > 2. If the rgrp glock is unlocked, consider it slow, because it will
| >     take some time to lock it, either for the DLM or for the glock
| >     state machine.
| > 3. If there are glock holders, consider it slow because we'd have to
| >     wait for another process to dequeue before we get our turn.
| > 4. If the glock is being demoted, consider it slow.
| >
| > Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| This appears to mean that for preferred rgrps we will no longer take
| into account any of the other conditions. I'm not sure that is right,
| since we want to avoid rgrps that are in use by other nodes, or are
| being demoted, even if they are preferred. The location of the DLM lock
| master is not really that critical, the slow thing is the I/O that is
| done under the glocks,
| 
| Steve.
| 
| > ---
| >   fs/gfs2/rgrp.c | 30 +++++++++++++++++++++---------
| >   1 file changed, 21 insertions(+), 9 deletions(-)
| >
| > diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
| > index 960fbc2d39d7..fa72e3e6ce4e 100644
| > --- a/fs/gfs2/rgrp.c
| > +++ b/fs/gfs2/rgrp.c
| > @@ -1938,20 +1938,32 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd
| > **pos, const struct gfs2_rgrpd *b
| >   /**
| >    * fast_to_acquire - determine if a resource group will be fast to
| >    acquire
| >    *
| > - * If this is one of our preferred rgrps, it should be quicker to acquire,
| > - * because we tried to set ourselves up as dlm lock master.
| > + * 1. If this is one of our preferred rgrps, it should be quicker to
| > acquire,
| > + *    because we tried to set ourselves up as dlm lock master.
| > + * 2. If the glock is unlocked, consider it slow because it will take time
| > for
| > + *    the dlm and the glock state machine to transition it to a locked
| > state.
| > + * 3. If there are glock holder records queued to the glock, consider it
| > slow
| > + *    because this process will need to be queued up behind another
| > process:
| > + *    Our request can't be enqueued until the other is dequeued.
| > + * 4. If the rgrp glock is being demoted, consider it slow because the
| > glock
| > + *    will need to be demoted, possibly used on another node, then
| > promoted,
| > + *    all of which will take time.
| > + *
| >    */
| > -static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
| > +static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
| >   {
| >   	struct gfs2_glock *gl = rgd->rd_gl;
| >   
| > -	if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
| > -	    !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
| > -	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
| > -		return 1;
| >   	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
| > -		return 1;
| > -	return 0;
| > +		return true;
| > +	if (gl->gl_state == LM_ST_UNLOCKED)
| > +		return false;
| > +	if (!list_empty(&gl->gl_holders))
| > +		return false;
| > +	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
| > +	    test_bit(GLF_DEMOTE, &gl->gl_flags))
| > +		return false;
| > +	return true;
| >   }
| >   
| >   /**
| 
| 



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-01-15 16:24 UTC | newest]

Thread overview: 15+ messages
2018-01-10 20:42 [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested Bob Peterson
2018-01-12 10:45   ` Steven Whitehouse
2018-01-15 15:48     ` Bob Peterson
2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 3/6] GFS2: Refactor function fast_to_acquire Bob Peterson
2018-01-12 10:51   ` Steven Whitehouse
2018-01-15 16:00     ` Bob Peterson
2018-01-15 16:24     ` Bob Peterson
2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested Bob Peterson
2018-01-12 10:55   ` Steven Whitehouse
2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases Bob Peterson
2018-01-10 20:42 ` [Cluster-devel] [GFS2 PATCH 6/6] GFS2: Add checks for intra-node congestion Bob Peterson
2018-01-12 11:11   ` Steven Whitehouse
2018-01-11 15:52 ` [Cluster-devel] [GFS2 PATCH 0/6] [resend] GFS2: Rework rgrp glock congestion functions for intra-node Abhijith Das
