* [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node
@ 2017-11-06 14:18 Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 2 November, I posted a patch for this, but Steve Whitehouse pointed out
a check that didn't look right. I tested it and found it to be unnecessary,
so I removed it.

Steve also pointed out that the patch tried to do too much. He asked which
part of the patch actually made the performance gains.

Based on Steve's suggestions, I've broken the changes into six patches
for this patchset. The first five are all refactoring to make the code
simpler and easier to follow. Many of the complex special-case checks
were consolidated from a variety of locations into more centralized
functions. The actual performance gains come from the very last patch,
"GFS2: Add checks for intra-node congestion". So strictly speaking, the
last patch is the only one we really need, but without the others,
today's kludgy code becomes even kludgier and harder to read.

In doing all this, I did find a minor problem with the previous version's
logic, and fixed it.

I retested iozone performance and fragmentation. The results are pretty
much the same as I documented for the previous patch.

Bob Peterson (6):
  GFS2: Simplify gfs2_rgrp_used_recently
  GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
  GFS2: Refactor function fast_to_acquire
  GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested
  GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases
  GFS2: Add checks for intra-node congestion

 fs/gfs2/rgrp.c | 176 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 134 insertions(+), 42 deletions(-)

-- 
2.13.6




* [Cluster-devel] [PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently
  2017-11-06 14:18 [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
@ 2017-11-06 14:18 ` Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested Bob Peterson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch simplifies function gfs2_rgrp_used_recently. It now uses
ktime functions to do its calculations. Also, it is called with the
rgrp rather than the reservation, and with the constant HZ rather than
the hard-coded value 1000.
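
As a rough userspace sketch (simplified types and names, not the kernel's
actual ktime API), the reworked check asks whether "now" still falls before
the glock's dstamp plus the limit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All times in nanoseconds, standing in for ktime_t values. */
#define NSEC_PER_MSEC 1000000LL

/* Illustrative version of the reworked test: true if "now" is still
 * before dstamp + msecs, i.e. the glock was used within the limit. */
static bool used_recently(int64_t now, int64_t dstamp, int64_t msecs)
{
	return now < dstamp + msecs * NSEC_PER_MSEC;
}
```

The deadline form avoids the old subtract-then-compare arithmetic on raw
nanosecond counts.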

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 95b2a57ded33..dabe4785efab 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1879,21 +1879,18 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 }
 
 /**
- * gfs2_rgrp_used_recently
- * @rs: The block reservation with the rgrp to test
+ * gfs2_rgrp_used_recently - check if a rgrp has been used recently
+ * @rgd: The rgrp to test
  * @msecs: The time limit in milliseconds
  *
  * Returns: True if the rgrp glock has been used within the time limit
  */
-static bool gfs2_rgrp_used_recently(const struct gfs2_blkreserv *rs,
-				    u64 msecs)
+static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
+					   u64 msecs)
 {
-	u64 tdiff;
-
-	tdiff = ktime_to_ns(ktime_sub(ktime_get_real(),
-                            rs->rs_rbm.rgd->rd_gl->gl_dstamp));
-
-	return tdiff > (msecs * 1000 * 1000);
+	return (ktime_before(ktime_get_real(),
+			     ktime_add(rgd->rd_gl->gl_dstamp,
+				       ms_to_ktime(msecs))));
 }
 
 static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
@@ -1993,7 +1990,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 				    !fast_to_acquire(rs->rs_rbm.rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
-				    gfs2_rgrp_used_recently(rs, 1000) &&
+				    gfs2_rgrp_used_recently(rs->rs_rbm.rgd, HZ) &&
 				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
 					goto next_rgrp;
 			}
-- 
2.13.6




* [Cluster-devel] [PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested
  2017-11-06 14:18 [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
@ 2017-11-06 14:18 ` Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 3/6] GFS2: Refactor function fast_to_acquire Bob Peterson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function gfs2_rgrp_congested is called in two places: once before
we acquire the rgrp glock, and once after. The first time, it checks
whether the rgrp was used recently, because the usage statistics are
only valid in the short term. The second call omitted that check, but
should have included it, for the same reason.

This patch simply moves function gfs2_rgrp_used_recently before
function gfs2_rgrp_congested and calls it from there, so it applies
equally to both checks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index dabe4785efab..d26b5088651d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1806,6 +1806,21 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 }
 
 /**
+ * gfs2_rgrp_used_recently - check if a rgrp has been used recently
+ * @rgd: The rgrp to test
+ * @msecs: The time limit in milliseconds
+ *
+ * Returns: True if the rgrp glock has been used within the time limit
+ */
+static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
+					   u64 msecs)
+{
+	return (ktime_before(ktime_get_real(),
+			     ktime_add(rgd->rd_gl->gl_dstamp,
+				       ms_to_ktime(msecs))));
+}
+
+/**
  * gfs2_rgrp_congested - Use stats to figure out whether an rgrp is congested
  * @rgd: The rgrp in question
  * @loops: An indication of how picky we can be (0=very, 1=less so)
@@ -1844,6 +1859,11 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	u64 var;
 	int cpu, nonzero = 0;
 
+	/* If it hasn't been used recently we can't judge the statistics, so
+	   assume it's not congested. */
+	if (!gfs2_rgrp_used_recently(rgd, HZ))
+		return false;
+
 	preempt_disable();
 	for_each_present_cpu(cpu) {
 		st = &per_cpu_ptr(sdp->sd_lkstats, cpu)->lkstats[LM_TYPE_RGRP];
@@ -1878,21 +1898,6 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	return ((srttb_diff < 0) && (sqr_diff > var));
 }
 
-/**
- * gfs2_rgrp_used_recently - check if a rgrp has been used recently
- * @rgd: The rgrp to test
- * @msecs: The time limit in milliseconds
- *
- * Returns: True if the rgrp glock has been used within the time limit
- */
-static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
-					   u64 msecs)
-{
-	return (ktime_before(ktime_get_real(),
-			     ktime_add(rgd->rd_gl->gl_dstamp,
-				       ms_to_ktime(msecs))));
-}
-
 static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
 {
 	const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
@@ -1990,7 +1995,6 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 				    !fast_to_acquire(rs->rs_rbm.rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
-				    gfs2_rgrp_used_recently(rs->rs_rbm.rgd, HZ) &&
 				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
 					goto next_rgrp;
 			}
-- 
2.13.6




* [Cluster-devel] [PATCH 3/6] GFS2: Refactor function fast_to_acquire
  2017-11-06 14:18 [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 1/6] GFS2: Simplify gfs2_rgrp_used_recently Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 2/6] GFS2: Check gfs2_rgrp_used_recently inside gfs2_rgrp_congested Bob Peterson
@ 2017-11-06 14:18 ` Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested Bob Peterson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function fast_to_acquire determines whether an rgrp ought to be fast to
acquire, but the logic was cryptic. This patch makes the logic more
straightforward:

1. If the rgrp is one of our "preferred" rgrps, consider it fast
   to acquire. This is how each node tries to confine its allocations
   to a set of rgrps distinct from the other nodes', to reduce
   contention.
2. If the rgrp glock is unlocked, consider it slow, because it will
   take some time for the DLM and the glock state machine to lock it.
3. If there are glock holders, consider it slow because we'd have to
   wait for another process to dequeue before we get our turn.
4. If the glock is being demoted, consider it slow.
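
The four rules above can be sketched as a standalone decision ladder.
The struct below is a hypothetical flattened view of the glock state,
not the kernel's actual types:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the conditions the function inspects. */
struct rgrp_view {
	bool preferred;      /* GFS2_RDF_PREFERRED is set */
	bool unlocked;       /* gl_state == LM_ST_UNLOCKED */
	bool has_holders;    /* !list_empty(&gl->gl_holders) */
	bool demote_pending; /* GLF_DEMOTE or GLF_DEMOTE_IN_PROGRESS */
};

/* Rules 1-4 from above, checked in order; rule 1 wins outright, so a
 * preferred rgrp counts as fast even while unlocked or being demoted. */
static bool fast_to_acquire(const struct rgrp_view *r)
{
	if (r->preferred)
		return true;
	if (r->unlocked)
		return false;
	if (r->has_holders)
		return false;
	if (r->demote_pending)
		return false;
	return true;
}
```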

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index d26b5088651d..e5c6aa6a8e15 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1924,20 +1924,32 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
 /**
  * fast_to_acquire - determine if a resource group will be fast to acquire
  *
- * If this is one of our preferred rgrps, it should be quicker to acquire,
- * because we tried to set ourselves up as dlm lock master.
+ * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
+ *    because we tried to set ourselves up as dlm lock master.
+ * 2. If the glock is unlocked, consider it slow because it will take time for
+ *    the dlm and the glock state machine to transition it to a locked state.
+ * 3. If there are glock holder records queued to the glock, consider it slow
+ *    because this process will need to be queued up behind another process:
+ *    Our request can't be enqueued until the other is dequeued.
+ * 4. If the rgrp glock is being demoted, consider it slow because the glock
+ *    will need to be demoted, possibly used on another node, the promoted,
+ *    all of which will take time.
+ *
  */
-static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
+static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
 {
 	struct gfs2_glock *gl = rgd->rd_gl;
 
-	if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
-	    !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
-	    !test_bit(GLF_DEMOTE, &gl->gl_flags))
-		return 1;
 	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
-		return 1;
-	return 0;
+		return true;
+	if (gl->gl_state == LM_ST_UNLOCKED)
+		return false;
+	if (!list_empty(&gl->gl_holders))
+		return false;
+	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
+	    test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return false;
+	return true;
 }
 
 /**
-- 
2.13.6




* [Cluster-devel] [PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested
  2017-11-06 14:18 [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (2 preceding siblings ...)
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 3/6] GFS2: Refactor function fast_to_acquire Bob Peterson
@ 2017-11-06 14:18 ` Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 6/6] GFS2: Add checks for intra-node congestion Bob Peterson
  5 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch simply moves function fast_to_acquire to before function
gfs2_rgrp_congested, and checks it inside that function rather than
before calling it. As with function gfs2_rgrp_used_recently, function
fast_to_acquire was only checked before the first call to
gfs2_rgrp_congested, not the second, even though it applies equally
to both calls.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 68 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index e5c6aa6a8e15..f310f8f86e44 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1806,6 +1806,37 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 }
 
 /**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
+ *    because we tried to set ourselves up as dlm lock master.
+ * 2. If the glock is unlocked, consider it slow because it will take time for
+ *    the dlm and the glock state machine to transition it to a locked state.
+ * 3. If there are glock holder records queued to the glock, consider it slow
+ *    because this process will need to be queued up behind another process:
+ *    Our request can't be enqueued until the other is dequeued.
+ * 4. If the rgrp glock is being demoted, consider it slow because the glock
+ *    will need to be demoted, possibly used on another node, the promoted,
+ *    all of which will take time.
+ *
+ */
+static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+		return true;
+	if (gl->gl_state == LM_ST_UNLOCKED)
+		return false;
+	if (!list_empty(&gl->gl_holders))
+		return false;
+	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
+	    test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return false;
+	return true;
+}
+
+/**
  * gfs2_rgrp_used_recently - check if a rgrp has been used recently
  * @rgd: The rgrp to test
  * @msecs: The time limit in milliseconds
@@ -1859,6 +1890,9 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	u64 var;
 	int cpu, nonzero = 0;
 
+	if (loops == 0 && !fast_to_acquire(rgd))
+		return true;
+
 	/* If it hasn't been used recently we can't judge the statistics, so
 	   assume it's not congested. */
 	if (!gfs2_rgrp_used_recently(rgd, HZ))
@@ -1922,37 +1956,6 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
 }
 
 /**
- * fast_to_acquire - determine if a resource group will be fast to acquire
- *
- * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
- *    because we tried to set ourselves up as dlm lock master.
- * 2. If the glock is unlocked, consider it slow because it will take time for
- *    the dlm and the glock state machine to transition it to a locked state.
- * 3. If there are glock holder records queued to the glock, consider it slow
- *    because this process will need to be queued up behind another process:
- *    Our request can't be enqueued until the other is dequeued.
- * 4. If the rgrp glock is being demoted, consider it slow because the glock
- *    will need to be demoted, possibly used on another node, the promoted,
- *    all of which will take time.
- *
- */
-static inline bool fast_to_acquire(struct gfs2_rgrpd *rgd)
-{
-	struct gfs2_glock *gl = rgd->rd_gl;
-
-	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
-		return true;
-	if (gl->gl_state == LM_ST_UNLOCKED)
-		return false;
-	if (!list_empty(&gl->gl_holders))
-		return false;
-	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
-	    test_bit(GLF_DEMOTE, &gl->gl_flags))
-		return false;
-	return true;
-}
-
-/**
  * gfs2_inplace_reserve - Reserve space in the filesystem
  * @ip: the inode to reserve space for
  * @ap: the allocation parameters
@@ -2003,9 +2006,6 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
-				if (loops == 0 &&
-				    !fast_to_acquire(rs->rs_rbm.rgd))
-					goto next_rgrp;
 				if ((loops < 2) &&
 				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
 					goto next_rgrp;
-- 
2.13.6




* [Cluster-devel] [PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases
  2017-11-06 14:18 [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (3 preceding siblings ...)
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 4/6] GFS2: Simplify by checking fast_to_acquire in gfs2_rgrp_congested Bob Peterson
@ 2017-11-06 14:18 ` Bob Peterson
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 6/6] GFS2: Add checks for intra-node congestion Bob Peterson
  5 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_inplace_reserve did a bunch of
pre-checks based on loop counters and whether or not we have a
pre-existing reservation we must use. Then it called function
gfs2_rgrp_congested. But it only makes sense to check the rgrp
glock statistics if we're using a real locking protocol like dlm.
For lock_nolock, this is just a waste of time and can give false
positives.

This patch breaks function gfs2_rgrp_congested into its two cases:
inter-node and intra-node congestion. For intra-node congestion, the
logic is just a simplified version of what gfs2_inplace_reserve had
already done. The checks for inter-node congestion are moved to a
separate function, gfs2_rgrp_congested_dlm.

So this basically stubs in a framework for doing more in-depth
checks for intra-node congestion, in a later patch.

Function fast_to_acquire is also moved to be closer to function
gfs2_rgrp_congested.
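
The split described above can be sketched as a small dispatch function.
Each field below is an illustrative stand-in for the real kernel check,
not the actual data structure:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs to the congestion decision. */
struct congestion_ctx {
	bool rs_active;     /* we already hold a block reservation */
	bool using_dlm;     /* lm_lock is set, i.e. lock_dlm in use */
	bool fast;          /* verdict of fast_to_acquire() */
	bool dlm_congested; /* verdict of the stats-based dlm check */
};

static bool rgrp_congested(const struct congestion_ctx *c, int loops)
{
	if (c->rs_active)
		return false;            /* must use our reservation anyway */
	if (loops >= 2)
		return false;            /* desperate: fill blocks where we can */
	if (loops == 0 && !c->fast)
		return true;             /* pickiest pass: slow glocks count */
	if (c->using_dlm)
		return c->dlm_congested; /* stats only meaningful under dlm */
	return false;                    /* lock_nolock: no inter-node cost */
}
```

Note how lock_nolock mounts fall straight through to "not congested",
which is the false-positive fix the commit message describes.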

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 128 ++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 85 insertions(+), 43 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index f310f8f86e44..117c43c43386 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1806,37 +1806,6 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 }
 
 /**
- * fast_to_acquire - determine if a resource group will be fast to acquire
- *
- * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
- *    because we tried to set ourselves up as dlm lock master.
- * 2. If the glock is unlocked, consider it slow because it will take time for
- *    the dlm and the glock state machine to transition it to a locked state.
- * 3. If there are glock holder records queued to the glock, consider it slow
- *    because this process will need to be queued up behind another process:
- *    Our request can't be enqueued until the other is dequeued.
- * 4. If the rgrp glock is being demoted, consider it slow because the glock
- *    will need to be demoted, possibly used on another node, the promoted,
- *    all of which will take time.
- *
- */
-static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
-{
-	struct gfs2_glock *gl = rgd->rd_gl;
-
-	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
-		return true;
-	if (gl->gl_state == LM_ST_UNLOCKED)
-		return false;
-	if (!list_empty(&gl->gl_holders))
-		return false;
-	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
-	    test_bit(GLF_DEMOTE, &gl->gl_flags))
-		return false;
-	return true;
-}
-
-/**
  * gfs2_rgrp_used_recently - check if a rgrp has been used recently
  * @rgd: The rgrp to test
  * @msecs: The time limit in milliseconds
@@ -1852,7 +1821,7 @@ static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
 }
 
 /**
- * gfs2_rgrp_congested - Use stats to figure out whether an rgrp is congested
+ * gfs2_rgrp_congested_dlm - Use stats to figure out if an rgrp is congested
  * @rgd: The rgrp in question
  * @loops: An indication of how picky we can be (0=very, 1=less so)
  *
@@ -1878,7 +1847,7 @@ static inline bool gfs2_rgrp_used_recently(const struct gfs2_rgrpd *rgd,
  * Returns: A boolean verdict on the congestion status
  */
 
-static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
+static bool gfs2_rgrp_congested_dlm(const struct gfs2_rgrpd *rgd, int loops)
 {
 	const struct gfs2_glock *gl = rgd->rd_gl;
 	const struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
@@ -1890,9 +1859,6 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	u64 var;
 	int cpu, nonzero = 0;
 
-	if (loops == 0 && !fast_to_acquire(rgd))
-		return true;
-
 	/* If it hasn't been used recently we can't judge the statistics, so
 	   assume it's not congested. */
 	if (!gfs2_rgrp_used_recently(rgd, HZ))
@@ -1932,6 +1898,86 @@ static bool gfs2_rgrp_congested(const struct gfs2_rgrpd *rgd, int loops)
 	return ((srttb_diff < 0) && (sqr_diff > var));
 }
 
+/**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * 1. If this is one of our preferred rgrps, it should be quicker to acquire,
+ *    because we tried to set ourselves up as dlm lock master.
+ * 2. If the glock is unlocked, consider it slow because it will take time for
+ *    the dlm and the glock state machine to transition it to a locked state.
+ * 3. If there are glock holder records queued to the glock, consider it slow
+ *    because this process will need to be queued up behind another process:
+ *    Our request can't be enqueued until the other is dequeued.
+ * 4. If the rgrp glock is being demoted, consider it slow because the glock
+ *    will need to be demoted, possibly used on another node, the promoted,
+ *    all of which will take time.
+ *
+ */
+static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+		return true;
+	if (gl->gl_state == LM_ST_UNLOCKED)
+		return false;
+	if (!list_empty(&gl->gl_holders))
+		return false;
+	if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) ||
+	    test_bit(GLF_DEMOTE, &gl->gl_flags))
+		return false;
+	return true;
+}
+
+/**
+ * gfs2_rgrp_congested - decide whether a rgrp glock is congested
+ * @rs: The reservation in question
+ * @loops: An indication of how picky we can be (0=very, 1=less so)
+ *
+ * There are two kinds of congestion: inter-node and intra-node.
+ *
+ * Inter-node congestion is where multiple nodes all want to allocate blocks
+ * inside the same rgrp, which means they need to trade the rgrp glock back
+ * and forth, which is a slow process. To mitigate this, we use glock
+ * statistics to predict whether the glock is historically fast to acquire.
+ *
+ * Intra-node congestion is where you have multiple processes on the same
+ * node, all trying to allocate blocks in the same rgrp. There's nothing wrong
+ * with doing so, but each process needs to wait for the other to release the
+ * rgrp glock before it may proceed.
+ *
+ * If we're not using inter-node locking (dlm) it doesn't make sense to check
+ * the glock statistics. Instead, we do some simple checks based on how
+ * desperate we are to get blocks (the number of loops).
+ *
+ * We know the number of loops we've been around, so we know how desperate we
+ * are to find something. On first loop, call it congested if anyone else has
+ * a block reservation. On second loop, call it congested if it's not fast to
+ * acquire.
+ */
+static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
+{
+	const struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
+
+	/* We already have a reservation, we need to use it regardless */
+	if (gfs2_rs_active(rs))
+		return false;
+
+	/* If we've rejected all the rgrps a few times, can no longer worry
+	   about whether the rgrp is congested. Fill in blocks where we can. */
+	if (loops >= 2)
+		return false;
+
+	if (loops == 0 && !fast_to_acquire(rgd))
+		return true;
+
+	/* Check for inter-node congestion */
+	if (rgd->rd_sbd->sd_lockstruct.ls_ops->lm_lock) /* lock_dlm */
+		return gfs2_rgrp_congested_dlm(rgd, loops);
+
+	return false;
+}
+
 static u32 gfs2_orlov_skip(const struct gfs2_inode *ip)
 {
 	const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
@@ -2005,18 +2051,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
-			if (!gfs2_rs_active(rs)) {
-				if ((loops < 2) &&
-				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
-					goto next_rgrp;
-			}
+			if (gfs2_rgrp_congested(rs, loops))
+				goto next_rgrp;
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &rs->rs_rgd_gh);
 			if (unlikely(error))
 				return error;
-			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+			if (gfs2_rgrp_congested(rs, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rbm.rgd);
-- 
2.13.6




* [Cluster-devel] [PATCH 6/6] GFS2: Add checks for intra-node congestion
  2017-11-06 14:18 [Cluster-devel] [PATCH 0/6][rework] GFS2: Rework rgrp glock congestion functions for intra-node Bob Peterson
                   ` (4 preceding siblings ...)
  2017-11-06 14:18 ` [Cluster-devel] [PATCH 5/6] GFS2: Split gfs2_rgrp_congested into dlm and non-dlm cases Bob Peterson
@ 2017-11-06 14:18 ` Bob Peterson
  5 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2017-11-06 14:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds logic to function gfs2_rgrp_congested to check for
intra-node congestion by calling a new function, other_rgrp_users,
which checks whether there are other glock holders (done only before
we lock it ourselves) and whether there are other block reservations.
If there is a block reservation, we know it is for a different inode:
if it were for ours, we would have been forced to use it and would
not be searching for a new one.
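
A minimal sketch of the intra-node check (the struct is a hypothetical
flattened view of the two conditions, not the kernel's types):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two conditions other_rgrp_users inspects. */
struct rgrp_users {
	bool holders_queued;  /* !list_empty(&gl->gl_holders) */
	bool rstree_nonempty; /* !RB_EMPTY_ROOT(&rgd->rd_rstree) */
};

/* Holder records only signal competition before we hold the glock
 * ourselves; a populated reservation tree signals it either way. */
static bool other_rgrp_users(const struct rgrp_users *r, bool locked)
{
	if (!locked && r->holders_queued)
		return true;
	if (r->rstree_nonempty)
		return true;
	return false;
}
```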

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 117c43c43386..64954ad5abbf 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1930,9 +1930,36 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
 }
 
 /**
+ * other_rgrp_users - figure out if this rgrp has other users
+ * @rgd: The resource group
+ * @locked: true if we've already held the glock
+ *
+ * We're trying to figure out if the given rgrp has anybody competing for
+ * its free space. If other processes have enqueued its glock, there's a
+ * good chance of competition.
+ *
+ * If there are multi-block reservations for this rgrp, there's a good
+ * chance another process will lock the rgrp for block allocations soon.
+ *
+ * If we've already held the glock, we no longer care if there are holders
+ * because that's now a given (rgrp glocks are never shared).
+ */
+static inline bool other_rgrp_users(const struct gfs2_rgrpd *rgd, bool locked)
+{
+	struct gfs2_glock *gl = rgd->rd_gl;
+
+	if (!locked && !list_empty(&gl->gl_holders))
+		return true;
+	if (!RB_EMPTY_ROOT(&rgd->rd_rstree))
+		return true;
+	return false;
+}
+
+/**
  * gfs2_rgrp_congested - decide whether a rgrp glock is congested
  * @rs: The reservation in question
  * @loops: An indication of how picky we can be (0=very, 1=less so)
+ * @locked: Indicates if checks are before or after we've enqueued the glock.
  *
  * There are two kinds of congestion: inter-node and intra-node.
  *
@@ -1946,6 +1973,11 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
  * with doing so, but each process needs to wait for the other to release the
  * rgrp glock before it may proceed.
  *
+ * Both kinds of congestion can hurt performance, but it's faster to check
+ * intra-node, so we do that first. After all, why bother to check if we can
+ * get the glock quickly from DLM if other processes have also used that
+ * same reasoning.
+ *
  * If we're not using inter-node locking (dlm) it doesn't make sense to check
  * the glock statistics. Instead, we do some simple checks based on how
  * desperate we are to get blocks (the number of loops).
@@ -1955,7 +1987,8 @@ static inline bool fast_to_acquire(const struct gfs2_rgrpd *rgd)
  * a block reservation. On second loop, call it congested if it's not fast to
  * acquire.
  */
-static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
+static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops,
+				bool locked)
 {
 	const struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
 
@@ -1968,6 +2001,10 @@ static bool gfs2_rgrp_congested(const struct gfs2_blkreserv *rs, int loops)
 	if (loops >= 2)
 		return false;
 
+	/* Check for intra-node congestion */
+	if (loops == 0 && other_rgrp_users(rgd, locked))
+		return true;
+
 	if (loops == 0 && !fast_to_acquire(rgd))
 		return true;
 
@@ -2051,14 +2088,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
-			if (gfs2_rgrp_congested(rs, loops))
+			if (gfs2_rgrp_congested(rs, loops, false))
 				goto next_rgrp;
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &rs->rs_rgd_gh);
 			if (unlikely(error))
 				return error;
-			if (gfs2_rgrp_congested(rs, loops))
+			if (gfs2_rgrp_congested(rs, loops, true))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rbm.rgd);
-- 
2.13.6



