All of lore.kernel.org
* [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock
@ 2021-09-16 19:09 Bob Peterson
  2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 1/6] gfs2: remove redundant check in gfs2_rgrp_go_lock Bob Peterson
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This set of patches contains a few clean-ups and a patch to fix a
NULL pointer dereference introduced by the new "node scope" patch
06e908cd9ead ("gfs2: Allow node-wide exclusive glock sharing").

Bob Peterson (6):
  gfs2: remove redundant check in gfs2_rgrp_go_lock
  gfs2: Add GL_SKIP holder flag to dump_holder
  gfs2: move GL_SKIP check from glops to do_promote
  gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug
  gfs2: simplify do_promote and fix promote trace
  gfs2: introduce and use new glops go_lock_needed

 fs/gfs2/glock.c  | 23 ++++++++++++++++-------
 fs/gfs2/glock.h  |  7 +++++++
 fs/gfs2/glops.c  | 16 +++++++++++++---
 fs/gfs2/incore.h |  1 +
 fs/gfs2/rgrp.c   | 41 +++++++++++++++++++++++++++++------------
 fs/gfs2/rgrp.h   |  1 +
 6 files changed, 67 insertions(+), 22 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 1/6] gfs2: remove redundant check in gfs2_rgrp_go_lock
  2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
@ 2021-09-16 19:09 ` Bob Peterson
  2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 2/6] gfs2: Add GL_SKIP holder flag to dump_holder Bob Peterson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_rgrp_go_lock checked whether GL_SKIP
and ar_rgrplvb were both true. However, GL_SKIP is only set for rgrps
when ar_rgrplvb is true (see gfs2_inplace_reserve). This patch simply
removes the redundant check.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index c3b00ba92ed2..7a13a687e4f2 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1291,9 +1291,8 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 int gfs2_rgrp_go_lock(struct gfs2_holder *gh)
 {
 	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
-	struct gfs2_sbd *sdp = rgd->rd_sbd;
 
-	if (gh->gh_flags & GL_SKIP && sdp->sd_args.ar_rgrplvb)
+	if (gh->gh_flags & GL_SKIP)
 		return 0;
 	return gfs2_rgrp_bh_get(rgd);
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 2/6] gfs2: Add GL_SKIP holder flag to dump_holder
  2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
  2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 1/6] gfs2: remove redundant check in gfs2_rgrp_go_lock Bob Peterson
@ 2021-09-16 19:09 ` Bob Peterson
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 3/6] gfs2: move GL_SKIP check from glops to do_promote Bob Peterson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Somehow the GL_SKIP flag was missed when dumping glock holders.
This patch adds it to function hflags2str. I added it at the end so that
the Holder and Skip flags together read "Hs" rather than "sH", to avoid
confusion with the "Shared" ("SH") holder state.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e0eaa9cf9fb6..6144d7fe28e6 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2076,6 +2076,8 @@ static const char *hflags2str(char *buf, u16 flags, unsigned long iflags)
 		*p++ = 'H';
 	if (test_bit(HIF_WAIT, &iflags))
 		*p++ = 'W';
+	if (flags & GL_SKIP)
+		*p++ = 's';
 	*p = 0;
 	return buf;
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 3/6] gfs2: move GL_SKIP check from glops to do_promote
  2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
  2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 1/6] gfs2: remove redundant check in gfs2_rgrp_go_lock Bob Peterson
  2021-09-16 19:09 ` [Cluster-devel] [GFS2 PATCH v2 2/6] gfs2: Add GL_SKIP holder flag to dump_holder Bob Peterson
@ 2021-09-16 19:10 ` Bob Peterson
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 4/6] gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug Bob Peterson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, each individual "go_lock" glock operation (glop)
checked the GL_SKIP flag, and if set, would skip further processing.

This patch changes the logic so the go_lock caller, function do_promote,
checks the GL_SKIP flag before calling the go_lock op in the first place.
This avoids unnecessarily unlocking gl_lockref.lock only to re-lock it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 26 ++++++++++++++------------
 fs/gfs2/glops.c |  2 +-
 fs/gfs2/rgrp.c  |  2 --
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6144d7fe28e6..b8248ceff3c3 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -403,18 +403,20 @@ __acquires(&gl->gl_lockref.lock)
 		if (may_grant(gl, gh)) {
 			if (gh->gh_list.prev == &gl->gl_holders &&
 			    glops->go_lock) {
-				spin_unlock(&gl->gl_lockref.lock);
-				/* FIXME: eliminate this eventually */
-				ret = glops->go_lock(gh);
-				spin_lock(&gl->gl_lockref.lock);
-				if (ret) {
-					if (ret == 1)
-						return 2;
-					gh->gh_error = ret;
-					list_del_init(&gh->gh_list);
-					trace_gfs2_glock_queue(gh, 0);
-					gfs2_holder_wake(gh);
-					goto restart;
+				if (!(gh->gh_flags & GL_SKIP)) {
+					spin_unlock(&gl->gl_lockref.lock);
+					/* FIXME: eliminate this eventually */
+					ret = glops->go_lock(gh);
+					spin_lock(&gl->gl_lockref.lock);
+					if (ret) {
+						if (ret == 1)
+							return 2;
+						gh->gh_error = ret;
+						list_del_init(&gh->gh_list);
+						trace_gfs2_glock_queue(gh, 0);
+						gfs2_holder_wake(gh);
+						goto restart;
+					}
 				}
 				set_bit(HIF_HOLDER, &gh->gh_iflags);
 				trace_gfs2_promote(gh, 1);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 79c621c7863d..4b19f513570f 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -495,7 +495,7 @@ static int inode_go_lock(struct gfs2_holder *gh)
 	struct gfs2_inode *ip = gl->gl_object;
 	int error = 0;
 
-	if (!ip || (gh->gh_flags & GL_SKIP))
+	if (!ip)
 		return 0;
 
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7a13a687e4f2..1fb66f6e6a0c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1292,8 +1292,6 @@ int gfs2_rgrp_go_lock(struct gfs2_holder *gh)
 {
 	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
 
-	if (gh->gh_flags & GL_SKIP)
-		return 0;
 	return gfs2_rgrp_bh_get(rgd);
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 4/6] gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug
  2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
                   ` (2 preceding siblings ...)
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 3/6] gfs2: move GL_SKIP check from glops to do_promote Bob Peterson
@ 2021-09-16 19:10 ` Bob Peterson
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 5/6] gfs2: simplify do_promote and fix promote trace Bob Peterson
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed Bob Peterson
  5 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

There are several places in rgrp.c that use BUG_ON, which tells us the
call stack but nothing more; that is not very helpful.
This patch switches them to GLOCK_BUG_ON, which also prints the glock,
its holders, and many of the rgrp values, which will help us debug
problems in the future.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1fb66f6e6a0c..96b2fbed6bf1 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1230,7 +1230,7 @@ static int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd)
 		rgrp_set_bitmap_flags(rgd);
 		rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
 		rgd->rd_free_clone = rgd->rd_free;
-		BUG_ON(rgd->rd_reserved);
+		GLOCK_BUG_ON(rgd->rd_gl, rgd->rd_reserved);
 		/* max out the rgrp allocation failure point */
 		rgd->rd_extfail_pt = rgd->rd_free;
 	}
@@ -1280,7 +1280,7 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 	rgd->rd_free = be32_to_cpu(rgd->rd_rgl->rl_free);
 	rgrp_set_bitmap_flags(rgd);
 	rgd->rd_free_clone = rgd->rd_free;
-	BUG_ON(rgd->rd_reserved);
+	GLOCK_BUG_ON(rgd->rd_gl, rgd->rd_reserved);
 	/* max out the rgrp allocation failure point */
 	rgd->rd_extfail_pt = rgd->rd_free;
 	rgd->rd_dinodes = be32_to_cpu(rgd->rd_rgl->rl_dinodes);
@@ -2212,7 +2212,7 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
 		struct gfs2_rgrpd *rgd = rs->rs_rgd;
 
 		spin_lock(&rgd->rd_rsspin);
-		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
+		GLOCK_BUG_ON(rgd->rd_gl, rgd->rd_reserved < rs->rs_reserved);
 		rgd->rd_reserved -= rs->rs_reserved;
 		spin_unlock(&rgd->rd_rsspin);
 		rs->rs_reserved = 0;
@@ -2473,9 +2473,9 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 		spin_unlock(&rbm.rgd->rd_rsspin);
 		goto rgrp_error;
 	}
-	BUG_ON(rbm.rgd->rd_reserved < *nblocks);
-	BUG_ON(rbm.rgd->rd_free_clone < *nblocks);
-	BUG_ON(rbm.rgd->rd_free < *nblocks);
+	GLOCK_BUG_ON(rbm.rgd->rd_gl, rbm.rgd->rd_reserved < *nblocks);
+	GLOCK_BUG_ON(rbm.rgd->rd_gl, rbm.rgd->rd_free_clone < *nblocks);
+	GLOCK_BUG_ON(rbm.rgd->rd_gl, rbm.rgd->rd_free < *nblocks);
 	rbm.rgd->rd_reserved -= *nblocks;
 	rbm.rgd->rd_free_clone -= *nblocks;
 	rbm.rgd->rd_free -= *nblocks;
@@ -2762,8 +2762,8 @@ void gfs2_rlist_free(struct gfs2_rgrp_list *rlist)
 
 void rgrp_lock_local(struct gfs2_rgrpd *rgd)
 {
-	BUG_ON(!gfs2_glock_is_held_excl(rgd->rd_gl) &&
-	       !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
+	GLOCK_BUG_ON(rgd->rd_gl, !gfs2_glock_is_held_excl(rgd->rd_gl) &&
+		     !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
 	mutex_lock(&rgd->rd_mutex);
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 5/6] gfs2: simplify do_promote and fix promote trace
  2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
                   ` (3 preceding siblings ...)
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 4/6] gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug Bob Peterson
@ 2021-09-16 19:10 ` Bob Peterson
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed Bob Peterson
  5 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, the gfs2_promote kernel trace point would only
record the "first" flag if the go_lock function was called. This patch
simplifies do_promote by eliminating its redundant code and fixes the
trace point by adding a new gfs2_first_holder function.
This will also be used in a future patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 19 ++++++++++++-------
 fs/gfs2/glock.h |  7 +++++++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index b8248ceff3c3..4fcf340603e7 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -394,6 +394,7 @@ __acquires(&gl->gl_lockref.lock)
 {
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	struct gfs2_holder *gh, *tmp;
+	int first;
 	int ret;
 
 restart:
@@ -401,8 +402,8 @@ __acquires(&gl->gl_lockref.lock)
 		if (test_bit(HIF_HOLDER, &gh->gh_iflags))
 			continue;
 		if (may_grant(gl, gh)) {
-			if (gh->gh_list.prev == &gl->gl_holders &&
-			    glops->go_lock) {
+			first = gfs2_first_holder(gh);
+			if (first && glops->go_lock) {
 				if (!(gh->gh_flags & GL_SKIP)) {
 					spin_unlock(&gl->gl_lockref.lock);
 					/* FIXME: eliminate this eventually */
@@ -418,14 +419,18 @@ __acquires(&gl->gl_lockref.lock)
 						goto restart;
 					}
 				}
-				set_bit(HIF_HOLDER, &gh->gh_iflags);
-				trace_gfs2_promote(gh, 1);
-				gfs2_holder_wake(gh);
-				goto restart;
 			}
 			set_bit(HIF_HOLDER, &gh->gh_iflags);
-			trace_gfs2_promote(gh, 0);
+			trace_gfs2_promote(gh, first);
 			gfs2_holder_wake(gh);
+			/*
+			 * If this was the first holder, we may have released
+			 * the gl_lockref.lock, so the holders list may have
+			 * changed. For that reason, we start again at the
+			 * start of the holders queue.
+			 */
+			if (first)
+				goto restart;
 			continue;
 		}
 		if (gh->gh_list.prev == &gl->gl_holders)
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 31a8f2f649b5..699c5e95006a 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -325,6 +325,13 @@ static inline void glock_clear_object(struct gfs2_glock *gl, void *object)
 	spin_unlock(&gl->gl_lockref.lock);
 }
 
+static inline bool gfs2_first_holder(struct gfs2_holder *gh)
+{
+	struct gfs2_glock *gl = gh->gh_gl;
+
+	return (gh->gh_list.prev == &gl->gl_holders);
+}
+
 extern void gfs2_inode_remember_delete(struct gfs2_glock *gl, u64 generation);
 extern bool gfs2_inode_already_deleted(struct gfs2_glock *gl, u64 generation);
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed
  2021-09-16 19:09 [Cluster-devel] [GFS2 PATCH v2 0/6] gfs2: fix bugs related to node_scope and go_lock Bob Peterson
                   ` (4 preceding siblings ...)
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 5/6] gfs2: simplify do_promote and fix promote trace Bob Peterson
@ 2021-09-16 19:10 ` Bob Peterson
  2021-09-22 11:57   ` Andreas Gruenbacher
  5 siblings, 1 reply; 10+ messages in thread
From: Bob Peterson @ 2021-09-16 19:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_lock glops function (if
one exists), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock, but node-scope introduced a new problem: if the
first holder has GL_SKIP and the next one does NOT, the go_lock op is
never called for that next holder, because it is not the first holder on
the queue. Eventually the GL_SKIP holder may call the go_lock
sub-function (e.g. gfs2_rgrp_bh_get), but there is still a race in which
another non-GL_SKIP holder assumes the go_lock function was already
called by the first holder. In the case of rgrp glocks, this leads to a
NULL pointer dereference on the buffer_heads.

This patch tries to fix the problem by introducing a new go_lock_needed
glops function: now every holder that does not specify GL_SKIP consults
the go_lock_needed glops function to see whether go_lock still needs to
be called. This allows any holder (secondary, tertiary, etc) to call the
go_lock function when needed.

However, this introduces a new race: several node-scope EX holders could
all decide the lock needs go_lock and call the go_lock function to read
in the buffers and operate on them. This can lead to situations in which
one process calls go_lock and then creates a reservation (rd_reserved +=),
while another process does the same and then hits the
BUG_ON(rgd->rd_reserved) in gfs2_rgrp_bh_get because of the first
holder's reservation.

To prevent this race, we hold the local rgrp lock (taken by
rgrp_lock_local) across the rgrp go_lock function. The first caller takes
the local lock, submits the IO request, and waits for it to complete. The
second caller waits for the local lock, after which gfs2_rgrp_bh_get
decides it no longer needs to do the read and continues on without
penalty.

Fixes: 06e908cd9ead ("gfs2: Allow node-wide exclusive glock sharing")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c  | 30 +++++++++++++++---------------
 fs/gfs2/glops.c  | 16 +++++++++++++---
 fs/gfs2/incore.h |  1 +
 fs/gfs2/rgrp.c   | 22 +++++++++++++++++++++-
 fs/gfs2/rgrp.h   |  1 +
 5 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4fcf340603e7..6dfd33dc206b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -403,21 +403,21 @@ __acquires(&gl->gl_lockref.lock)
 			continue;
 		if (may_grant(gl, gh)) {
 			first = gfs2_first_holder(gh);
-			if (first && glops->go_lock) {
-				if (!(gh->gh_flags & GL_SKIP)) {
-					spin_unlock(&gl->gl_lockref.lock);
-					/* FIXME: eliminate this eventually */
-					ret = glops->go_lock(gh);
-					spin_lock(&gl->gl_lockref.lock);
-					if (ret) {
-						if (ret == 1)
-							return 2;
-						gh->gh_error = ret;
-						list_del_init(&gh->gh_list);
-						trace_gfs2_glock_queue(gh, 0);
-						gfs2_holder_wake(gh);
-						goto restart;
-					}
+			if (!(gh->gh_flags & GL_SKIP) &&
+			    glops->go_lock_needed &&
+			    glops->go_lock_needed(gh)) {
+				spin_unlock(&gl->gl_lockref.lock);
+				/* FIXME: eliminate this eventually */
+				ret = glops->go_lock(gh);
+				spin_lock(&gl->gl_lockref.lock);
+				if (ret) {
+					if (ret == 1)
+						return 2;
+					gh->gh_error = ret;
+					list_del_init(&gh->gh_list);
+					trace_gfs2_glock_queue(gh, 0);
+					gfs2_holder_wake(gh);
+					goto restart;
 				}
 			}
 			set_bit(HIF_HOLDER, &gh->gh_iflags);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 4b19f513570f..e0fa8d7f96d3 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -481,6 +481,17 @@ int gfs2_inode_refresh(struct gfs2_inode *ip)
 	return error;
 }
 
+static bool inode_go_lock_needed(struct gfs2_holder *gh)
+{
+	struct gfs2_glock *gl = gh->gh_gl;
+
+	if (!gl->gl_object)
+		return false;
+	if (!gfs2_first_holder(gh))
+		return false;
+	return !(gh->gh_flags & GL_SKIP);
+}
+
 /**
  * inode_go_lock - operation done after an inode lock is locked by a process
  * @gh: The glock holder
@@ -495,9 +506,6 @@ static int inode_go_lock(struct gfs2_holder *gh)
 	struct gfs2_inode *ip = gl->gl_object;
 	int error = 0;
 
-	if (!ip)
-		return 0;
-
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
 		error = gfs2_inode_refresh(ip);
 		if (error)
@@ -740,6 +748,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 	.go_sync = inode_go_sync,
 	.go_inval = inode_go_inval,
 	.go_demote_ok = inode_go_demote_ok,
+	.go_lock_needed = inode_go_lock_needed,
 	.go_lock = inode_go_lock,
 	.go_dump = inode_go_dump,
 	.go_type = LM_TYPE_INODE,
@@ -750,6 +759,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 const struct gfs2_glock_operations gfs2_rgrp_glops = {
 	.go_sync = rgrp_go_sync,
 	.go_inval = rgrp_go_inval,
+	.go_lock_needed = gfs2_rgrp_go_lock_needed,
 	.go_lock = gfs2_rgrp_go_lock,
 	.go_dump = gfs2_rgrp_go_dump,
 	.go_type = LM_TYPE_RGRP,
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0fe49770166e..dc5c9dccb060 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -225,6 +225,7 @@ struct gfs2_glock_operations {
 			const char *fs_id_buf);
 	void (*go_callback)(struct gfs2_glock *gl, bool remote);
 	void (*go_free)(struct gfs2_glock *gl);
+	bool (*go_lock_needed)(struct gfs2_holder *gh);
 	const int go_subclass;
 	const int go_type;
 	const unsigned long go_flags;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 96b2fbed6bf1..9848c5f4fbc4 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1288,11 +1288,31 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 	return 0;
 }
 
+bool gfs2_rgrp_go_lock_needed(struct gfs2_holder *gh)
+{
+	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
+
+	if (gh->gh_flags & GL_SKIP)
+		return false;
+
+	if (rgd->rd_bits[0].bi_bh)
+		return false;
+	return true;
+}
+
 int gfs2_rgrp_go_lock(struct gfs2_holder *gh)
 {
+	int ret;
+
 	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
 
-	return gfs2_rgrp_bh_get(rgd);
+	if (gfs2_glock_is_held_excl(rgd->rd_gl))
+		rgrp_lock_local(rgd);
+	ret = gfs2_rgrp_bh_get(rgd);
+	if (gfs2_glock_is_held_excl(rgd->rd_gl))
+		rgrp_unlock_local(rgd);
+
+	return ret;
 }
 
 /**
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index a6855fd796e0..4b62ba5d8e20 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -31,6 +31,7 @@ extern struct gfs2_rgrpd *gfs2_rgrpd_get_next(struct gfs2_rgrpd *rgd);
 extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
 extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
 extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
+extern bool gfs2_rgrp_go_lock_needed(struct gfs2_holder *gh);
 extern int gfs2_rgrp_go_lock(struct gfs2_holder *gh);
 extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed
  2021-09-16 19:10 ` [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed Bob Peterson
@ 2021-09-22 11:57   ` Andreas Gruenbacher
  2021-09-22 12:47     ` Bob Peterson
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Gruenbacher @ 2021-09-22 11:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Sep 16, 2021 at 9:11 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Before this patch, when a glock was locked, the very first holder on the
> queue would unlock the lockref and call the go_lock glops function (if
> one exists), unless GL_SKIP was specified. When we introduced the new
> node-scope concept, we allowed multiple holders to lock glocks in EX mode
> and share the lock, but node-scope introduced a new problem: if the
> first holder has GL_SKIP and the next one does NOT, since it is not the
> first holder on the queue, the go_lock op was not called.

We use go_lock to (re)validate inodes (for inode glocks) and to read
in bitmaps (for resource group glocks). I can see how calling go_lock
was originally tied to the first lock holder, but GL_SKIP already
broke the simple model that the first holder will call go_lock. The
go_lock_needed callback only makes things worse yet again,
unfortunately.

How about we introduce a new GLF_REVALIDATE flag that indicates that
go_lock needs to be called? The flag would be set when instantiating a
new glock and when dequeuing the last holder, and cleared in go_lock
(and in gfs2_inode_refresh for GL_SKIP holders). I'm not sure if
GLF_REVALIDATE can fully replace GIF_INVALID as well, but it looks
like it at first glance.

Thanks,
Andreas

> Eventually the
> GL_SKIP holder may call the go_lock sub-function (e.g. gfs2_rgrp_bh_get)
> but there's still a race in which another non-GL_SKIP holder assumes the
> go_lock function was called by the first holder. In the case of rgrp
> glocks, this leads to a NULL pointer dereference on the buffer_heads.
>
> This patch tries to fix the problem by introducing a new go_lock_needed
> glops function: Now ALL callers who do not specify GL_SKIP should call
> the go_lock_needed glops function to see if it should still be called.
> This allows any holder (secondary, tertiary, etc) to call the go_lock
> function when needed.
>
> However, this introduces a new race: Several node-scope EX holders could
> all decide the lock needs go_lock, and call the go_lock function to read
> in the buffers and operate on them. This can lead to situations in which
> one process can call go_lock then create a reservation (rd_reserved+=)
> but another process can do the same, then hit the gfs2_rgrp_bh_get
> BUG_ON(rgd->rd_reserved) for the first holder's reservation.
>
> To prevent this race, we hold the rgrp_lock_local during the rgrp_go_lock
> function. The first caller will get the local lock, submit the IO
> request and wait for it to complete. The second caller will wait for the
> rgrp_local_lock, then gfs2_rgrp_bh_get will decide it no longer needs
> to do the read, and continue on without penalty.
>
> Fixes: 06e908cd9ead ("gfs2: Allow node-wide exclusive glock sharing")
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/glock.c  | 30 +++++++++++++++---------------
>  fs/gfs2/glops.c  | 16 +++++++++++++---
>  fs/gfs2/incore.h |  1 +
>  fs/gfs2/rgrp.c   | 22 +++++++++++++++++++++-
>  fs/gfs2/rgrp.h   |  1 +
>  5 files changed, 51 insertions(+), 19 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4fcf340603e7..6dfd33dc206b 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -403,21 +403,21 @@ __acquires(&gl->gl_lockref.lock)
>                         continue;
>                 if (may_grant(gl, gh)) {
>                         first = gfs2_first_holder(gh);
> -                       if (first && glops->go_lock) {
> -                               if (!(gh->gh_flags & GL_SKIP)) {
> -                                       spin_unlock(&gl->gl_lockref.lock);
> -                                       /* FIXME: eliminate this eventually */
> -                                       ret = glops->go_lock(gh);
> -                                       spin_lock(&gl->gl_lockref.lock);
> -                                       if (ret) {
> -                                               if (ret == 1)
> -                                                       return 2;
> -                                               gh->gh_error = ret;
> -                                               list_del_init(&gh->gh_list);
> -                                               trace_gfs2_glock_queue(gh, 0);
> -                                               gfs2_holder_wake(gh);
> -                                               goto restart;
> -                                       }
> +                       if (!(gh->gh_flags & GL_SKIP) &&
> +                           glops->go_lock_needed &&
> +                           glops->go_lock_needed(gh)) {
> +                               spin_unlock(&gl->gl_lockref.lock);
> +                               /* FIXME: eliminate this eventually */
> +                               ret = glops->go_lock(gh);
> +                               spin_lock(&gl->gl_lockref.lock);
> +                               if (ret) {
> +                                       if (ret == 1)
> +                                               return 2;
> +                                       gh->gh_error = ret;
> +                                       list_del_init(&gh->gh_list);
> +                                       trace_gfs2_glock_queue(gh, 0);
> +                                       gfs2_holder_wake(gh);
> +                                       goto restart;
>                                 }
>                         }
>                         set_bit(HIF_HOLDER, &gh->gh_iflags);
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 4b19f513570f..e0fa8d7f96d3 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -481,6 +481,17 @@ int gfs2_inode_refresh(struct gfs2_inode *ip)
>         return error;
>  }
>
> +static bool inode_go_lock_needed(struct gfs2_holder *gh)
> +{
> +       struct gfs2_glock *gl = gh->gh_gl;
> +
> +       if (!gl->gl_object)
> +               return false;
> +       if (!gfs2_first_holder(gh))
> +               return false;
> +       return !(gh->gh_flags & GL_SKIP);
> +}
> +
>  /**
>   * inode_go_lock - operation done after an inode lock is locked by a process
>   * @gh: The glock holder
> @@ -495,9 +506,6 @@ static int inode_go_lock(struct gfs2_holder *gh)
>         struct gfs2_inode *ip = gl->gl_object;
>         int error = 0;
>
> -       if (!ip)
> -               return 0;
> -
>         if (test_bit(GIF_INVALID, &ip->i_flags)) {
>                 error = gfs2_inode_refresh(ip);
>                 if (error)
> @@ -740,6 +748,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
>         .go_sync = inode_go_sync,
>         .go_inval = inode_go_inval,
>         .go_demote_ok = inode_go_demote_ok,
> +       .go_lock_needed = inode_go_lock_needed,
>         .go_lock = inode_go_lock,
>         .go_dump = inode_go_dump,
>         .go_type = LM_TYPE_INODE,
> @@ -750,6 +759,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
>  const struct gfs2_glock_operations gfs2_rgrp_glops = {
>         .go_sync = rgrp_go_sync,
>         .go_inval = rgrp_go_inval,
> +       .go_lock_needed = gfs2_rgrp_go_lock_needed,
>         .go_lock = gfs2_rgrp_go_lock,
>         .go_dump = gfs2_rgrp_go_dump,
>         .go_type = LM_TYPE_RGRP,
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 0fe49770166e..dc5c9dccb060 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -225,6 +225,7 @@ struct gfs2_glock_operations {
>                         const char *fs_id_buf);
>         void (*go_callback)(struct gfs2_glock *gl, bool remote);
>         void (*go_free)(struct gfs2_glock *gl);
> +       bool (*go_lock_needed)(struct gfs2_holder *gh);
>         const int go_subclass;
>         const int go_type;
>         const unsigned long go_flags;
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 96b2fbed6bf1..9848c5f4fbc4 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1288,11 +1288,31 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
>         return 0;
>  }
>
> +bool gfs2_rgrp_go_lock_needed(struct gfs2_holder *gh)
> +{
> +       struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
> +
> +       if (gh->gh_flags & GL_SKIP)
> +               return false;
> +
> +       if (rgd->rd_bits[0].bi_bh)
> +               return false;
> +       return true;
> +}
> +
>  int gfs2_rgrp_go_lock(struct gfs2_holder *gh)
>  {
> +       int ret;
> +
>         struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
>
> -       return gfs2_rgrp_bh_get(rgd);
> +       if (gfs2_glock_is_held_excl(rgd->rd_gl))
> +               rgrp_lock_local(rgd);
> +       ret = gfs2_rgrp_bh_get(rgd);
> +       if (gfs2_glock_is_held_excl(rgd->rd_gl))
> +               rgrp_unlock_local(rgd);
> +
> +       return ret;
>  }
>
>  /**
> diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
> index a6855fd796e0..4b62ba5d8e20 100644
> --- a/fs/gfs2/rgrp.h
> +++ b/fs/gfs2/rgrp.h
> @@ -31,6 +31,7 @@ extern struct gfs2_rgrpd *gfs2_rgrpd_get_next(struct gfs2_rgrpd *rgd);
>  extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
>  extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
>  extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
> +extern bool gfs2_rgrp_go_lock_needed(struct gfs2_holder *gh);
>  extern int gfs2_rgrp_go_lock(struct gfs2_holder *gh);
>  extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
>
> --
> 2.31.1
>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed
  2021-09-22 11:57   ` Andreas Gruenbacher
@ 2021-09-22 12:47     ` Bob Peterson
  2021-09-22 13:54       ` Andreas Gruenbacher
  0 siblings, 1 reply; 10+ messages in thread
From: Bob Peterson @ 2021-09-22 12:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 9/22/21 6:57 AM, Andreas Gruenbacher wrote:
> On Thu, Sep 16, 2021 at 9:11 PM Bob Peterson <rpeterso@redhat.com> wrote:
>> Before this patch, when a glock was locked, the very first holder on the
>> queue would unlock the lockref and call the go_lock glops function (if
>> one exists), unless GL_SKIP was specified. When we introduced the new
>> node-scope concept, we allowed multiple holders to lock glocks in EX mode
>> and share the lock, but node-scope introduced a new problem: if the
>> first holder has GL_SKIP and the next one does NOT, since it is not the
>> first holder on the queue, the go_lock op was not called.
> 
> We use go_lock to (re)validate inodes (for inode glocks) and to read
> in bitmaps (for resource group glocks). I can see how calling go_lock
> was originally tied to the first lock holder, but GL_SKIP already
> broke the simple model that the first holder will call go_lock. The
> go_lock_needed callback only makes things worse yet again,
> unfortunately.

In what way does go_lock_needed make things worse?

> How about we introduce a new GLF_REVALIDATE flag that indicates that
> go_lock needs to be called? The flag would be set when instantiating a
> new glock and when dequeuing the last holder, and cleared in go_lock
> (and in gfs2_inode_refresh for GL_SKIP holders). I'm not sure if

That was my original design, and it makes the most sense. I named the
flag GLF_GO_LOCK_SKIPPED, but it was essentially the same thing. Unfortunately,
I ran into all kinds of problems implementing it. In those patches, 
first holders would either call glops->go_lock() or set 
GLF_GO_LOCK_SKIPPED. Once the go_lock function was complete, it cleared 
GLF_GO_LOCK_SKIPPED, and called wake_up_bit. Secondary holders did 
wait_on_bit and waited for the other process's go_lock to complete.

But I had tons of problems getting this to work properly. Processes 
would hang and deadlock for seemingly no reason. Finally I got 
frustrated and sought other solutions.

I'm willing to resurrect that patch set and try again. Maybe you 
can help me figure out what I'm doing wrong and why it's not working.

Bob Peterson

> GLF_REVALIDATE can fully replace GIF_INVALID as well, but it looks
> like it at first glance.
> 
> Thanks,
> Andreas




* [Cluster-devel] [GFS2 PATCH v2 6/6] gfs2: introduce and use new glops go_lock_needed
  2021-09-22 12:47     ` Bob Peterson
@ 2021-09-22 13:54       ` Andreas Gruenbacher
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2021-09-22 13:54 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Sep 22, 2021 at 2:47 PM Bob Peterson <rpeterso@redhat.com> wrote:
> On 9/22/21 6:57 AM, Andreas Gruenbacher wrote:
> > On Thu, Sep 16, 2021 at 9:11 PM Bob Peterson <rpeterso@redhat.com> wrote:
> >> Before this patch, when a glock was locked, the very first holder on the
> >> queue would unlock the lockref and call the go_lock glops function (if
> >> one exists), unless GL_SKIP was specified. When we introduced the new
> >> node-scope concept, we allowed multiple holders to lock glocks in EX mode
> >> and share the lock, but node-scope introduced a new problem: if the
> >> first holder has GL_SKIP and the next one does NOT, since it is not the
> >> first holder on the queue, the go_lock op was not called.
> >
> > We use go_lock to (re)validate inodes (for inode glocks) and to read
> > in bitmaps (for resource group glocks). I can see how calling go_lock
> > was originally tied to the first lock holder, but GL_SKIP already
> > broke the simple model that the first holder will call go_lock. The
> > go_lock_needed callback only makes things worse yet again,
> > unfortunately.
>
> In what way does go_lock_needed make things worse?

It adds an indirection that papers over the fact that the existing
abstraction (first holder calls go_lock) doesn't make sense.

> > How about we introduce a new GLF_REVALIDATE flag that indicates that
> > go_lock needs to be called? The flag would be set when instantiating a
> > new glock and when dequeuing the last holder, and cleared in go_lock
> > (and in gfs2_inode_refresh for GL_SKIP holders). I'm not sure if
>
> That was my original design, and it makes the most sense. I named the
> flag GLF_GO_LOCK_SKIPPED, but essentially the same thing. Unfortunately,
> I ran into all kinds of problems implementing it. In those patches,
> first holders would either call glops->go_lock() or set
> GLF_GO_LOCK_SKIPPED. Once the go_lock function was complete, it cleared
> GLF_GO_LOCK_SKIPPED, and called wake_up_bit. Secondary holders did
> wait_on_bit and waited for the other process's go_lock to complete.

Just set the flag when we know the glock needs revalidation. There are
two possible points in time for doing that: either when we're locking
the first holder, or when the glock is new / the last holder is
dequeued. Then, we can handle clearing the flag and races among
multiple go_lock instances in the go_lock handlers.

> But I had tons of problems getting this to work properly. Processes
> would hang and deadlock for seemingly no reason. Finally I got
> frustrated and sought other solutions.
>
> I'm willing to try to resurrect that patch set and try again. Maybe you
> can help me figure out what I'm doing wrong and why it's not working.
>
> Bob Peterson
>
> > GLF_REVALIDATE can fully replace GIF_INVALID as well, but it looks
> > like it at first glance.
> >
> > Thanks,
> > Andreas
>

Thanks,
Andreas



