* [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing
@ 2018-10-05 19:18 Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block Andreas Gruenbacher
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Here is a set of patches meant to prepare gfs2 for Bob's resource group
glock sharing patches.  The patches have only survived light testing so
far, and Bob's remaining patches haven't been ported on top of this
patch queue yet, so at this point I'm mainly looking for feedback on the
general approach these patches take.

Thanks,
Andreas

Andreas Gruenbacher (9):
  gfs2: Always check the result of gfs2_rbm_from_block
  gfs2: Move rs_{sizehint, rgd_gh} fields into the inode
  gfs2: Remove unused RGRP_RSRV_MINBYTES definition
  gfs2: Rename bitmap.bi_{len => bytes}
  gfs2: Fix some minor typos
  gfs2: Only use struct gfs2_rbm for bitmap manipulations
  gfs2: Fix marking bitmaps non-full
  gfs2: Add per-reservation reserved block accounting
  gfs2: Pass resource group to rgblk_free

Bob Peterson (2):
  gfs2: Remove unnecessary gfs2_rlist_alloc parameter
  gfs2: Add local resource group locking

 fs/gfs2/bmap.c       |   6 +-
 fs/gfs2/dir.c        |   7 +-
 fs/gfs2/file.c       |   8 +-
 fs/gfs2/incore.h     |  42 +----
 fs/gfs2/lops.c       |   7 +-
 fs/gfs2/main.c       |   2 +
 fs/gfs2/quota.c      |   2 +-
 fs/gfs2/rgrp.c       | 412 +++++++++++++++++++++++++------------------
 fs/gfs2/rgrp.h       |  15 +-
 fs/gfs2/trace_gfs2.h |  18 +-
 fs/gfs2/trans.h      |   2 +-
 fs/gfs2/xattr.c      |  18 +-
 12 files changed, 304 insertions(+), 235 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-08 10:12   ` Steven Whitehouse
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 02/11] gfs2: Move rs_{sizehint, rgd_gh} fields into the inode Andreas Gruenbacher
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When gfs2_rbm_from_block fails, the rbm it returns is undefined, so we
always want to make sure gfs2_rbm_from_block has succeeded before
looking at the rbm.
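The check-before-use pattern the patch enforces can be sketched in
isolation.  This is a reduced stand-in, not the kernel code: the struct
layouts and the helper below are simplified models of gfs2_rbm and
gfs2_rbm_from_block, keeping only the bounds check that decides success.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; field names follow
 * fs/gfs2/rgrp.c but the types are reduced for illustration. */
struct rgrpd {
	uint64_t rd_data0;   /* first data block in the resource group */
	uint32_t rd_data;    /* number of data blocks */
};

struct rbm {
	struct rgrpd *rgd;
	uint32_t offset;     /* bitmap-relative position */
};

/* Returns 0 on success, nonzero if the block lies outside the resource
 * group -- in which case *rbm is left undefined, exactly the situation
 * the patch guards against by always checking the return value. */
static int rbm_from_block(struct rbm *rbm, uint64_t block)
{
	if (block < rbm->rgd->rd_data0 ||
	    block >= rbm->rgd->rd_data0 + rbm->rgd->rd_data)
		return -1;
	rbm->offset = (uint32_t)(block - rbm->rgd->rd_data0);
	return 0;
}
```

Callers that "know" the block is in range (as in rgblk_free above) can
assert on the result; callers that cannot know must branch on it, as the
gfs2_check_blk_type hunk now does.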

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index fc181c81cca2..c9caddc2627c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2227,7 +2227,7 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
 		return NULL;
 	}
 
-	gfs2_rbm_from_block(&rbm, bstart);
+	BUG_ON(gfs2_rbm_from_block(&rbm, bstart));
 	while (blen--) {
 		bi = rbm_bi(&rbm);
 		if (bi != bi_prev) {
@@ -2360,7 +2360,7 @@ static void gfs2_set_alloc_start(struct gfs2_rbm *rbm,
 	else
 		goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
 
-	gfs2_rbm_from_block(rbm, goal);
+	BUG_ON(gfs2_rbm_from_block(rbm, goal));
 }
 
 /**
@@ -2569,7 +2569,8 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
 
 	rbm.rgd = rgd;
 	error = gfs2_rbm_from_block(&rbm, no_addr);
-	WARN_ON_ONCE(error != 0);
+	if (WARN_ON_ONCE(error))
+		goto fail;
 
 	if (gfs2_testbit(&rbm, false) != type)
 		error = -ESTALE;
-- 
2.17.1




* [Cluster-devel] [PATCH 02/11] gfs2: Move rs_{sizehint, rgd_gh} fields into the inode
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-08 10:34   ` Steven Whitehouse
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 03/11] gfs2: Remove unused RGRP_RSRV_MINBYTES definition Andreas Gruenbacher
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv
into the inode: they are more closely related to the inode than to a
particular reservation.
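The i_sizehint field this patch introduces is updated with a
read-then-set idiom rather than a compare-and-swap, as seen in the
gfs2_size_hint hunk below.  A minimal sketch of that idiom, using C11
atomics in place of the kernel's atomic_t (the function and variable
names here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int i_sizehint;

/* The hint only ever grows.  A lost update between the load and the
 * store merely leaves an older, smaller hint in place, which is
 * harmless for an allocation size hint -- hence no cmpxchg loop. */
static void size_hint(int blks)
{
	if (blks > atomic_load(&i_sizehint))
		atomic_store(&i_sizehint, blks);
}
```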

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c   |  4 ++--
 fs/gfs2/incore.h |  6 ++----
 fs/gfs2/main.c   |  2 ++
 fs/gfs2/rgrp.c   | 16 +++++++---------
 4 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 08369c6cd127..e8864ff2ed03 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -347,8 +347,8 @@ static void gfs2_size_hint(struct file *filep, loff_t offset, size_t size)
 	size_t blks = (size + sdp->sd_sb.sb_bsize - 1) >> sdp->sd_sb.sb_bsize_shift;
 	int hint = min_t(size_t, INT_MAX, blks);
 
-	if (hint > atomic_read(&ip->i_res.rs_sizehint))
-		atomic_set(&ip->i_res.rs_sizehint, hint);
+	if (hint > atomic_read(&ip->i_sizehint))
+		atomic_set(&ip->i_sizehint, hint);
 }
 
 /**
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b96d39c28e17..9d7d9bd8c3a9 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -309,10 +309,6 @@ struct gfs2_qadata { /* quota allocation data */
 */
 
 struct gfs2_blkreserv {
-	/* components used during write (step 1): */
-	atomic_t rs_sizehint;         /* hint of the write size */
-
-	struct gfs2_holder rs_rgd_gh; /* Filled in by get_local_rgrp */
 	struct rb_node rs_node;       /* link to other block reservations */
 	struct gfs2_rbm rs_rbm;       /* Start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
@@ -417,8 +413,10 @@ struct gfs2_inode {
 	struct gfs2_holder i_iopen_gh;
 	struct gfs2_holder i_gh; /* for prepare/commit_write only */
 	struct gfs2_qadata *i_qadata; /* quota allocation data */
+	struct gfs2_holder i_rgd_gh;
 	struct gfs2_blkreserv i_res; /* rgrp multi-block reservation */
 	u64 i_goal;	/* goal block for allocations */
+	atomic_t i_sizehint;  /* hint of the write size */
 	struct rw_semaphore i_rw_mutex;
 	struct list_head i_ordered;
 	struct list_head i_trunc_list;
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 2d55e2c3333c..c7603063f861 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -39,9 +39,11 @@ static void gfs2_init_inode_once(void *foo)
 	struct gfs2_inode *ip = foo;
 
 	inode_init_once(&ip->i_inode);
+	atomic_set(&ip->i_sizehint, 0);
 	init_rwsem(&ip->i_rw_mutex);
 	INIT_LIST_HEAD(&ip->i_trunc_list);
 	ip->i_qadata = NULL;
+	gfs2_holder_mark_uninitialized(&ip->i_rgd_gh);
 	memset(&ip->i_res, 0, sizeof(ip->i_res));
 	RB_CLEAR_NODE(&ip->i_res.rs_node);
 	ip->i_hash_cache = NULL;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index c9caddc2627c..34122c546576 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1564,7 +1564,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 	if (S_ISDIR(inode->i_mode))
 		extlen = 1;
 	else {
-		extlen = max_t(u32, atomic_read(&rs->rs_sizehint), ap->target);
+		extlen = max_t(u32, atomic_read(&ip->i_sizehint), ap->target);
 		extlen = clamp(extlen, RGRP_RSRV_MINBLKS, free_blocks);
 	}
 	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
@@ -2076,7 +2076,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			}
 			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
-						   &rs->rs_rgd_gh);
+						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
@@ -2085,7 +2085,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rbm.rgd);
 				if (unlikely(error)) {
-					gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
+					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
 				}
 			}
@@ -2128,7 +2128,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 		/* Unlock rgrp if required */
 		if (!rg_locked)
-			gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
+			gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 next_rgrp:
 		/* Find the next rgrp, and continue looking */
 		if (gfs2_select_rgrp(&rs->rs_rbm.rgd, begin))
@@ -2165,10 +2165,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 void gfs2_inplace_release(struct gfs2_inode *ip)
 {
-	struct gfs2_blkreserv *rs = &ip->i_res;
-
-	if (gfs2_holder_initialized(&rs->rs_rgd_gh))
-		gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
+	if (gfs2_holder_initialized(&ip->i_rgd_gh))
+		gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 }
 
 /**
@@ -2326,7 +2324,7 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 				goto out;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
-			atomic_add(RGRP_RSRV_ADDBLKS, &rs->rs_sizehint);
+			atomic_add(RGRP_RSRV_ADDBLKS, &ip->i_sizehint);
 		}
 		__rs_deltree(rs);
 	}
-- 
2.17.1




* [Cluster-devel] [PATCH 03/11] gfs2: Remove unused RGRP_RSRV_MINBYTES definition
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 02/11] gfs2: Move rs_{sizehint, rgd_gh} fields into the inode Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 04/11] gfs2: Rename bitmap.bi_{len => bytes} Andreas Gruenbacher
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This definition is only used to define RGRP_RSRV_MINBLKS, with no
benefit over defining RGRP_RSRV_MINBLKS directly.

In addition, instead of forcing RGRP_RSRV_MINBLKS to be of type u32,
cast it to u32 at the one place where that type is required.
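The reason the cast is needed at the use site can be shown with a
minimal model of clamp().  This is a sketch, not the kernel's
type-checked clamp() from minmax.h, and pick_extlen is a hypothetical
wrapper around the logic in rg_mblk_search:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Reduced model of clamp(): the kernel version additionally insists
 * that all three operands have the same type, which is why a plain
 * integer constant needs an explicit (u32) cast where it is used. */
#define clamp(val, lo, hi) \
	((val) < (lo) ? (lo) : ((val) > (hi) ? (hi) : (val)))

#define RGRP_RSRV_MINBLKS 32   /* now a plain integer constant */

static u32 pick_extlen(u32 sizehint, u32 target, u32 free_blocks)
{
	u32 extlen = sizehint > target ? sizehint : target;
	return clamp(extlen, (u32)RGRP_RSRV_MINBLKS, free_blocks);
}
```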

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 2 +-
 fs/gfs2/rgrp.h | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 34122c546576..52e5a0f24c9f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1565,7 +1565,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 		extlen = 1;
 	else {
 		extlen = max_t(u32, atomic_read(&ip->i_sizehint), ap->target);
-		extlen = clamp(extlen, RGRP_RSRV_MINBLKS, free_blocks);
+		extlen = clamp(extlen, (u32)RGRP_RSRV_MINBLKS, free_blocks);
 	}
 	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
 		return;
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index e90478e2f545..6bb5ee112324 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -18,8 +18,7 @@
  * By reserving 32 blocks at a time, we can optimize / shortcut how we search
 * through the bitmaps by looking a word at a time.
  */
-#define RGRP_RSRV_MINBYTES 8
-#define RGRP_RSRV_MINBLKS ((u32)(RGRP_RSRV_MINBYTES * GFS2_NBBY))
+#define RGRP_RSRV_MINBLKS 32
 #define RGRP_RSRV_ADDBLKS 64
 
 struct gfs2_rgrpd;
-- 
2.17.1




* [Cluster-devel] [PATCH 04/11] gfs2: Rename bitmap.bi_{len => bytes}
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 03/11] gfs2: Remove unused RGRP_RSRV_MINBYTES definition Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 05/11] gfs2: Fix some minor typos Andreas Gruenbacher
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This field indicates the size of the bitmap in bytes, similar to how the
bi_blocks field indicates the size of the bitmap in blocks.

In count_unlinked, replace an instance of bi_bytes * GFS2_NBBY with
bi_blocks.
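The invariant behind the rename is that each bitmap byte describes
GFS2_NBBY blocks (two state bits per block), so bi_blocks is always
bi_bytes * GFS2_NBBY.  A reduced sketch of that relationship (the
struct here is a stand-in, not the full gfs2_bitmap):

```c
#include <assert.h>
#include <stdint.h>

#define GFS2_NBBY 4  /* blocks described per bitmap byte: 2 bits each */

struct bitmap_model {
	uint32_t bi_bytes;   /* size of the bitmap in bytes (was bi_len) */
	uint32_t bi_blocks;  /* size of the bitmap in blocks */
};

/* With the invariant held at initialization time, loops bounded by
 * "bi_len * GFS2_NBBY" (as in count_unlinked) can use bi_blocks
 * directly. */
static void init_bitmap(struct bitmap_model *bi, uint32_t bytes)
{
	bi->bi_bytes = bytes;
	bi->bi_blocks = bytes * GFS2_NBBY;
}
```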

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/incore.h |  2 +-
 fs/gfs2/lops.c   |  2 +-
 fs/gfs2/rgrp.c   | 32 ++++++++++++++++----------------
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 9d7d9bd8c3a9..a1771d8a93be 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -92,7 +92,7 @@ struct gfs2_bitmap {
 	unsigned long bi_flags;
 	u32 bi_offset;
 	u32 bi_start;
-	u32 bi_len;
+	u32 bi_bytes;
 	u32 bi_blocks;
 };
 
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index f2567f958d00..4c7069b8f3c1 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -81,7 +81,7 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	if (sdp->sd_args.ar_discard)
 		gfs2_rgrp_send_discards(sdp, rgd->rd_data0, bd->bd_bh, bi, 1, NULL);
 	memcpy(bi->bi_clone + bi->bi_offset,
-	       bd->bd_bh->b_data + bi->bi_offset, bi->bi_len);
+	       bd->bd_bh->b_data + bi->bi_offset, bi->bi_bytes);
 	clear_bit(GBF_FULL, &bi->bi_flags);
 	rgd->rd_free_clone = rgd->rd_free;
 	rgd->rd_extfail_pt = rgd->rd_free;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 52e5a0f24c9f..aa96fd32eaf1 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -90,7 +90,7 @@ static inline void gfs2_setbit(const struct gfs2_rbm *rbm, bool do_clone,
 {
 	unsigned char *byte1, *byte2, *end, cur_state;
 	struct gfs2_bitmap *bi = rbm_bi(rbm);
-	unsigned int buflen = bi->bi_len;
+	unsigned int buflen = bi->bi_bytes;
 	const unsigned int bit = (rbm->offset % GFS2_NBBY) * GFS2_BIT_SIZE;
 
 	byte1 = bi->bi_bh->b_data + bi->bi_offset + (rbm->offset / GFS2_NBBY);
@@ -105,8 +105,8 @@ static inline void gfs2_setbit(const struct gfs2_rbm *rbm, bool do_clone,
 			rbm->offset, cur_state, new_state);
 		pr_warn("rgrp=0x%llx bi_start=0x%x\n",
 			(unsigned long long)rbm->rgd->rd_addr, bi->bi_start);
-		pr_warn("bi_offset=0x%x bi_len=0x%x\n",
-			bi->bi_offset, bi->bi_len);
+		pr_warn("bi_offset=0x%x bi_bytes=0x%x\n",
+			bi->bi_offset, bi->bi_bytes);
 		dump_stack();
 		gfs2_consist_rgrpd(rbm->rgd);
 		return;
@@ -382,7 +382,7 @@ static u32 gfs2_free_extlen(const struct gfs2_rbm *rrbm, u32 len)
 		if (bi->bi_clone)
 			start = bi->bi_clone;
 		start += bi->bi_offset;
-		end = start + bi->bi_len;
+		end = start + bi->bi_bytes;
 		BUG_ON(rbm.offset & 3);
 		start += (rbm.offset / GFS2_NBBY);
 		bytes = min_t(u32, len / GFS2_NBBY, (end - start));
@@ -467,7 +467,7 @@ void gfs2_rgrp_verify(struct gfs2_rgrpd *rgd)
 			count[x] += gfs2_bitcount(rgd,
 						  bi->bi_bh->b_data +
 						  bi->bi_offset,
-						  bi->bi_len, x);
+						  bi->bi_bytes, x);
 	}
 
 	if (count[0] != rgd->rd_free) {
@@ -780,21 +780,21 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd)
 			bytes = bytes_left;
 			bi->bi_offset = sizeof(struct gfs2_rgrp);
 			bi->bi_start = 0;
-			bi->bi_len = bytes;
+			bi->bi_bytes = bytes;
 			bi->bi_blocks = bytes * GFS2_NBBY;
 		/* header block */
 		} else if (x == 0) {
 			bytes = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_rgrp);
 			bi->bi_offset = sizeof(struct gfs2_rgrp);
 			bi->bi_start = 0;
-			bi->bi_len = bytes;
+			bi->bi_bytes = bytes;
 			bi->bi_blocks = bytes * GFS2_NBBY;
 		/* last block */
 		} else if (x + 1 == length) {
 			bytes = bytes_left;
 			bi->bi_offset = sizeof(struct gfs2_meta_header);
 			bi->bi_start = rgd->rd_bitbytes - bytes_left;
-			bi->bi_len = bytes;
+			bi->bi_bytes = bytes;
 			bi->bi_blocks = bytes * GFS2_NBBY;
 		/* other blocks */
 		} else {
@@ -802,7 +802,7 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd)
 				sizeof(struct gfs2_meta_header);
 			bi->bi_offset = sizeof(struct gfs2_meta_header);
 			bi->bi_start = rgd->rd_bitbytes - bytes_left;
-			bi->bi_len = bytes;
+			bi->bi_bytes = bytes;
 			bi->bi_blocks = bytes * GFS2_NBBY;
 		}
 
@@ -814,11 +814,11 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd)
 		return -EIO;
 	}
 	bi = rgd->rd_bits + (length - 1);
-	if ((bi->bi_start + bi->bi_len) * GFS2_NBBY != rgd->rd_data) {
+	if ((bi->bi_start + bi->bi_bytes) * GFS2_NBBY != rgd->rd_data) {
 		if (gfs2_consist_rgrpd(rgd)) {
 			gfs2_rindex_print(rgd);
 			fs_err(sdp, "start=%u len=%u offset=%u\n",
-			       bi->bi_start, bi->bi_len, bi->bi_offset);
+			       bi->bi_start, bi->bi_bytes, bi->bi_offset);
 		}
 		return -EIO;
 	}
@@ -1145,8 +1145,8 @@ static u32 count_unlinked(struct gfs2_rgrpd *rgd)
 		goal = 0;
 		buffer = bi->bi_bh->b_data + bi->bi_offset;
 		WARN_ON(!buffer_uptodate(bi->bi_bh));
-		while (goal < bi->bi_len * GFS2_NBBY) {
-			goal = gfs2_bitfit(buffer, bi->bi_len, goal,
+		while (goal < bi->bi_blocks) {
+			goal = gfs2_bitfit(buffer, bi->bi_bytes, goal,
 					   GFS2_BLKST_UNLINKED);
 			if (goal == BFITNOENT)
 				break;
@@ -1318,7 +1318,7 @@ int gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset,
 	u32 trimmed = 0;
 	u8 diff;
 
-	for (x = 0; x < bi->bi_len; x++) {
+	for (x = 0; x < bi->bi_bytes; x++) {
 		const u8 *clone = bi->bi_clone ? bi->bi_clone : bi->bi_bh->b_data;
 		clone += bi->bi_offset;
 		clone += x;
@@ -1751,7 +1751,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 		if (state != GFS2_BLKST_UNLINKED && bi->bi_clone)
 			buffer = bi->bi_clone + bi->bi_offset;
 		initial_offset = rbm->offset;
-		offset = gfs2_bitfit(buffer, bi->bi_len, rbm->offset, state);
+		offset = gfs2_bitfit(buffer, bi->bi_bytes, rbm->offset, state);
 		if (offset == BFITNOENT)
 			goto bitmap_full;
 		rbm->offset = offset;
@@ -2234,7 +2234,7 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
 						      GFP_NOFS | __GFP_NOFAIL);
 				memcpy(bi->bi_clone + bi->bi_offset,
 				       bi->bi_bh->b_data + bi->bi_offset,
-				       bi->bi_len);
+				       bi->bi_bytes);
 			}
 			gfs2_trans_add_meta(rbm.rgd->rd_gl, bi->bi_bh);
 			bi_prev = bi;
-- 
2.17.1




* [Cluster-devel] [PATCH 05/11] gfs2: Fix some minor typos
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (3 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 04/11] gfs2: Rename bitmap.bi_{len => bytes} Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 06/11] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/quota.c | 2 +-
 fs/gfs2/rgrp.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 0efae7a0ee80..2ae5a109eea7 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1183,7 +1183,7 @@ static int print_message(struct gfs2_quota_data *qd, char *type)
  *
  * Returns: 0 on success.
  *                  min_req = ap->min_target ? ap->min_target : ap->target;
- *                  quota must allow atleast min_req blks for success and
+ *                  quota must allow at least min_req blks for success and
  *                  ap->allowed is set to the number of blocks allowed
  *
  *          -EDQUOT otherwise, quota violation. ap->allowed is set to number
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index aa96fd32eaf1..070ad493a4ec 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2022,7 +2022,7 @@ static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
  * We try our best to find an rgrp that has at least ap->target blocks
  * available. After a couple of passes (loops == 2), the prospects of finding
  * such an rgrp diminish. At this stage, we return the first rgrp that has
- * atleast ap->min_target blocks available. Either way, we set ap->allowed to
+ * at least ap->min_target blocks available. Either way, we set ap->allowed to
  * the number of blocks available in the chosen rgrp.
  *
  * Returns: 0 on success,
@@ -2091,7 +2091,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			}
 		}
 
-		/* Skip unuseable resource groups */
+		/* Skip unusable resource groups */
 		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
 						 GFS2_RDF_ERROR)) ||
 		    (loops == 0 && ap->target > rs->rs_rbm.rgd->rd_extfail_pt))
-- 
2.17.1




* [Cluster-devel] [PATCH 06/11] gfs2: Only use struct gfs2_rbm for bitmap manipulations
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (4 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 05/11] gfs2: Fix some minor typos Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-08 10:39   ` Steven Whitehouse
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 07/11] gfs2: Fix marking bitmaps non-full Andreas Gruenbacher
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

GFS2 uses struct gfs2_rbm to represent a filesystem block number as a
bit position within a resource group.  This representation is used in
the bitmap manipulation code to prevent excessive conversions between
block numbers and bit positions, but also in struct gfs2_blkreserv which
is part of struct gfs2_inode, to mark the start of a reservation.  In
the inode, the bit position representation makes less sense: first, the
start position is used as a block number about as often as a bit
position; second, the bit position representation makes the code
unnecessarily difficult to read.

Therefore, change struct gfs2_blkreserv to represent the start of a
reservation as a block number instead of a bit position.  (This requires
keeping track of the resource group in gfs2_blkreserv separately.) With
that change, various things can be slightly simplified, and struct
gfs2_rbm can be moved to rgrp.c.
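The conversion that struct gfs2_blkreserv previously paid on every
access is the one gfs2_rbm_to_block performs: resource-group base plus
bitmap byte offset (scaled by GFS2_NBBY) plus bitmap-relative bit
position.  A reduced single-bitmap sketch (the struct is an
illustrative stand-in, not the kernel layout):

```c
#include <assert.h>
#include <stdint.h>

#define GFS2_NBBY 4  /* blocks described per bitmap byte */

struct rgd_model {
	uint64_t rd_data0;   /* first data block of the resource group */
	uint32_t bi_start;   /* byte offset of this bitmap in the rgrp */
};

/* Simplified gfs2_rbm_to_block: bit position -> filesystem block.
 * Storing rs_start as a block number means rs_cmp no longer needs
 * this conversion on every reservation-tree comparison. */
static uint64_t rbm_to_block(const struct rgd_model *rgd, uint32_t offset)
{
	return rgd->rd_data0 + (uint64_t)rgd->bi_start * GFS2_NBBY + offset;
}
```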

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c       |   2 +-
 fs/gfs2/incore.h     |  30 +--------
 fs/gfs2/rgrp.c       | 156 ++++++++++++++++++++++++++-----------------
 fs/gfs2/trace_gfs2.h |  10 +--
 fs/gfs2/trans.h      |   2 +-
 5 files changed, 103 insertions(+), 97 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 03128ed1f34e..c192906bb5f6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1503,7 +1503,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 
 			/* Must be done with the rgrp glock held: */
 			if (gfs2_rs_active(&ip->i_res) &&
-			    rgd == ip->i_res.rs_rbm.rgd)
+			    rgd == ip->i_res.rs_rgd)
 				gfs2_rs_deltree(&ip->i_res);
 		}
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index a1771d8a93be..0ed28fbc73b4 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -124,31 +124,6 @@ struct gfs2_rgrpd {
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
 };
 
-struct gfs2_rbm {
-	struct gfs2_rgrpd *rgd;
-	u32 offset;		/* The offset is bitmap relative */
-	int bii;		/* Bitmap index */
-};
-
-static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
-{
-	return rbm->rgd->rd_bits + rbm->bii;
-}
-
-static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
-{
-	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
-	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
-		rbm->offset;
-}
-
-static inline bool gfs2_rbm_eq(const struct gfs2_rbm *rbm1,
-			       const struct gfs2_rbm *rbm2)
-{
-	return (rbm1->rgd == rbm2->rgd) && (rbm1->bii == rbm2->bii) &&
-	       (rbm1->offset == rbm2->offset);
-}
-
 enum gfs2_state_bits {
 	BH_Pinned = BH_PrivateStart,
 	BH_Escaped = BH_PrivateStart + 1,
@@ -309,8 +284,9 @@ struct gfs2_qadata { /* quota allocation data */
 */
 
 struct gfs2_blkreserv {
-	struct rb_node rs_node;       /* link to other block reservations */
-	struct gfs2_rbm rs_rbm;       /* Start of reservation */
+	struct rb_node rs_node;       /* node within rd_rstree */
+	struct gfs2_rgrpd *rs_rgd;
+	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
 };
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 070ad493a4ec..ee6ea7d8cf44 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -49,6 +49,24 @@
 #define LBITSKIP00 (0x0000000000000000UL)
 #endif
 
+struct gfs2_rbm {
+	struct gfs2_rgrpd *rgd;
+	u32 offset;		/* The offset is bitmap relative */
+	int bii;		/* Bitmap index */
+};
+
+static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
+{
+	return rbm->rgd->rd_bits + rbm->bii;
+}
+
+static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
+{
+	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
+	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
+		rbm->offset;
+}
+
 /*
  * These routines are used by the resource group routines (rgrp.c)
  * to keep track of block allocation.  Each block is represented by two
@@ -184,7 +202,7 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
 
 /**
  * rs_cmp - multi-block reservation range compare
- * @blk: absolute file system block number of the new reservation
+ * @start: start of the new reservation
  * @len: number of blocks in the new reservation
  * @rs: existing reservation to compare against
  *
@@ -192,13 +210,11 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
  *         -1 if the block range is before the start of the reservation
  *          0 if the block range overlaps with the reservation
  */
-static inline int rs_cmp(u64 blk, u32 len, struct gfs2_blkreserv *rs)
+static inline int rs_cmp(u64 start, u32 len, struct gfs2_blkreserv *rs)
 {
-	u64 startblk = gfs2_rbm_to_block(&rs->rs_rbm);
-
-	if (blk >= startblk + rs->rs_free)
+	if (start >= rs->rs_start + rs->rs_free)
 		return 1;
-	if (blk + len - 1 < startblk)
+	if (rs->rs_start >= start + len)
 		return -1;
 	return 0;
 }
@@ -269,12 +285,11 @@ static u32 gfs2_bitfit(const u8 *buf, const unsigned int len,
 
 static int gfs2_rbm_from_block(struct gfs2_rbm *rbm, u64 block)
 {
-	u64 rblock = block - rbm->rgd->rd_data0;
+	u64 rblock;
 
-	if (WARN_ON_ONCE(rblock > UINT_MAX))
-		return -EINVAL;
-	if (block >= rbm->rgd->rd_data0 + rbm->rgd->rd_data)
+	if (!rgrp_contains_block(rbm->rgd, block))
 		return -E2BIG;
+	rblock = block - rbm->rgd->rd_data0;
 
 	rbm->bii = 0;
 	rbm->offset = (u32)(rblock);
@@ -316,13 +331,28 @@ static bool gfs2_rbm_incr(struct gfs2_rbm *rbm)
 	return false;
 }
 
+static struct gfs2_bitmap *gfs2_block_to_bitmap(struct gfs2_rgrpd *rgd,
+						u64 block)
+{
+	unsigned int delta = (sizeof(struct gfs2_rgrp) -
+			      sizeof(struct gfs2_meta_header)) * GFS2_NBBY;
+	unsigned int rblock, bii;
+
+	if (!rgrp_contains_block(rgd, block))
+		return NULL;
+	rblock = block - rgd->rd_data0;
+	bii = (rblock + delta) / rgd->rd_sbd->sd_blocks_per_bitmap;
+	return rgd->rd_bits + bii;
+}
+
 /**
  * gfs2_unaligned_extlen - Look for free blocks which are not byte aligned
  * @rbm: Position to search (value/result)
  * @n_unaligned: Number of unaligned blocks to check
  * @len: Decremented for each block found (terminate on zero)
  *
- * Returns: true if a non-free block is encountered
+ * Returns: true if a non-free block is encountered or the end of the resource
+ *	    group is reached.
  */
 
 static bool gfs2_unaligned_extlen(struct gfs2_rbm *rbm, u32 n_unaligned, u32 *len)
@@ -618,10 +648,10 @@ static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs)
 {
 	struct gfs2_inode *ip = container_of(rs, struct gfs2_inode, i_res);
 
-	gfs2_print_dbg(seq, "  B: n:%llu s:%llu b:%u f:%u\n",
+	gfs2_print_dbg(seq, "  B: n:%llu s:%llu f:%u\n",
 		       (unsigned long long)ip->i_no_addr,
-		       (unsigned long long)gfs2_rbm_to_block(&rs->rs_rbm),
-		       rs->rs_rbm.offset, rs->rs_free);
+		       (unsigned long long)rs->rs_start,
+		       rs->rs_free);
 }
 
 /**
@@ -636,24 +666,26 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	if (!gfs2_rs_active(rs))
 		return;
 
-	rgd = rs->rs_rbm.rgd;
+	rgd = rs->rs_rgd;
 	trace_gfs2_rs(rs, TRACE_RS_TREEDEL);
 	rb_erase(&rs->rs_node, &rgd->rd_rstree);
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
-		struct gfs2_bitmap *bi = rbm_bi(&rs->rs_rbm);
+		struct gfs2_bitmap *bi;
 
 		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free);
-		rs->rs_rbm.rgd->rd_reserved -= rs->rs_free;
+		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
+		rs->rs_rgd->rd_reserved -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
 		   it will force the number to be recalculated later. */
 		rgd->rd_extfail_pt += rs->rs_free;
 		rs->rs_free = 0;
-		clear_bit(GBF_FULL, &bi->bi_flags);
+		bi = gfs2_block_to_bitmap(rgd, rs->rs_start);
+		if (bi)
+			clear_bit(GBF_FULL, &bi->bi_flags);
 	}
 }
 
@@ -666,7 +698,7 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
 {
 	struct gfs2_rgrpd *rgd;
 
-	rgd = rs->rs_rbm.rgd;
+	rgd = rs->rs_rgd;
 	if (rgd) {
 		spin_lock(&rgd->rd_rsspin);
 		__rs_deltree(rs);
@@ -1481,8 +1513,7 @@ static void rs_insert(struct gfs2_inode *ip)
 	struct rb_node **newn, *parent = NULL;
 	int rc;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
-	u64 fsblock = gfs2_rbm_to_block(&rs->rs_rbm);
+	struct gfs2_rgrpd *rgd = rs->rs_rgd;
 
 	BUG_ON(gfs2_rs_active(rs));
 
@@ -1493,7 +1524,7 @@ static void rs_insert(struct gfs2_inode *ip)
 			rb_entry(*newn, struct gfs2_blkreserv, rs_node);
 
 		parent = *newn;
-		rc = rs_cmp(fsblock, rs->rs_free, cur);
+		rc = rs_cmp(rs->rs_start, rs->rs_free, cur);
 		if (rc > 0)
 			newn = &((*newn)->rb_right);
 		else if (rc < 0)
@@ -1581,7 +1612,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 
 	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, ip, true);
 	if (ret == 0) {
-		rs->rs_rbm = rbm;
+		rs->rs_start = gfs2_rbm_to_block(&rbm);
 		rs->rs_free = extlen;
 		rs_insert(ip);
 	} else {
@@ -1626,7 +1657,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 
 	if (n) {
 		while ((rs_cmp(block, length, rs) == 0) && (&ip->i_res != rs)) {
-			block = gfs2_rbm_to_block(&rs->rs_rbm) + rs->rs_free;
+			block = rs->rs_start + rs->rs_free;
 			n = n->rb_right;
 			if (n == NULL)
 				break;
@@ -1967,7 +1998,7 @@ static bool gfs2_rgrp_used_recently(const struct gfs2_blkreserv *rs,
 	u64 tdiff;
 
 	tdiff = ktime_to_ns(ktime_sub(ktime_get_real(),
-                            rs->rs_rbm.rgd->rd_gl->gl_dstamp));
+                            rs->rs_rgd->rd_gl->gl_dstamp));
 
 	return tdiff > (msecs * 1000 * 1000);
 }
@@ -2045,45 +2076,45 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	if (gfs2_assert_warn(sdp, ap->target))
 		return -EINVAL;
 	if (gfs2_rs_active(rs)) {
-		begin = rs->rs_rbm.rgd;
-	} else if (rs->rs_rbm.rgd &&
-		   rgrp_contains_block(rs->rs_rbm.rgd, ip->i_goal)) {
-		begin = rs->rs_rbm.rgd;
+		begin = rs->rs_rgd;
+	} else if (rs->rs_rgd &&
+		   rgrp_contains_block(rs->rs_rgd, ip->i_goal)) {
+		begin = rs->rs_rgd;
 	} else {
 		check_and_update_goal(ip);
-		rs->rs_rbm.rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
+		rs->rs_rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
 	}
 	if (S_ISDIR(ip->i_inode.i_mode) && (ap->aflags & GFS2_AF_ORLOV))
 		skip = gfs2_orlov_skip(ip);
-	if (rs->rs_rbm.rgd == NULL)
+	if (rs->rs_rgd == NULL)
 		return -EBADSLT;
 
 	while (loops < 3) {
 		rg_locked = 1;
 
-		if (!gfs2_glock_is_locked_by_me(rs->rs_rbm.rgd->rd_gl)) {
+		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
 				if (loops == 0 &&
-				    !fast_to_acquire(rs->rs_rbm.rgd))
+				    !fast_to_acquire(rs->rs_rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
 				    gfs2_rgrp_used_recently(rs, 1000) &&
-				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+				    gfs2_rgrp_congested(rs->rs_rgd, loops))
 					goto next_rgrp;
 			}
-			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
+			error = gfs2_glock_nq_init(rs->rs_rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+			    gfs2_rgrp_congested(rs->rs_rgd, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
-				error = update_rgrp_lvb(rs->rs_rbm.rgd);
+				error = update_rgrp_lvb(rs->rs_rgd);
 				if (unlikely(error)) {
 					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
@@ -2092,24 +2123,24 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		}
 
 		/* Skip unusable resource groups */
-		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
+		if ((rs->rs_rgd->rd_flags & (GFS2_RGF_NOALLOC |
 						 GFS2_RDF_ERROR)) ||
-		    (loops == 0 && ap->target > rs->rs_rbm.rgd->rd_extfail_pt))
+		    (loops == 0 && ap->target > rs->rs_rgd->rd_extfail_pt))
 			goto skip_rgrp;
 
 		if (sdp->sd_args.ar_rgrplvb)
-			gfs2_rgrp_bh_get(rs->rs_rbm.rgd);
+			gfs2_rgrp_bh_get(rs->rs_rgd);
 
 		/* Get a reservation if we don't already have one */
 		if (!gfs2_rs_active(rs))
-			rg_mblk_search(rs->rs_rbm.rgd, ip, ap);
+			rg_mblk_search(rs->rs_rgd, ip, ap);
 
 		/* Skip rgrps when we can't get a reservation on first pass */
 		if (!gfs2_rs_active(rs) && (loops < 1))
 			goto check_rgrp;
 
 		/* If rgrp has enough free space, use it */
-		free_blocks = rgd_free(rs->rs_rbm.rgd, rs);
+		free_blocks = rgd_free(rs->rs_rgd, rs);
 		if (free_blocks >= ap->target ||
 		    (loops == 2 && ap->min_target &&
 		     free_blocks >= ap->min_target)) {
@@ -2118,8 +2149,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		}
 check_rgrp:
 		/* Check for unlinked inodes which can be reclaimed */
-		if (rs->rs_rbm.rgd->rd_flags & GFS2_RDF_CHECK)
-			try_rgrp_unlink(rs->rs_rbm.rgd, &last_unlinked,
+		if (rs->rs_rgd->rd_flags & GFS2_RDF_CHECK)
+			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
 		/* Drop reservation, if we couldn't use reserved rgrp */
@@ -2131,7 +2162,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 next_rgrp:
 		/* Find the next rgrp, and continue looking */
-		if (gfs2_select_rgrp(&rs->rs_rbm.rgd, begin))
+		if (gfs2_select_rgrp(&rs->rs_rgd, begin))
 			continue;
 		if (skip)
 			continue;
@@ -2307,20 +2338,21 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 {
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	struct gfs2_rgrpd *rgd = rbm->rgd;
-	unsigned rlen;
-	u64 block;
-	int ret;
 
 	spin_lock(&rgd->rd_rsspin);
 	if (gfs2_rs_active(rs)) {
-		if (gfs2_rbm_eq(&rs->rs_rbm, rbm)) {
-			block = gfs2_rbm_to_block(rbm);
-			ret = gfs2_rbm_from_block(&rs->rs_rbm, block + len);
+		u64 start = gfs2_rbm_to_block(rbm);
+
+		if (rs->rs_start == start) {
+			unsigned int rlen;
+
+			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
 			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
-			if (rs->rs_free && !ret)
+			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
+			    rs->rs_free)
 				goto out;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
@@ -2349,15 +2381,13 @@ static void gfs2_set_alloc_start(struct gfs2_rbm *rbm,
 	u64 goal;
 
 	if (gfs2_rs_active(&ip->i_res)) {
-		*rbm = ip->i_res.rs_rbm;
-		return;
+		goal = ip->i_res.rs_start;
+	} else {
+		if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
+			goal = ip->i_goal;
+		else
+			goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
 	}
-
-	if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
-		goal = ip->i_goal;
-	else
-		goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
-
 	BUG_ON(gfs2_rbm_from_block(rbm, goal));
 }
 
@@ -2377,7 +2407,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct buffer_head *dibh;
-	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rbm.rgd, };
+	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
 	unsigned int ndata;
 	u64 block; /* block, within the file system scope */
 	int error;
@@ -2612,7 +2642,7 @@ void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
 			return;
 		rgd = gfs2_blk2rgrpd(sdp, block, 1);
 	} else {
-		rgd = ip->i_res.rs_rbm.rgd;
+		rgd = ip->i_res.rs_rgd;
 		if (!rgd || !rgrp_contains_block(rgd, block))
 			rgd = gfs2_blk2rgrpd(sdp, block, 1);
 	}
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index e0025258107a..7586c7629497 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -602,13 +602,13 @@ TRACE_EVENT(gfs2_rs,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= rs->rs_rbm.rgd->rd_sbd->sd_vfs->s_dev;
-		__entry->rd_addr	= rs->rs_rbm.rgd->rd_addr;
-		__entry->rd_free_clone	= rs->rs_rbm.rgd->rd_free_clone;
-		__entry->rd_reserved	= rs->rs_rbm.rgd->rd_reserved;
+		__entry->dev		= rs->rs_rgd->rd_sbd->sd_vfs->s_dev;
+		__entry->rd_addr	= rs->rs_rgd->rd_addr;
+		__entry->rd_free_clone	= rs->rs_rgd->rd_free_clone;
+		__entry->rd_reserved	= rs->rs_rgd->rd_reserved;
 		__entry->inum		= container_of(rs, struct gfs2_inode,
 						       i_res)->i_no_addr;
-		__entry->start		= gfs2_rbm_to_block(&rs->rs_rbm);
+		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
 		__entry->func		= func;
 	),
diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h
index ad70087d0597..a27078f98e7a 100644
--- a/fs/gfs2/trans.h
+++ b/fs/gfs2/trans.h
@@ -30,7 +30,7 @@ struct gfs2_glock;
  * block, or all of the blocks in the rg, whichever is smaller */
 static inline unsigned int gfs2_rg_blocks(const struct gfs2_inode *ip, unsigned requested)
 {
-	struct gfs2_rgrpd *rgd = ip->i_res.rs_rbm.rgd;
+	struct gfs2_rgrpd *rgd = ip->i_res.rs_rgd;
 
 	if (requested < rgd->rd_length)
 		return requested + 1;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Cluster-devel] [PATCH 07/11] gfs2: Fix marking bitmaps non-full
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (5 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 06/11] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-08 10:23   ` Steven Whitehouse
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 08/11] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Reservations in gfs2 can span multiple gfs2_bitmaps (but they won't span
multiple resource groups).  When removing a reservation, we want to
clear the GBF_FULL flags of all involved gfs2_bitmaps, not just that of
the first bitmap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index ee6ea7d8cf44..ee981085db33 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -672,7 +672,7 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
-		struct gfs2_bitmap *bi;
+		struct gfs2_bitmap *start, *last;
 
 		/* return reserved blocks to the rgrp */
 		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
@@ -682,10 +682,15 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 		   contiguous with a span of free blocks that follows. Still,
 		   it will force the number to be recalculated later. */
 		rgd->rd_extfail_pt += rs->rs_free;
+		start = gfs2_block_to_bitmap(rgd, rs->rs_start);
+		last = gfs2_block_to_bitmap(rgd,
+				rs->rs_start + rs->rs_free - 1);
 		rs->rs_free = 0;
-		bi = gfs2_block_to_bitmap(rgd, rs->rs_start);
-		if (bi)
-			clear_bit(GBF_FULL, &bi->bi_flags);
+		if (!start || !last)
+			return;
+		do
+			clear_bit(GBF_FULL, &start->bi_flags);
+		while (start++ != last);
 	}
 }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Cluster-devel] [PATCH 08/11] gfs2: Add per-reservation reserved block accounting
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (6 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 07/11] gfs2: Fix marking bitmaps non-full Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 09/11] gfs2: Remove unnecessary gfs2_rlist_alloc parameter Andreas Gruenbacher
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Add a rs_reserved field to struct gfs2_blkreserv to keep track of the
number of blocks reserved by this particular reservation.  When making a
reservation with gfs2_inplace_reserve, this field is set to somewhere
between ap->min_target and ap->target depending on the number of free
blocks in the resource group.  When allocating blocks with
gfs2_alloc_blocks, rs_reserved is decremented accordingly.  Eventually,
any reserved but not consumed blocks are returned to the resource group
by gfs2_inplace_release (via gfs2_adjust_reservation).

The reservation tree (rd_rstree) is unaffected by this change: the
reservations it tracks are still advisory, and the sizes of those
reservations (rs_free) are still determined by the tentative allocation
sizes (i_sizehint).  Since rd_reserved now tracks the number of firmly
reserved blocks rather than the number of tentatively reserved blocks,
we may end up with slightly different allocation patterns, though.  The
rd_extfail_pt optimization will still cause ill-suited resource groups
to be skipped quickly.

We expect to augment this with a patch that will reserve an extent of
blocks rather than just reserving a number of blocks in
gfs2_inplace_reserve.  gfs2_alloc_blocks will then be able to consume
that reserved extent before scanning for additional available blocks;
this should eliminate double bitmap scanning in most cases.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c       |  4 +--
 fs/gfs2/incore.h     |  1 +
 fs/gfs2/rgrp.c       | 66 +++++++++++++++++++++++++-------------------
 fs/gfs2/trace_gfs2.h |  8 ++++--
 4 files changed, 46 insertions(+), 33 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index e8864ff2ed03..12c19e3fcb1b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1008,8 +1008,8 @@ static long __gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t
 			goto out_qunlock;
 
 		/* check if the selected rgrp limits our max_blks further */
-		if (ap.allowed && ap.allowed < max_blks)
-			max_blks = ap.allowed;
+		if (ip->i_res.rs_reserved < max_blks)
+			max_blks = ip->i_res.rs_reserved;
 
 		/* Almost done. Calculate bytes that can be written using
 		 * max_blks. We also recompute max_bytes, data_blocks and
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0ed28fbc73b4..932e63924f7e 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -288,6 +288,7 @@ struct gfs2_blkreserv {
 	struct gfs2_rgrpd *rs_rgd;
 	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
+	u32 rs_reserved;              /* number of reserved blocks */
 };
 
 /*
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index ee981085db33..ef6768bcff21 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -674,9 +674,6 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	if (rs->rs_free) {
 		struct gfs2_bitmap *start, *last;
 
-		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
-		rs->rs_rgd->rd_reserved -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
@@ -1543,39 +1540,27 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
-
-	/* Do our rgrp accounting for the reservation */
-	rgd->rd_reserved += rs->rs_free; /* blocks reserved */
 	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
 
 /**
- * rgd_free - return the number of free blocks we can allocate.
+ * rgd_free - compute the number of blocks we can allocate
  * @rgd: the resource group
  *
- * This function returns the number of free blocks for an rgrp.
- * That's the clone-free blocks (blocks that are free, not including those
- * still being used for unlinked files that haven't been deleted.)
- *
- * It also subtracts any blocks reserved by someone else, but does not
- * include free blocks that are still part of our current reservation,
- * because obviously we can (and will) allocate them.
+ * Compute the number of blocks we can allocate in @rgd.  That's the clone-free
+ * blocks (blocks that are free, not including those still being used for
+ * unlinked files that haven't been deleted) minus the blocks currently
+ * reserved by any reservations other than @rs.
  */
 static inline u32 rgd_free(struct gfs2_rgrpd *rgd, struct gfs2_blkreserv *rs)
 {
-	u32 tot_reserved, tot_free;
-
-	if (WARN_ON_ONCE(rgd->rd_reserved < rs->rs_free))
-		return 0;
-	tot_reserved = rgd->rd_reserved - rs->rs_free;
+	u32 free;
 
-	if (rgd->rd_free_clone < tot_reserved)
-		tot_reserved = 0;
-
-	tot_free = rgd->rd_free_clone - tot_reserved;
-
-	return tot_free;
+	free = rgd->rd_free_clone - rgd->rd_reserved;
+	if (rgd == rs->rs_rgd)
+		free += rs->rs_reserved;
+	return free;
 }
 
 /**
@@ -2058,8 +2043,7 @@ static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
  * We try our best to find an rgrp that has at least ap->target blocks
  * available. After a couple of passes (loops == 2), the prospects of finding
  * such an rgrp diminish. At this stage, we return the first rgrp that has
- * at least ap->min_target blocks available. Either way, we set ap->allowed to
- * the number of blocks available in the chosen rgrp.
+ * at least ap->min_target blocks available.
  *
  * Returns: 0 on success,
  *          -ENOMEM if a suitable rgrp can't be found
@@ -2076,6 +2060,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	int loops = 0;
 	u32 free_blocks, skip = 0;
 
+	BUG_ON(rs->rs_reserved);
+
 	if (sdp->sd_args.ar_rgrplvb)
 		flags |= GL_SKIP;
 	if (gfs2_assert_warn(sdp, ap->target))
@@ -2149,7 +2135,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		if (free_blocks >= ap->target ||
 		    (loops == 2 && ap->min_target &&
 		     free_blocks >= ap->min_target)) {
-			ap->allowed = free_blocks;
+			struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+			rs->rs_reserved = ap->target;
+			if (rs->rs_reserved > free_blocks)
+				rs->rs_reserved = free_blocks;
+			spin_lock(&rs->rs_rgd->rd_rsspin);
+			rgd->rd_reserved += rs->rs_reserved;
+			spin_unlock(&rs->rs_rgd->rd_rsspin);
 			return 0;
 		}
 check_rgrp:
@@ -2201,6 +2194,17 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 void gfs2_inplace_release(struct gfs2_inode *ip)
 {
+	struct gfs2_blkreserv *rs = &ip->i_res;
+
+	if (rs->rs_reserved) {
+		struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+		spin_lock(&rgd->rd_rsspin);
+		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
+		rgd->rd_reserved -= rs->rs_reserved;
+		spin_unlock(&rs->rs_rgd->rd_rsspin);
+		rs->rs_reserved = 0;
+	}
 	if (gfs2_holder_initialized(&ip->i_rgd_gh))
 		gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 }
@@ -2345,6 +2349,9 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 	struct gfs2_rgrpd *rgd = rbm->rgd;
 
 	spin_lock(&rgd->rd_rsspin);
+	BUG_ON(rs->rs_reserved < len);
+	rgd->rd_reserved -= len;
+	rs->rs_reserved -= len;
 	if (gfs2_rs_active(rs)) {
 		u64 start = gfs2_rbm_to_block(rbm);
 
@@ -2354,7 +2361,6 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
-			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
 			    rs->rs_free)
@@ -2417,6 +2423,8 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	u64 block; /* block, within the file system scope */
 	int error;
 
+	BUG_ON(ip->i_res.rs_reserved < *nblocks);
+
 	gfs2_set_alloc_start(&rbm, ip, dinode);
 	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
 
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 7586c7629497..282fcb1a242f 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -598,6 +598,7 @@ TRACE_EVENT(gfs2_rs,
 		__field(	u64,	inum			)
 		__field(	u64,	start			)
 		__field(	u32,	free			)
+		__field(	u32,	reserved		)
 		__field(	u8,	func			)
 	),
 
@@ -610,17 +611,20 @@ TRACE_EVENT(gfs2_rs,
 						       i_res)->i_no_addr;
 		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
+		__entry->reserved	= rs->rs_reserved;
 		__entry->func		= func;
 	),
 
-	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu",
+	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu r:%lu",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->inum,
 		  (unsigned long long)__entry->start,
 		  (unsigned long long)__entry->rd_addr,
 		  (unsigned long)__entry->rd_free_clone,
 		  (unsigned long)__entry->rd_reserved,
-		  rs_func_name(__entry->func), (unsigned long)__entry->free)
+		  rs_func_name(__entry->func),
+		  (unsigned long)__entry->free,
+		  (unsigned long)__entry->reserved)
 );
 
 #endif /* _TRACE_GFS2_H */
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Cluster-devel] [PATCH 09/11] gfs2: Remove unnecessary gfs2_rlist_alloc parameter
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (7 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 08/11] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 10/11] gfs2: Pass resource group to rgblk_free Andreas Gruenbacher
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking Andreas Gruenbacher
  10 siblings, 0 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

The state parameter of gfs2_rlist_alloc is set to LM_ST_EXCLUSIVE in all
calls, so remove it and hardcode that state in gfs2_rlist_alloc instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/dir.c   | 2 +-
 fs/gfs2/rgrp.c  | 5 ++---
 fs/gfs2/rgrp.h  | 2 +-
 fs/gfs2/xattr.c | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index e37002560c11..89c601e5e52f 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -2018,7 +2018,7 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
 		l_blocks++;
 	}
 
-	gfs2_rlist_alloc(&rlist, LM_ST_EXCLUSIVE);
+	gfs2_rlist_alloc(&rlist);
 
 	for (x = 0; x < rlist.rl_rgrps; x++) {
 		struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(rlist.rl_ghs[x].gh_gl);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index ef6768bcff21..76a0a8073c11 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2697,13 +2697,12 @@ void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
  * gfs2_rlist_alloc - all RGs have been added to the rlist, now allocate
  *      and initialize an array of glock holders for them
  * @rlist: the list of resource groups
- * @state: the lock state to acquire the RG lock in
  *
  * FIXME: Don't use NOFAIL
  *
  */
 
-void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state)
+void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist)
 {
 	unsigned int x;
 
@@ -2712,7 +2711,7 @@ void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state)
 				      GFP_NOFS | __GFP_NOFAIL);
 	for (x = 0; x < rlist->rl_rgrps; x++)
 		gfs2_holder_init(rlist->rl_rgd[x]->rd_gl,
-				state, 0,
+				LM_ST_EXCLUSIVE, 0,
 				&rlist->rl_ghs[x]);
 }
 
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 6bb5ee112324..09519ae10fb6 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -67,7 +67,7 @@ struct gfs2_rgrp_list {
 
 extern void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
 			   u64 block);
-extern void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state);
+extern void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist);
 extern void gfs2_rlist_free(struct gfs2_rgrp_list *rlist);
 extern u64 gfs2_ri_total(struct gfs2_sbd *sdp);
 extern void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl);
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 38515988aaf7..e11f77f080a0 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1299,7 +1299,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 	else
 		goto out;
 
-	gfs2_rlist_alloc(&rlist, LM_ST_EXCLUSIVE);
+	gfs2_rlist_alloc(&rlist);
 
 	for (x = 0; x < rlist.rl_rgrps; x++) {
 		struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(rlist.rl_ghs[x].gh_gl);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Cluster-devel] [PATCH 10/11] gfs2: Pass resource group to rgblk_free
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (8 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 09/11] gfs2: Remove unnecessary gfs2_rlist_alloc parameter Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-08 10:28   ` Steven Whitehouse
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking Andreas Gruenbacher
  10 siblings, 1 reply; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function rgblk_free can only deal with one resource group at a time, so
pass that resource group in as a parameter.  Several of the callers
already have the resource group at hand, so we only need additional
lookup code in a few places.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c  |  4 ++--
 fs/gfs2/dir.c   |  5 ++++-
 fs/gfs2/rgrp.c  | 42 +++++++++++++++---------------------------
 fs/gfs2/rgrp.h  |  6 ++++--
 fs/gfs2/xattr.c | 16 +++++++++-------
 5 files changed, 34 insertions(+), 39 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index c192906bb5f6..55e8ad1a6e13 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1566,7 +1566,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 			continue;
 		}
 		if (bstart) {
-			__gfs2_free_blocks(ip, bstart, (u32)blen, meta);
+			__gfs2_free_blocks(ip, rgd, bstart, (u32)blen, meta);
 			(*btotal) += blen;
 			gfs2_add_inode_blocks(&ip->i_inode, -blen);
 		}
@@ -1574,7 +1574,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 		blen = 1;
 	}
 	if (bstart) {
-		__gfs2_free_blocks(ip, bstart, (u32)blen, meta);
+		__gfs2_free_blocks(ip, rgd, bstart, (u32)blen, meta);
 		(*btotal) += blen;
 		gfs2_add_inode_blocks(&ip->i_inode, -blen);
 	}
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 89c601e5e52f..f9c6c7ee89e1 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -2039,6 +2039,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
 	bh = leaf_bh;
 
 	for (blk = leaf_no; blk; blk = nblk) {
+		struct gfs2_rgrpd *rgd;
+
 		if (blk != leaf_no) {
 			error = get_leaf(dip, blk, &bh);
 			if (error)
@@ -2049,7 +2051,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
 		if (blk != leaf_no)
 			brelse(bh);
 
-		gfs2_free_meta(dip, blk, 1);
+		rgd = gfs2_blk2rgrpd(sdp, blk, true);
+		gfs2_free_meta(dip, rgd, blk, 1);
 		gfs2_add_inode_blocks(&dip->i_inode, -1);
 	}
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 76a0a8073c11..8a6b41f3667c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2245,26 +2245,19 @@ static void gfs2_alloc_extent(const struct gfs2_rbm *rbm, bool dinode,
 /**
  * rgblk_free - Change alloc state of given block(s)
  * @sdp: the filesystem
+ * @rgd: the resource group the blocks are in
  * @bstart: the start of a run of blocks to free
  * @blen: the length of the block run (all must lie within ONE RG!)
  * @new_state: GFS2_BLKST_XXX the after-allocation block state
- *
- * Returns:  Resource group containing the block(s)
  */
 
-static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
-				     u32 blen, unsigned char new_state)
+static void rgblk_free(struct gfs2_sbd *sdp, struct gfs2_rgrpd *rgd,
+		       u64 bstart, u32 blen, unsigned char new_state)
 {
 	struct gfs2_rbm rbm;
 	struct gfs2_bitmap *bi, *bi_prev = NULL;
 
-	rbm.rgd = gfs2_blk2rgrpd(sdp, bstart, 1);
-	if (!rbm.rgd) {
-		if (gfs2_consist(sdp))
-			fs_err(sdp, "block = %llu\n", (unsigned long long)bstart);
-		return NULL;
-	}
-
+	rbm.rgd = rgd;
 	BUG_ON(gfs2_rbm_from_block(&rbm, bstart));
 	while (blen--) {
 		bi = rbm_bi(&rbm);
@@ -2282,8 +2275,6 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
 		gfs2_setbit(&rbm, false, new_state);
 		gfs2_rbm_incr(&rbm);
 	}
-
-	return rbm.rgd;
 }
 
 /**
@@ -2499,20 +2490,19 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 /**
  * __gfs2_free_blocks - free a contiguous run of block(s)
  * @ip: the inode these blocks are being freed from
+ * @rgd: the resource group the blocks are in
  * @bstart: first block of a run of contiguous blocks
  * @blen: the length of the block run
  * @meta: 1 if the blocks represent metadata
  *
  */
 
-void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int meta)
+void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
+			u64 bstart, u32 blen, int meta)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
-	struct gfs2_rgrpd *rgd;
 
-	rgd = rgblk_free(sdp, bstart, blen, GFS2_BLKST_FREE);
-	if (!rgd)
-		return;
+	rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
 	trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
 	rgd->rd_free += blen;
 	rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
@@ -2527,16 +2517,18 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int meta)
 /**
  * gfs2_free_meta - free a contiguous run of data block(s)
  * @ip: the inode these blocks are being freed from
+ * @rgd: the resource group the blocks are in
  * @bstart: first block of a run of contiguous blocks
  * @blen: the length of the block run
  *
  */
 
-void gfs2_free_meta(struct gfs2_inode *ip, u64 bstart, u32 blen)
+void gfs2_free_meta(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
+		    u64 bstart, u32 blen)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 
-	__gfs2_free_blocks(ip, bstart, blen, 1);
+	__gfs2_free_blocks(ip, rgd, bstart, blen, 1);
 	gfs2_statfs_change(sdp, 0, +blen, 0);
 	gfs2_quota_change(ip, -(s64)blen, ip->i_inode.i_uid, ip->i_inode.i_gid);
 }
@@ -2548,9 +2540,10 @@ void gfs2_unlink_di(struct inode *inode)
 	struct gfs2_rgrpd *rgd;
 	u64 blkno = ip->i_no_addr;
 
-	rgd = rgblk_free(sdp, blkno, 1, GFS2_BLKST_UNLINKED);
+	rgd = gfs2_blk2rgrpd(sdp, blkno, true);
 	if (!rgd)
 		return;
+	rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
@@ -2560,13 +2553,8 @@ void gfs2_unlink_di(struct inode *inode)
 void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
-	struct gfs2_rgrpd *tmp_rgd;
-
-	tmp_rgd = rgblk_free(sdp, ip->i_no_addr, 1, GFS2_BLKST_FREE);
-	if (!tmp_rgd)
-		return;
-	gfs2_assert_withdraw(sdp, rgd == tmp_rgd);
 
+	rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
 	if (!rgd->rd_dinodes)
 		gfs2_consist_rgrpd(rgd);
 	rgd->rd_dinodes--;
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 09519ae10fb6..b596c3d17988 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -51,8 +51,10 @@ extern int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *n,
 extern int gfs2_rsqa_alloc(struct gfs2_inode *ip);
 extern void gfs2_rs_deltree(struct gfs2_blkreserv *rs);
 extern void gfs2_rsqa_delete(struct gfs2_inode *ip, atomic_t *wcount);
-extern void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int meta);
-extern void gfs2_free_meta(struct gfs2_inode *ip, u64 bstart, u32 blen);
+extern void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
+			       u64 bstart, u32 blen, int meta);
+extern void gfs2_free_meta(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
+			   u64 bstart, u32 blen);
 extern void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip);
 extern void gfs2_unlink_di(struct inode *inode);
 extern int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr,
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index e11f77f080a0..996c915a9c97 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -283,7 +283,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
 			blen++;
 		else {
 			if (bstart)
-				gfs2_free_meta(ip, bstart, blen);
+				gfs2_free_meta(ip, rgd, bstart, blen);
 			bstart = bn;
 			blen = 1;
 		}
@@ -292,7 +292,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
 		gfs2_add_inode_blocks(&ip->i_inode, -1);
 	}
 	if (bstart)
-		gfs2_free_meta(ip, bstart, blen);
+		gfs2_free_meta(ip, rgd, bstart, blen);
 
 	if (prev && !leave) {
 		u32 len;
@@ -1250,6 +1250,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrp_list rlist;
+	struct gfs2_rgrpd *rgd;
 	struct buffer_head *indbh, *dibh;
 	__be64 *eablk, *end;
 	unsigned int rg_blocks = 0;
@@ -1302,8 +1303,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 	gfs2_rlist_alloc(&rlist);
 
 	for (x = 0; x < rlist.rl_rgrps; x++) {
-		struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(rlist.rl_ghs[x].gh_gl);
-
+		rgd = gfs2_glock2rgrp(rlist.rl_ghs[x].gh_gl);
 		rg_blocks += rgd->rd_length;
 	}
 
@@ -1320,6 +1320,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 
 	eablk = (__be64 *)(indbh->b_data + sizeof(struct gfs2_meta_header));
 	bstart = 0;
+	rgd = NULL;
 	blen = 0;
 
 	for (; eablk < end; eablk++) {
@@ -1333,8 +1334,9 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 			blen++;
 		else {
 			if (bstart)
-				gfs2_free_meta(ip, bstart, blen);
+				gfs2_free_meta(ip, rgd, bstart, blen);
 			bstart = bn;
+			rgd = gfs2_blk2rgrpd(sdp, bstart, true);
 			blen = 1;
 		}
 
@@ -1342,7 +1344,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
 		gfs2_add_inode_blocks(&ip->i_inode, -1);
 	}
 	if (bstart)
-		gfs2_free_meta(ip, bstart, blen);
+		gfs2_free_meta(ip, rgd, bstart, blen);
 
 	ip->i_diskflags &= ~GFS2_DIF_EA_INDIRECT;
 
@@ -1391,7 +1393,7 @@ static int ea_dealloc_block(struct gfs2_inode *ip)
 	if (error)
 		goto out_gunlock;
 
-	gfs2_free_meta(ip, ip->i_eattr, 1);
+	gfs2_free_meta(ip, rgd, ip->i_eattr, 1);
 
 	ip->i_eattr = 0;
 	gfs2_add_inode_blocks(&ip->i_inode, -1);
-- 
2.17.1




* [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking
  2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (9 preceding siblings ...)
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 10/11] gfs2: Pass resource group to rgblk_free Andreas Gruenbacher
@ 2018-10-05 19:18 ` Andreas Gruenbacher
  2018-10-08 10:33   ` Steven Whitehouse
  10 siblings, 1 reply; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-05 19:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

Prepare for treating resource group glocks as exclusive among nodes but
shared among all tasks running on a node: introduce another layer of
node-specific locking that the local tasks can use to coordinate their
accesses.

This patch only introduces the local locking changes necessary so that
future patches can introduce resource group glock sharing.  We replace
the resource group spinlock with a mutex; whether that leads to
noticeable additional contention on the resource group mutex remains to
be seen.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/incore.h |  3 +-
 fs/gfs2/lops.c   |  5 ++-
 fs/gfs2/rgrp.c   | 97 +++++++++++++++++++++++++++++++++++-------------
 fs/gfs2/rgrp.h   |  4 ++
 4 files changed, 81 insertions(+), 28 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 932e63924f7e..2fa47b476eef 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -23,6 +23,7 @@
 #include <linux/percpu.h>
 #include <linux/lockref.h>
 #include <linux/rhashtable.h>
+#include <linux/mutex.h>
 
 #define DIO_WAIT	0x00000010
 #define DIO_METADATA	0x00000020
@@ -120,7 +121,7 @@ struct gfs2_rgrpd {
 #define GFS2_RDF_ERROR		0x40000000 /* error in rg */
 #define GFS2_RDF_PREFERRED	0x80000000 /* This rgrp is preferred */
 #define GFS2_RDF_MASK		0xf0000000 /* mask for internal flags */
-	spinlock_t rd_rsspin;           /* protects reservation related vars */
+	struct mutex rd_lock;
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
 };
 
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 4c7069b8f3c1..a9e858e01c97 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -76,8 +76,9 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
 	struct gfs2_bitmap *bi = rgd->rd_bits + index;
 
+	rgrp_lock_local(rgd);
 	if (bi->bi_clone == NULL)
-		return;
+		goto out;
 	if (sdp->sd_args.ar_discard)
 		gfs2_rgrp_send_discards(sdp, rgd->rd_data0, bd->bd_bh, bi, 1, NULL);
 	memcpy(bi->bi_clone + bi->bi_offset,
@@ -85,6 +86,8 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	clear_bit(GBF_FULL, &bi->bi_flags);
 	rgd->rd_free_clone = rgd->rd_free;
 	rgd->rd_extfail_pt = rgd->rd_free;
+out:
+	rgrp_unlock_local(rgd);
 }
 
 /**
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 8a6b41f3667c..a89be4782c15 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -702,10 +702,10 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
 
 	rgd = rs->rs_rgd;
 	if (rgd) {
-		spin_lock(&rgd->rd_rsspin);
+		rgrp_lock_local(rgd);
 		__rs_deltree(rs);
 		BUG_ON(rs->rs_free);
-		spin_unlock(&rgd->rd_rsspin);
+		rgrp_unlock_local(rgd);
 	}
 }
 
@@ -737,12 +737,12 @@ static void return_all_reservations(struct gfs2_rgrpd *rgd)
 	struct rb_node *n;
 	struct gfs2_blkreserv *rs;
 
-	spin_lock(&rgd->rd_rsspin);
+	rgrp_lock_local(rgd);
 	while ((n = rb_first(&rgd->rd_rstree))) {
 		rs = rb_entry(n, struct gfs2_blkreserv, rs_node);
 		__rs_deltree(rs);
 	}
-	spin_unlock(&rgd->rd_rsspin);
+	rgrp_unlock_local(rgd);
 }
 
 void gfs2_clear_rgrpd(struct gfs2_sbd *sdp)
@@ -948,7 +948,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 	rgd->rd_data0 = be64_to_cpu(buf.ri_data0);
 	rgd->rd_data = be32_to_cpu(buf.ri_data);
 	rgd->rd_bitbytes = be32_to_cpu(buf.ri_bitbytes);
-	spin_lock_init(&rgd->rd_rsspin);
+	mutex_init(&rgd->rd_lock);
 
 	error = compute_bitstructs(rgd);
 	if (error)
@@ -1469,9 +1469,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 			/* Trim each bitmap in the rgrp */
 			for (x = 0; x < rgd->rd_length; x++) {
 				struct gfs2_bitmap *bi = rgd->rd_bits + x;
+				rgrp_lock_local(rgd);
 				ret = gfs2_rgrp_send_discards(sdp,
 						rgd->rd_data0, NULL, bi, minlen,
 						&amt);
+				rgrp_unlock_local(rgd);
 				if (ret) {
 					gfs2_glock_dq_uninit(&gh);
 					goto out;
@@ -1483,9 +1485,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 			ret = gfs2_trans_begin(sdp, RES_RG_HDR, 0);
 			if (ret == 0) {
 				bh = rgd->rd_bits[0].bi_bh;
+				rgrp_lock_local(rgd);
 				rgd->rd_flags |= GFS2_RGF_TRIMMED;
 				gfs2_trans_add_meta(rgd->rd_gl, bh);
 				gfs2_rgrp_out(rgd, bh->b_data);
+				rgrp_unlock_local(rgd);
 				gfs2_trans_end(sdp);
 			}
 		}
@@ -1519,7 +1523,6 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	BUG_ON(gfs2_rs_active(rs));
 
-	spin_lock(&rgd->rd_rsspin);
 	newn = &rgd->rd_rstree.rb_node;
 	while (*newn) {
 		struct gfs2_blkreserv *cur =
@@ -1532,7 +1535,6 @@ static void rs_insert(struct gfs2_inode *ip)
 		else if (rc < 0)
 			newn = &((*newn)->rb_left);
 		else {
-			spin_unlock(&rgd->rd_rsspin);
 			WARN_ON(1);
 			return;
 		}
@@ -1540,7 +1542,6 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
-	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
 
@@ -1632,7 +1633,6 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 	struct rb_node *n;
 	int rc;
 
-	spin_lock(&rgd->rd_rsspin);
 	n = rgd->rd_rstree.rb_node;
 	while (n) {
 		rs = rb_entry(n, struct gfs2_blkreserv, rs_node);
@@ -1655,7 +1655,6 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 		}
 	}
 
-	spin_unlock(&rgd->rd_rsspin);
 	return block;
 }
 
@@ -1857,7 +1856,22 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 	struct gfs2_rbm rbm = { .rgd = rgd, .bii = 0, .offset = 0 };
 
 	while (1) {
+		/*
+		 * We must be careful to avoid deadlock here:
+		 * All transactions expect: sd_log_flush_lock followed by rgrp
+		 * ex (if needed), but try_rgrp_unlink takes sd_log_flush_lock
+		 * outside a transaction and therefore must not have the rgrp
+		 * ex already held. To avoid deadlock, we drop the rgrp ex lock
+		 * before taking the log_flush_lock, then reacquire it to
+		 * protect our call to gfs2_rbm_find.
+		 *
+		 * Also note that rgrp_unlock_local must come AFTER the caller does
+		 * gfs2_rs_deltree because rgrp ex needs to be held before
+		 * making reservations.
+		 */
+		rgrp_unlock_local(rgd);
 		down_write(&sdp->sd_log_flush_lock);
+		rgrp_lock_local(rgd);
 		error = gfs2_rbm_find(&rbm, GFS2_BLKST_UNLINKED, NULL, NULL,
 				      true);
 		up_write(&sdp->sd_log_flush_lock);
@@ -2055,7 +2069,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrpd *begin = NULL;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	int error = 0, rg_locked, flags = 0;
+	int error = 0, flags = 0;
+	bool rg_locked;
 	u64 last_unlinked = NO_BLOCK;
 	int loops = 0;
 	u32 free_blocks, skip = 0;
@@ -2081,10 +2096,10 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		return -EBADSLT;
 
 	while (loops < 3) {
-		rg_locked = 1;
-
-		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
-			rg_locked = 0;
+		rg_locked = gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl);
+		if (rg_locked) {
+			rgrp_lock_local(rs->rs_rgd);
+		} else {
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
@@ -2101,12 +2116,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
+			rgrp_lock_local(rs->rs_rgd);
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
 			    gfs2_rgrp_congested(rs->rs_rgd, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rgd);
 				if (unlikely(error)) {
+					rgrp_unlock_local(rs->rs_rgd);
 					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
 				}
@@ -2140,9 +2157,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rs->rs_reserved = ap->target;
 			if (rs->rs_reserved > free_blocks)
 				rs->rs_reserved = free_blocks;
-			spin_lock(&rs->rs_rgd->rd_rsspin);
 			rgd->rd_reserved += rs->rs_reserved;
-			spin_unlock(&rs->rs_rgd->rd_rsspin);
+			rgrp_unlock_local(rs->rs_rgd);
 			return 0;
 		}
 check_rgrp:
@@ -2151,6 +2167,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
+		rgrp_unlock_local(rs->rs_rgd);
+
 		/* Drop reservation, if we couldn't use reserved rgrp */
 		if (gfs2_rs_active(rs))
 			gfs2_rs_deltree(rs);
@@ -2199,10 +2217,10 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
 	if (rs->rs_reserved) {
 		struct gfs2_rgrpd *rgd = rs->rs_rgd;
 
-		spin_lock(&rgd->rd_rsspin);
+		rgrp_lock_local(rgd);
 		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
 		rgd->rd_reserved -= rs->rs_reserved;
-		spin_unlock(&rs->rs_rgd->rd_rsspin);
+		rgrp_unlock_local(rgd);
 		rs->rs_reserved = 0;
 	}
 	if (gfs2_holder_initialized(&ip->i_rgd_gh))
@@ -2304,12 +2322,12 @@ void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 			       be32_to_cpu(rgl->rl_free),
 			       be32_to_cpu(rgl->rl_dinodes));
 	}
-	spin_lock(&rgd->rd_rsspin);
+	rgrp_lock_local(rgd);
 	for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
 		trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
 		dump_rs(seq, trs);
 	}
-	spin_unlock(&rgd->rd_rsspin);
+	rgrp_unlock_local(rgd);
 }
 
 static void gfs2_rgrp_error(struct gfs2_rgrpd *rgd)
@@ -2339,7 +2357,6 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	struct gfs2_rgrpd *rgd = rbm->rgd;
 
-	spin_lock(&rgd->rd_rsspin);
 	BUG_ON(rs->rs_reserved < len);
 	rgd->rd_reserved -= len;
 	rs->rs_reserved -= len;
@@ -2355,15 +2372,13 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
 			    rs->rs_free)
-				goto out;
+				return;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
 			atomic_add(RGRP_RSRV_ADDBLKS, &ip->i_sizehint);
 		}
 		__rs_deltree(rs);
 	}
-out:
-	spin_unlock(&rgd->rd_rsspin);
 }
 
 /**
@@ -2416,6 +2431,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	BUG_ON(ip->i_res.rs_reserved < *nblocks);
 
+	rgrp_lock_local(rbm.rgd);
 	gfs2_set_alloc_start(&rbm, ip, dinode);
 	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
 
@@ -2460,6 +2476,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	}
 
 	rbm.rgd->rd_free -= *nblocks;
+	rbm.rgd->rd_free_clone -= *nblocks;
 	if (dinode) {
 		rbm.rgd->rd_dinodes++;
 		*generation = rbm.rgd->rd_igeneration++;
@@ -2469,6 +2486,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	gfs2_trans_add_meta(rbm.rgd->rd_gl, rbm.rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rbm.rgd, rbm.rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rbm.rgd);
 
 	gfs2_statfs_change(sdp, 0, -(s64)*nblocks, dinode ? 1 : 0);
 	if (dinode)
@@ -2476,13 +2494,13 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	gfs2_quota_change(ip, *nblocks, ip->i_inode.i_uid, ip->i_inode.i_gid);
 
-	rbm.rgd->rd_free_clone -= *nblocks;
 	trace_gfs2_block_alloc(ip, rbm.rgd, block, *nblocks,
 			       dinode ? GFS2_BLKST_DINODE : GFS2_BLKST_USED);
 	*bn = block;
 	return 0;
 
 rgrp_error:
+	rgrp_unlock_local(rbm.rgd);
 	gfs2_rgrp_error(rbm.rgd);
 	return -EIO;
 }
@@ -2502,12 +2520,14 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
 	trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
 	rgd->rd_free += blen;
 	rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rgd);
 
 	/* Directories keep their data in the metadata address space */
 	if (meta || ip->i_depth)
@@ -2543,17 +2563,20 @@ void gfs2_unlink_di(struct inode *inode)
 	rgd = gfs2_blk2rgrpd(sdp, blkno, true);
 	if (!rgd)
 		return;
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
 	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, 1);
+	rgrp_unlock_local(rgd);
 }
 
 void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
 
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
 	if (!rgd->rd_dinodes)
 		gfs2_consist_rgrpd(rgd);
@@ -2562,6 +2585,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rgd);
 	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, -1);
 
 	gfs2_statfs_change(sdp, 0, +1, -1);
@@ -2576,6 +2600,10 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
  * @no_addr: The block number to check
  * @type: The block type we are looking for
  *
+ * The inode glock of @no_addr must be held.  The @type to check for is either
+ * GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; checking for type GFS2_BLKST_FREE
+ * or GFS2_BLKST_USED would make no sense.
+ *
  * Returns: 0 if the block type matches the expected type
  *          -ESTALE if it doesn't match
  *          or -ve errno if something went wrong while checking
@@ -2601,6 +2629,12 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
 	if (WARN_ON_ONCE(error))
 		goto fail;
 
+	/*
+	 * No need to take the local resource group lock here; the inode glock
+	 * of @no_addr provides the necessary synchronization in case the block
+	 * is an inode.  (In case the block is not an inode, the block type
+	 * will not match the @type we are looking for.)
+	 */
 	if (gfs2_testbit(&rbm, false) != type)
 		error = -ESTALE;
 
@@ -2723,3 +2757,14 @@ void gfs2_rlist_free(struct gfs2_rgrp_list *rlist)
 	}
 }
 
+void rgrp_lock_local(struct gfs2_rgrpd *rgd)
+{
+	BUG_ON(!gfs2_glock_is_held_excl(rgd->rd_gl) &&
+	       !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
+	mutex_lock(&rgd->rd_lock);
+}
+
+void rgrp_unlock_local(struct gfs2_rgrpd *rgd)
+{
+	mutex_unlock(&rgd->rd_lock);
+}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index b596c3d17988..33e52dab76ef 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -92,4 +92,8 @@ static inline int rgrp_contains_block(struct gfs2_rgrpd *rgd, u64 block)
 }
 
 extern void check_and_update_goal(struct gfs2_inode *ip);
+
+extern void rgrp_lock_local(struct gfs2_rgrpd *rgd);
+extern void rgrp_unlock_local(struct gfs2_rgrpd *rgd);
+
 #endif /* __RGRP_DOT_H__ */
-- 
2.17.1




* [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block Andreas Gruenbacher
@ 2018-10-08 10:12   ` Steven Whitehouse
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Whitehouse @ 2018-10-08 10:12 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 05/10/18 20:18, Andreas Gruenbacher wrote:
> When gfs2_rbm_from_block fails, the rbm it returns is undefined, so we
> always want to make sure gfs2_rbm_from_block has succeeded before
> looking at the rbm.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/rgrp.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index fc181c81cca2..c9caddc2627c 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -2227,7 +2227,7 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
>   		return NULL;
>   	}
>   
> -	gfs2_rbm_from_block(&rbm, bstart);
> +	BUG_ON(gfs2_rbm_from_block(&rbm, bstart));
>   	while (blen--) {
>   		bi = rbm_bi(&rbm);
>   		if (bi != bi_prev) {
> @@ -2360,7 +2360,7 @@ static void gfs2_set_alloc_start(struct gfs2_rbm *rbm,
>   	else
>   		goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
>   
> -	gfs2_rbm_from_block(rbm, goal);
> +	BUG_ON(gfs2_rbm_from_block(rbm, goal));
>   }
Could we make this a warn-once with a failure path here? It would be a bit 
friendlier than BUG_ON,

Steve.

>   
>   /**
> @@ -2569,7 +2569,8 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
>   
>   	rbm.rgd = rgd;
>   	error = gfs2_rbm_from_block(&rbm, no_addr);
> -	WARN_ON_ONCE(error != 0);
> +	if (WARN_ON_ONCE(error))
> +		goto fail;
>   
>   	if (gfs2_testbit(&rbm, false) != type)
>   		error = -ESTALE;




* [Cluster-devel] [PATCH 07/11] gfs2: Fix marking bitmaps non-full
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 07/11] gfs2: Fix marking bitmaps non-full Andreas Gruenbacher
@ 2018-10-08 10:23   ` Steven Whitehouse
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Whitehouse @ 2018-10-08 10:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com



On 05/10/18 20:18, Andreas Gruenbacher wrote:
> Reservations in gfs2 can span multiple gfs2_bitmaps (but they won't span
> multiple resource groups).  When removing a reservation, we want to
> clear the GBF_FULL flags of all involved gfs2_bitmaps, not just that of
> the first bitmap.
This looks like a bug fix that we should have anyway,

Steve.

> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/rgrp.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index ee6ea7d8cf44..ee981085db33 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -672,7 +672,7 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
>   	RB_CLEAR_NODE(&rs->rs_node);
>   
>   	if (rs->rs_free) {
> -		struct gfs2_bitmap *bi;
> +		struct gfs2_bitmap *start, *last;
>   
>   		/* return reserved blocks to the rgrp */
>   		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
> @@ -682,10 +682,15 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
>   		   contiguous with a span of free blocks that follows. Still,
>   		   it will force the number to be recalculated later. */
>   		rgd->rd_extfail_pt += rs->rs_free;
> +		start = gfs2_block_to_bitmap(rgd, rs->rs_start);
> +		last = gfs2_block_to_bitmap(rgd,
> +				rs->rs_start + rs->rs_free - 1);
>   		rs->rs_free = 0;
> -		bi = gfs2_block_to_bitmap(rgd, rs->rs_start);
> -		if (bi)
> -			clear_bit(GBF_FULL, &bi->bi_flags);
> +		if (!start || !last)
> +			return;
> +		do
> +			clear_bit(GBF_FULL, &start->bi_flags);
> +		while (start++ != last);
>   	}
>   }
>   




* [Cluster-devel] [PATCH 10/11] gfs2: Pass resource group to rgblk_free
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 10/11] gfs2: Pass resource group to rgblk_free Andreas Gruenbacher
@ 2018-10-08 10:28   ` Steven Whitehouse
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Whitehouse @ 2018-10-08 10:28 UTC (permalink / raw)
  To: cluster-devel.redhat.com



On 05/10/18 20:18, Andreas Gruenbacher wrote:
> Function rgblk_free can only deal with one resource group at a time, so
> pass that resource group in as a parameter.  Several of the callers
> already have the resource group at hand, so we only need additional
> lookup code in a few places.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
That looks like a good optimisation,

Steve.

> ---
>   fs/gfs2/bmap.c  |  4 ++--
>   fs/gfs2/dir.c   |  5 ++++-
>   fs/gfs2/rgrp.c  | 42 +++++++++++++++---------------------------
>   fs/gfs2/rgrp.h  |  6 ++++--
>   fs/gfs2/xattr.c | 16 +++++++++-------
>   5 files changed, 34 insertions(+), 39 deletions(-)
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index c192906bb5f6..55e8ad1a6e13 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1566,7 +1566,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
>   			continue;
>   		}
>   		if (bstart) {
> -			__gfs2_free_blocks(ip, bstart, (u32)blen, meta);
> +			__gfs2_free_blocks(ip, rgd, bstart, (u32)blen, meta);
>   			(*btotal) += blen;
>   			gfs2_add_inode_blocks(&ip->i_inode, -blen);
>   		}
> @@ -1574,7 +1574,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
>   		blen = 1;
>   	}
>   	if (bstart) {
> -		__gfs2_free_blocks(ip, bstart, (u32)blen, meta);
> +		__gfs2_free_blocks(ip, rgd, bstart, (u32)blen, meta);
>   		(*btotal) += blen;
>   		gfs2_add_inode_blocks(&ip->i_inode, -blen);
>   	}
> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
> index 89c601e5e52f..f9c6c7ee89e1 100644
> --- a/fs/gfs2/dir.c
> +++ b/fs/gfs2/dir.c
> @@ -2039,6 +2039,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>   	bh = leaf_bh;
>   
>   	for (blk = leaf_no; blk; blk = nblk) {
> +		struct gfs2_rgrpd *rgd;
> +
>   		if (blk != leaf_no) {
>   			error = get_leaf(dip, blk, &bh);
>   			if (error)
> @@ -2049,7 +2051,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>   		if (blk != leaf_no)
>   			brelse(bh);
>   
> -		gfs2_free_meta(dip, blk, 1);
> +		rgd = gfs2_blk2rgrpd(sdp, blk, true);
> +		gfs2_free_meta(dip, rgd, blk, 1);
>   		gfs2_add_inode_blocks(&dip->i_inode, -1);
>   	}
>   
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 76a0a8073c11..8a6b41f3667c 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -2245,26 +2245,19 @@ static void gfs2_alloc_extent(const struct gfs2_rbm *rbm, bool dinode,
>   /**
>    * rgblk_free - Change alloc state of given block(s)
>    * @sdp: the filesystem
> + * @rgd: the resource group the blocks are in
>    * @bstart: the start of a run of blocks to free
>    * @blen: the length of the block run (all must lie within ONE RG!)
>    * @new_state: GFS2_BLKST_XXX the after-allocation block state
> - *
> - * Returns:  Resource group containing the block(s)
>    */
>   
> -static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
> -				     u32 blen, unsigned char new_state)
> +static void rgblk_free(struct gfs2_sbd *sdp, struct gfs2_rgrpd *rgd,
> +		       u64 bstart, u32 blen, unsigned char new_state)
>   {
>   	struct gfs2_rbm rbm;
>   	struct gfs2_bitmap *bi, *bi_prev = NULL;
>   
> -	rbm.rgd = gfs2_blk2rgrpd(sdp, bstart, 1);
> -	if (!rbm.rgd) {
> -		if (gfs2_consist(sdp))
> -			fs_err(sdp, "block = %llu\n", (unsigned long long)bstart);
> -		return NULL;
> -	}
> -
> +	rbm.rgd = rgd;
>   	BUG_ON(gfs2_rbm_from_block(&rbm, bstart));
>   	while (blen--) {
>   		bi = rbm_bi(&rbm);
> @@ -2282,8 +2275,6 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
>   		gfs2_setbit(&rbm, false, new_state);
>   		gfs2_rbm_incr(&rbm);
>   	}
> -
> -	return rbm.rgd;
>   }
>   
>   /**
> @@ -2499,20 +2490,19 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
>   /**
>    * __gfs2_free_blocks - free a contiguous run of block(s)
>    * @ip: the inode these blocks are being freed from
> + * @rgd: the resource group the blocks are in
>    * @bstart: first block of a run of contiguous blocks
>    * @blen: the length of the block run
>    * @meta: 1 if the blocks represent metadata
>    *
>    */
>   
> -void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int meta)
> +void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
> +			u64 bstart, u32 blen, int meta)
>   {
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> -	struct gfs2_rgrpd *rgd;
>   
> -	rgd = rgblk_free(sdp, bstart, blen, GFS2_BLKST_FREE);
> -	if (!rgd)
> -		return;
> +	rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
>   	trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
>   	rgd->rd_free += blen;
>   	rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
> @@ -2527,16 +2517,18 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int meta)
>   /**
>    * gfs2_free_meta - free a contiguous run of data block(s)
>    * @ip: the inode these blocks are being freed from
> + * @rgd: the resource group the blocks are in
>    * @bstart: first block of a run of contiguous blocks
>    * @blen: the length of the block run
>    *
>    */
>   
> -void gfs2_free_meta(struct gfs2_inode *ip, u64 bstart, u32 blen)
> +void gfs2_free_meta(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
> +		    u64 bstart, u32 blen)
>   {
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
>   
> -	__gfs2_free_blocks(ip, bstart, blen, 1);
> +	__gfs2_free_blocks(ip, rgd, bstart, blen, 1);
>   	gfs2_statfs_change(sdp, 0, +blen, 0);
>   	gfs2_quota_change(ip, -(s64)blen, ip->i_inode.i_uid, ip->i_inode.i_gid);
>   }
> @@ -2548,9 +2540,10 @@ void gfs2_unlink_di(struct inode *inode)
>   	struct gfs2_rgrpd *rgd;
>   	u64 blkno = ip->i_no_addr;
>   
> -	rgd = rgblk_free(sdp, blkno, 1, GFS2_BLKST_UNLINKED);
> +	rgd = gfs2_blk2rgrpd(sdp, blkno, true);
>   	if (!rgd)
>   		return;
> +	rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
>   	trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
>   	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
>   	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
> @@ -2560,13 +2553,8 @@ void gfs2_unlink_di(struct inode *inode)
>   void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
>   {
>   	struct gfs2_sbd *sdp = rgd->rd_sbd;
> -	struct gfs2_rgrpd *tmp_rgd;
> -
> -	tmp_rgd = rgblk_free(sdp, ip->i_no_addr, 1, GFS2_BLKST_FREE);
> -	if (!tmp_rgd)
> -		return;
> -	gfs2_assert_withdraw(sdp, rgd == tmp_rgd);
>   
> +	rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
>   	if (!rgd->rd_dinodes)
>   		gfs2_consist_rgrpd(rgd);
>   	rgd->rd_dinodes--;
> diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
> index 09519ae10fb6..b596c3d17988 100644
> --- a/fs/gfs2/rgrp.h
> +++ b/fs/gfs2/rgrp.h
> @@ -51,8 +51,10 @@ extern int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *n,
>   extern int gfs2_rsqa_alloc(struct gfs2_inode *ip);
>   extern void gfs2_rs_deltree(struct gfs2_blkreserv *rs);
>   extern void gfs2_rsqa_delete(struct gfs2_inode *ip, atomic_t *wcount);
> -extern void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int meta);
> -extern void gfs2_free_meta(struct gfs2_inode *ip, u64 bstart, u32 blen);
> +extern void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
> +			       u64 bstart, u32 blen, int meta);
> +extern void gfs2_free_meta(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
> +			   u64 bstart, u32 blen);
>   extern void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip);
>   extern void gfs2_unlink_di(struct inode *inode);
>   extern int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr,
> diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
> index e11f77f080a0..996c915a9c97 100644
> --- a/fs/gfs2/xattr.c
> +++ b/fs/gfs2/xattr.c
> @@ -283,7 +283,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
>   			blen++;
>   		else {
>   			if (bstart)
> -				gfs2_free_meta(ip, bstart, blen);
> +				gfs2_free_meta(ip, rgd, bstart, blen);
>   			bstart = bn;
>   			blen = 1;
>   		}
> @@ -292,7 +292,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
>   		gfs2_add_inode_blocks(&ip->i_inode, -1);
>   	}
>   	if (bstart)
> -		gfs2_free_meta(ip, bstart, blen);
> +		gfs2_free_meta(ip, rgd, bstart, blen);
>   
>   	if (prev && !leave) {
>   		u32 len;
> @@ -1250,6 +1250,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
>   {
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
>   	struct gfs2_rgrp_list rlist;
> +	struct gfs2_rgrpd *rgd;
>   	struct buffer_head *indbh, *dibh;
>   	__be64 *eablk, *end;
>   	unsigned int rg_blocks = 0;
> @@ -1302,8 +1303,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
>   	gfs2_rlist_alloc(&rlist);
>   
>   	for (x = 0; x < rlist.rl_rgrps; x++) {
> -		struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(rlist.rl_ghs[x].gh_gl);
> -
> +		rgd = gfs2_glock2rgrp(rlist.rl_ghs[x].gh_gl);
>   		rg_blocks += rgd->rd_length;
>   	}
>   
> @@ -1320,6 +1320,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
>   
>   	eablk = (__be64 *)(indbh->b_data + sizeof(struct gfs2_meta_header));
>   	bstart = 0;
> +	rgd = NULL;
>   	blen = 0;
>   
>   	for (; eablk < end; eablk++) {
> @@ -1333,8 +1334,9 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
>   			blen++;
>   		else {
>   			if (bstart)
> -				gfs2_free_meta(ip, bstart, blen);
> +				gfs2_free_meta(ip, rgd, bstart, blen);
>   			bstart = bn;
> +			rgd = gfs2_blk2rgrpd(sdp, bstart, true);
>   			blen = 1;
>   		}
>   
> @@ -1342,7 +1344,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip)
>   		gfs2_add_inode_blocks(&ip->i_inode, -1);
>   	}
>   	if (bstart)
> -		gfs2_free_meta(ip, bstart, blen);
> +		gfs2_free_meta(ip, rgd, bstart, blen);
>   
>   	ip->i_diskflags &= ~GFS2_DIF_EA_INDIRECT;
>   
> @@ -1391,7 +1393,7 @@ static int ea_dealloc_block(struct gfs2_inode *ip)
>   	if (error)
>   		goto out_gunlock;
>   
> -	gfs2_free_meta(ip, ip->i_eattr, 1);
> +	gfs2_free_meta(ip, rgd, ip->i_eattr, 1);
>   
>   	ip->i_eattr = 0;
>   	gfs2_add_inode_blocks(&ip->i_inode, -1);




* [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking Andreas Gruenbacher
@ 2018-10-08 10:33   ` Steven Whitehouse
  2018-10-08 12:56     ` Andreas Gruenbacher
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Whitehouse @ 2018-10-08 10:33 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 05/10/18 20:18, Andreas Gruenbacher wrote:
> From: Bob Peterson <rpeterso@redhat.com>
>
> Prepare for treating resource group glocks as exclusive among nodes but
> shared among all tasks running on a node: introduce another layer of
> node-specific locking that the local tasks can use to coordinate their
> accesses.
>
> This patch only introduces the local locking changes necessary so that
> future patches can introduce resource group glock sharing.  We replace
> the resource group spinlock with a mutex; whether that leads to
> noticeable additional contention on the resource group mutex remains to
> be seen.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/incore.h |  3 +-
>   fs/gfs2/lops.c   |  5 ++-
>   fs/gfs2/rgrp.c   | 97 +++++++++++++++++++++++++++++++++++-------------
>   fs/gfs2/rgrp.h   |  4 ++
>   4 files changed, 81 insertions(+), 28 deletions(-)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 932e63924f7e..2fa47b476eef 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -23,6 +23,7 @@
>   #include <linux/percpu.h>
>   #include <linux/lockref.h>
>   #include <linux/rhashtable.h>
> +#include <linux/mutex.h>
>   
>   #define DIO_WAIT	0x00000010
>   #define DIO_METADATA	0x00000020
> @@ -120,7 +121,7 @@ struct gfs2_rgrpd {
>   #define GFS2_RDF_ERROR		0x40000000 /* error in rg */
>   #define GFS2_RDF_PREFERRED	0x80000000 /* This rgrp is preferred */
>   #define GFS2_RDF_MASK		0xf0000000 /* mask for internal flags */
> -	spinlock_t rd_rsspin;           /* protects reservation related vars */
> +	struct mutex rd_lock;
I can see why we might need additional local rgrp locking, but 
why do we need to turn rd_rsspin into a mutex? I'm wondering 
whether these should still be separate locks?

Steve.

>   	struct rb_root rd_rstree;       /* multi-block reservation tree */
>   };
>   
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 4c7069b8f3c1..a9e858e01c97 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -76,8 +76,9 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
>   	unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
>   	struct gfs2_bitmap *bi = rgd->rd_bits + index;
>   
> +	rgrp_lock_local(rgd);
>   	if (bi->bi_clone == NULL)
> -		return;
> +		goto out;
>   	if (sdp->sd_args.ar_discard)
>   		gfs2_rgrp_send_discards(sdp, rgd->rd_data0, bd->bd_bh, bi, 1, NULL);
>   	memcpy(bi->bi_clone + bi->bi_offset,
> @@ -85,6 +86,8 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
>   	clear_bit(GBF_FULL, &bi->bi_flags);
>   	rgd->rd_free_clone = rgd->rd_free;
>   	rgd->rd_extfail_pt = rgd->rd_free;
> +out:
> +	rgrp_unlock_local(rgd);
>   }
>   
>   /**
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 8a6b41f3667c..a89be4782c15 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -702,10 +702,10 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
>   
>   	rgd = rs->rs_rgd;
>   	if (rgd) {
> -		spin_lock(&rgd->rd_rsspin);
> +		rgrp_lock_local(rgd);
>   		__rs_deltree(rs);
>   		BUG_ON(rs->rs_free);
> -		spin_unlock(&rgd->rd_rsspin);
> +		rgrp_unlock_local(rgd);
>   	}
>   }
>   
> @@ -737,12 +737,12 @@ static void return_all_reservations(struct gfs2_rgrpd *rgd)
>   	struct rb_node *n;
>   	struct gfs2_blkreserv *rs;
>   
> -	spin_lock(&rgd->rd_rsspin);
> +	rgrp_lock_local(rgd);
>   	while ((n = rb_first(&rgd->rd_rstree))) {
>   		rs = rb_entry(n, struct gfs2_blkreserv, rs_node);
>   		__rs_deltree(rs);
>   	}
> -	spin_unlock(&rgd->rd_rsspin);
> +	rgrp_unlock_local(rgd);
>   }
>   
>   void gfs2_clear_rgrpd(struct gfs2_sbd *sdp)
> @@ -948,7 +948,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
>   	rgd->rd_data0 = be64_to_cpu(buf.ri_data0);
>   	rgd->rd_data = be32_to_cpu(buf.ri_data);
>   	rgd->rd_bitbytes = be32_to_cpu(buf.ri_bitbytes);
> -	spin_lock_init(&rgd->rd_rsspin);
> +	mutex_init(&rgd->rd_lock);
>   
>   	error = compute_bitstructs(rgd);
>   	if (error)
> @@ -1469,9 +1469,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
>   			/* Trim each bitmap in the rgrp */
>   			for (x = 0; x < rgd->rd_length; x++) {
>   				struct gfs2_bitmap *bi = rgd->rd_bits + x;
> +				rgrp_lock_local(rgd);
>   				ret = gfs2_rgrp_send_discards(sdp,
>   						rgd->rd_data0, NULL, bi, minlen,
>   						&amt);
> +				rgrp_unlock_local(rgd);
>   				if (ret) {
>   					gfs2_glock_dq_uninit(&gh);
>   					goto out;
> @@ -1483,9 +1485,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
>   			ret = gfs2_trans_begin(sdp, RES_RG_HDR, 0);
>   			if (ret == 0) {
>   				bh = rgd->rd_bits[0].bi_bh;
> +				rgrp_lock_local(rgd);
>   				rgd->rd_flags |= GFS2_RGF_TRIMMED;
>   				gfs2_trans_add_meta(rgd->rd_gl, bh);
>   				gfs2_rgrp_out(rgd, bh->b_data);
> +				rgrp_unlock_local(rgd);
>   				gfs2_trans_end(sdp);
>   			}
>   		}
> @@ -1519,7 +1523,6 @@ static void rs_insert(struct gfs2_inode *ip)
>   
>   	BUG_ON(gfs2_rs_active(rs));
>   
> -	spin_lock(&rgd->rd_rsspin);
>   	newn = &rgd->rd_rstree.rb_node;
>   	while (*newn) {
>   		struct gfs2_blkreserv *cur =
> @@ -1532,7 +1535,6 @@ static void rs_insert(struct gfs2_inode *ip)
>   		else if (rc < 0)
>   			newn = &((*newn)->rb_left);
>   		else {
> -			spin_unlock(&rgd->rd_rsspin);
>   			WARN_ON(1);
>   			return;
>   		}
> @@ -1540,7 +1542,6 @@ static void rs_insert(struct gfs2_inode *ip)
>   
>   	rb_link_node(&rs->rs_node, parent, newn);
>   	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
> -	spin_unlock(&rgd->rd_rsspin);
>   	trace_gfs2_rs(rs, TRACE_RS_INSERT);
>   }
>   
> @@ -1632,7 +1633,6 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
>   	struct rb_node *n;
>   	int rc;
>   
> -	spin_lock(&rgd->rd_rsspin);
>   	n = rgd->rd_rstree.rb_node;
>   	while (n) {
>   		rs = rb_entry(n, struct gfs2_blkreserv, rs_node);
> @@ -1655,7 +1655,6 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
>   		}
>   	}
>   
> -	spin_unlock(&rgd->rd_rsspin);
>   	return block;
>   }
>   
> @@ -1857,7 +1856,22 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
>   	struct gfs2_rbm rbm = { .rgd = rgd, .bii = 0, .offset = 0 };
>   
>   	while (1) {
> +		/*
> +		 * We must be careful to avoid deadlock here:
> +		 * All transactions expect: sd_log_flush_lock followed by rgrp
> +		 * ex (if needed), but try_rgrp_unlink takes sd_log_flush_lock
> +		 * outside a transaction and therefore must not have the rgrp
> +		 * ex already held. To avoid deadlock, we drop the rgrp ex lock
> +		 * before taking the log_flush_lock, then reacquire it to
> +		 * protect our call to gfs2_rbm_find.
> +		 *
> +		 * Also note that rgrp_unlock_local must come AFTER the caller does
> +		 * gfs2_rs_deltree because rgrp ex needs to be held before
> +		 * making reservations.
> +		 */
> +		rgrp_unlock_local(rgd);
>   		down_write(&sdp->sd_log_flush_lock);
> +		rgrp_lock_local(rgd);
>   		error = gfs2_rbm_find(&rbm, GFS2_BLKST_UNLINKED, NULL, NULL,
>   				      true);
>   		up_write(&sdp->sd_log_flush_lock);
> @@ -2055,7 +2069,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
>   	struct gfs2_rgrpd *begin = NULL;
>   	struct gfs2_blkreserv *rs = &ip->i_res;
> -	int error = 0, rg_locked, flags = 0;
> +	int error = 0, flags = 0;
> +	bool rg_locked;
>   	u64 last_unlinked = NO_BLOCK;
>   	int loops = 0;
>   	u32 free_blocks, skip = 0;
> @@ -2081,10 +2096,10 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   		return -EBADSLT;
>   
>   	while (loops < 3) {
> -		rg_locked = 1;
> -
> -		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
> -			rg_locked = 0;
> +		rg_locked = gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl);
> +		if (rg_locked) {
> +			rgrp_lock_local(rs->rs_rgd);
> +		} else {
>   			if (skip && skip--)
>   				goto next_rgrp;
>   			if (!gfs2_rs_active(rs)) {
> @@ -2101,12 +2116,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   						   &ip->i_rgd_gh);
>   			if (unlikely(error))
>   				return error;
> +			rgrp_lock_local(rs->rs_rgd);
>   			if (!gfs2_rs_active(rs) && (loops < 2) &&
>   			    gfs2_rgrp_congested(rs->rs_rgd, loops))
>   				goto skip_rgrp;
>   			if (sdp->sd_args.ar_rgrplvb) {
>   				error = update_rgrp_lvb(rs->rs_rgd);
>   				if (unlikely(error)) {
> +					rgrp_unlock_local(rs->rs_rgd);
>   					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
>   					return error;
>   				}
> @@ -2140,9 +2157,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			rs->rs_reserved = ap->target;
>   			if (rs->rs_reserved > free_blocks)
>   				rs->rs_reserved = free_blocks;
> -			spin_lock(&rs->rs_rgd->rd_rsspin);
>   			rgd->rd_reserved += rs->rs_reserved;
> -			spin_unlock(&rs->rs_rgd->rd_rsspin);
> +			rgrp_unlock_local(rs->rs_rgd);
>   			return 0;
>   		}
>   check_rgrp:
> @@ -2151,6 +2167,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
>   					ip->i_no_addr);
>   skip_rgrp:
> +		rgrp_unlock_local(rs->rs_rgd);
> +
>   		/* Drop reservation, if we couldn't use reserved rgrp */
>   		if (gfs2_rs_active(rs))
>   			gfs2_rs_deltree(rs);
> @@ -2199,10 +2217,10 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
>   	if (rs->rs_reserved) {
>   		struct gfs2_rgrpd *rgd = rs->rs_rgd;
>   
> -		spin_lock(&rgd->rd_rsspin);
> +		rgrp_lock_local(rgd);
>   		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
>   		rgd->rd_reserved -= rs->rs_reserved;
> -		spin_unlock(&rs->rs_rgd->rd_rsspin);
> +		rgrp_unlock_local(rgd);
>   		rs->rs_reserved = 0;
>   	}
>   	if (gfs2_holder_initialized(&ip->i_rgd_gh))
> @@ -2304,12 +2322,12 @@ void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
>   			       be32_to_cpu(rgl->rl_free),
>   			       be32_to_cpu(rgl->rl_dinodes));
>   	}
> -	spin_lock(&rgd->rd_rsspin);
> +	rgrp_lock_local(rgd);
>   	for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
>   		trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
>   		dump_rs(seq, trs);
>   	}
> -	spin_unlock(&rgd->rd_rsspin);
> +	rgrp_unlock_local(rgd);
>   }
>   
>   static void gfs2_rgrp_error(struct gfs2_rgrpd *rgd)
> @@ -2339,7 +2357,6 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
>   	struct gfs2_blkreserv *rs = &ip->i_res;
>   	struct gfs2_rgrpd *rgd = rbm->rgd;
>   
> -	spin_lock(&rgd->rd_rsspin);
>   	BUG_ON(rs->rs_reserved < len);
>   	rgd->rd_reserved -= len;
>   	rs->rs_reserved -= len;
> @@ -2355,15 +2372,13 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
>   			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
>   			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
>   			    rs->rs_free)
> -				goto out;
> +				return;
>   			/* We used up our block reservation, so we should
>   			   reserve more blocks next time. */
>   			atomic_add(RGRP_RSRV_ADDBLKS, &ip->i_sizehint);
>   		}
>   		__rs_deltree(rs);
>   	}
> -out:
> -	spin_unlock(&rgd->rd_rsspin);
>   }
>   
>   /**
> @@ -2416,6 +2431,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
>   
>   	BUG_ON(ip->i_res.rs_reserved < *nblocks);
>   
> +	rgrp_lock_local(rbm.rgd);
>   	gfs2_set_alloc_start(&rbm, ip, dinode);
>   	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
>   
> @@ -2460,6 +2476,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
>   	}
>   
>   	rbm.rgd->rd_free -= *nblocks;
> +	rbm.rgd->rd_free_clone -= *nblocks;
>   	if (dinode) {
>   		rbm.rgd->rd_dinodes++;
>   		*generation = rbm.rgd->rd_igeneration++;
> @@ -2469,6 +2486,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
>   
>   	gfs2_trans_add_meta(rbm.rgd->rd_gl, rbm.rgd->rd_bits[0].bi_bh);
>   	gfs2_rgrp_out(rbm.rgd, rbm.rgd->rd_bits[0].bi_bh->b_data);
> +	rgrp_unlock_local(rbm.rgd);
>   
>   	gfs2_statfs_change(sdp, 0, -(s64)*nblocks, dinode ? 1 : 0);
>   	if (dinode)
> @@ -2476,13 +2494,13 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
>   
>   	gfs2_quota_change(ip, *nblocks, ip->i_inode.i_uid, ip->i_inode.i_gid);
>   
> -	rbm.rgd->rd_free_clone -= *nblocks;
>   	trace_gfs2_block_alloc(ip, rbm.rgd, block, *nblocks,
>   			       dinode ? GFS2_BLKST_DINODE : GFS2_BLKST_USED);
>   	*bn = block;
>   	return 0;
>   
>   rgrp_error:
> +	rgrp_unlock_local(rbm.rgd);
>   	gfs2_rgrp_error(rbm.rgd);
>   	return -EIO;
>   }
> @@ -2502,12 +2520,14 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
>   {
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
>   
> +	rgrp_lock_local(rgd);
>   	rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
>   	trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
>   	rgd->rd_free += blen;
>   	rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
>   	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
>   	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
> +	rgrp_unlock_local(rgd);
>   
>   	/* Directories keep their data in the metadata address space */
>   	if (meta || ip->i_depth)
> @@ -2543,17 +2563,20 @@ void gfs2_unlink_di(struct inode *inode)
>   	rgd = gfs2_blk2rgrpd(sdp, blkno, true);
>   	if (!rgd)
>   		return;
> +	rgrp_lock_local(rgd);
>   	rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
>   	trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
>   	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
>   	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
>   	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, 1);
> +	rgrp_unlock_local(rgd);
>   }
>   
>   void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
>   {
>   	struct gfs2_sbd *sdp = rgd->rd_sbd;
>   
> +	rgrp_lock_local(rgd);
>   	rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
>   	if (!rgd->rd_dinodes)
>   		gfs2_consist_rgrpd(rgd);
> @@ -2562,6 +2585,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
>   
>   	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
>   	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
> +	rgrp_unlock_local(rgd);
>   	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, -1);
>   
>   	gfs2_statfs_change(sdp, 0, +1, -1);
> @@ -2576,6 +2600,10 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
>    * @no_addr: The block number to check
>    * @type: The block type we are looking for
>    *
> + * The inode glock of @no_addr must be held.  The @type to check for is either
> + * GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; checking for type GFS2_BLKST_FREE
> + * or GFS2_BLKST_USED would make no sense.
> + *
>    * Returns: 0 if the block type matches the expected type
>    *          -ESTALE if it doesn't match
>    *          or -ve errno if something went wrong while checking
> @@ -2601,6 +2629,12 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
>   	if (WARN_ON_ONCE(error))
>   		goto fail;
>   
> +	/*
> +	 * No need to take the local resource group lock here; the inode glock
> +	 * of @no_addr provides the necessary synchronization in case the block
> +	 * is an inode.  (In case the block is not an inode, the block type
> +	 * will not match the @type we are looking for.)
> +	 */
>   	if (gfs2_testbit(&rbm, false) != type)
>   		error = -ESTALE;
>   
> @@ -2723,3 +2757,14 @@ void gfs2_rlist_free(struct gfs2_rgrp_list *rlist)
>   	}
>   }
>   
> +void rgrp_lock_local(struct gfs2_rgrpd *rgd)
> +{
> +	BUG_ON(!gfs2_glock_is_held_excl(rgd->rd_gl) &&
> +	       !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
> +	mutex_lock(&rgd->rd_lock);
> +}
> +
> +void rgrp_unlock_local(struct gfs2_rgrpd *rgd)
> +{
> +	mutex_unlock(&rgd->rd_lock);
> +}
> diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
> index b596c3d17988..33e52dab76ef 100644
> --- a/fs/gfs2/rgrp.h
> +++ b/fs/gfs2/rgrp.h
> @@ -92,4 +92,8 @@ static inline int rgrp_contains_block(struct gfs2_rgrpd *rgd, u64 block)
>   }
>   
>   extern void check_and_update_goal(struct gfs2_inode *ip);
> +
> +extern void rgrp_lock_local(struct gfs2_rgrpd *rgd);
> +extern void rgrp_unlock_local(struct gfs2_rgrpd *rgd);
> +
>   #endif /* __RGRP_DOT_H__ */




* [Cluster-devel] [PATCH 02/11] gfs2: Move rs_{sizehint, rgd_gh} fields into the inode
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 02/11] gfs2: Move rs_{sizehint, rgd_gh} fields into the inode Andreas Gruenbacher
@ 2018-10-08 10:34   ` Steven Whitehouse
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Whitehouse @ 2018-10-08 10:34 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 05/10/18 20:18, Andreas Gruenbacher wrote:
> Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv
> into the inode: they are more closely related to the inode than to a
> particular reservation.
Yes, I think that makes sense - these fields have moved around a bit 
during the discussions on getting this code right. The only real issue 
is that the holder is quite large, which is why it was kept separate in 
the first place, but it probably makes more sense to do it this way,

Steve.

> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/file.c   |  4 ++--
>   fs/gfs2/incore.h |  6 ++----
>   fs/gfs2/main.c   |  2 ++
>   fs/gfs2/rgrp.c   | 16 +++++++---------
>   4 files changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 08369c6cd127..e8864ff2ed03 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -347,8 +347,8 @@ static void gfs2_size_hint(struct file *filep, loff_t offset, size_t size)
>   	size_t blks = (size + sdp->sd_sb.sb_bsize - 1) >> sdp->sd_sb.sb_bsize_shift;
>   	int hint = min_t(size_t, INT_MAX, blks);
>   
> -	if (hint > atomic_read(&ip->i_res.rs_sizehint))
> -		atomic_set(&ip->i_res.rs_sizehint, hint);
> +	if (hint > atomic_read(&ip->i_sizehint))
> +		atomic_set(&ip->i_sizehint, hint);
>   }
>   
>   /**
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index b96d39c28e17..9d7d9bd8c3a9 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -309,10 +309,6 @@ struct gfs2_qadata { /* quota allocation data */
>   */
>   
>   struct gfs2_blkreserv {
> -	/* components used during write (step 1): */
> -	atomic_t rs_sizehint;         /* hint of the write size */
> -
> -	struct gfs2_holder rs_rgd_gh; /* Filled in by get_local_rgrp */
>   	struct rb_node rs_node;       /* link to other block reservations */
>   	struct gfs2_rbm rs_rbm;       /* Start of reservation */
>   	u32 rs_free;                  /* how many blocks are still free */
> @@ -417,8 +413,10 @@ struct gfs2_inode {
>   	struct gfs2_holder i_iopen_gh;
>   	struct gfs2_holder i_gh; /* for prepare/commit_write only */
>   	struct gfs2_qadata *i_qadata; /* quota allocation data */
> +	struct gfs2_holder i_rgd_gh;
>   	struct gfs2_blkreserv i_res; /* rgrp multi-block reservation */
>   	u64 i_goal;	/* goal block for allocations */
> +	atomic_t i_sizehint;  /* hint of the write size */
>   	struct rw_semaphore i_rw_mutex;
>   	struct list_head i_ordered;
>   	struct list_head i_trunc_list;
> diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
> index 2d55e2c3333c..c7603063f861 100644
> --- a/fs/gfs2/main.c
> +++ b/fs/gfs2/main.c
> @@ -39,9 +39,11 @@ static void gfs2_init_inode_once(void *foo)
>   	struct gfs2_inode *ip = foo;
>   
>   	inode_init_once(&ip->i_inode);
> +	atomic_set(&ip->i_sizehint, 0);
>   	init_rwsem(&ip->i_rw_mutex);
>   	INIT_LIST_HEAD(&ip->i_trunc_list);
>   	ip->i_qadata = NULL;
> +	gfs2_holder_mark_uninitialized(&ip->i_rgd_gh);
>   	memset(&ip->i_res, 0, sizeof(ip->i_res));
>   	RB_CLEAR_NODE(&ip->i_res.rs_node);
>   	ip->i_hash_cache = NULL;
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index c9caddc2627c..34122c546576 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1564,7 +1564,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
>   	if (S_ISDIR(inode->i_mode))
>   		extlen = 1;
>   	else {
> -		extlen = max_t(u32, atomic_read(&rs->rs_sizehint), ap->target);
> +		extlen = max_t(u32, atomic_read(&ip->i_sizehint), ap->target);
>   		extlen = clamp(extlen, RGRP_RSRV_MINBLKS, free_blocks);
>   	}
>   	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
> @@ -2076,7 +2076,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			}
>   			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
>   						   LM_ST_EXCLUSIVE, flags,
> -						   &rs->rs_rgd_gh);
> +						   &ip->i_rgd_gh);
>   			if (unlikely(error))
>   				return error;
>   			if (!gfs2_rs_active(rs) && (loops < 2) &&
> @@ -2085,7 +2085,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			if (sdp->sd_args.ar_rgrplvb) {
>   				error = update_rgrp_lvb(rs->rs_rbm.rgd);
>   				if (unlikely(error)) {
> -					gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
> +					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
>   					return error;
>   				}
>   			}
> @@ -2128,7 +2128,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   
>   		/* Unlock rgrp if required */
>   		if (!rg_locked)
> -			gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
> +			gfs2_glock_dq_uninit(&ip->i_rgd_gh);
>   next_rgrp:
>   		/* Find the next rgrp, and continue looking */
>   		if (gfs2_select_rgrp(&rs->rs_rbm.rgd, begin))
> @@ -2165,10 +2165,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   
>   void gfs2_inplace_release(struct gfs2_inode *ip)
>   {
> -	struct gfs2_blkreserv *rs = &ip->i_res;
> -
> -	if (gfs2_holder_initialized(&rs->rs_rgd_gh))
> -		gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
> +	if (gfs2_holder_initialized(&ip->i_rgd_gh))
> +		gfs2_glock_dq_uninit(&ip->i_rgd_gh);
>   }
>   
>   /**
> @@ -2326,7 +2324,7 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
>   				goto out;
>   			/* We used up our block reservation, so we should
>   			   reserve more blocks next time. */
> -			atomic_add(RGRP_RSRV_ADDBLKS, &rs->rs_sizehint);
> +			atomic_add(RGRP_RSRV_ADDBLKS, &ip->i_sizehint);
>   		}
>   		__rs_deltree(rs);
>   	}




* [Cluster-devel] [PATCH 06/11] gfs2: Only use struct gfs2_rbm for bitmap manipulations
  2018-10-05 19:18 ` [Cluster-devel] [PATCH 06/11] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
@ 2018-10-08 10:39   ` Steven Whitehouse
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Whitehouse @ 2018-10-08 10:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 05/10/18 20:18, Andreas Gruenbacher wrote:
> GFS2 uses struct gfs2_rbm to represent a filesystem block number as a
> bit position within a resource group.  This representation is used in
> the bitmap manipulation code to prevent excessive conversions between
> block numbers and bit positions, but also in struct gfs2_blkreserv which
> is part of struct gfs2_inode, to mark the start of a reservation.  In
> the inode, the bit position representation makes less sense: first, the
> start position is used as a block number about as often as a bit
> position; second, the bit position representation makes the code
> unnecessarily difficult to read.
>
> Therefore, change struct gfs2_blkreserv to represent the start of a
> reservation as a block number instead of a bit position.  (This requires
> keeping track of the resource group in gfs2_blkreserv separately.) With
> that change, various things can be slightly simplified, and struct
> gfs2_rbm can be moved to rgrp.c.
Bear in mind that conversion from block number to rbm format is a much 
more expensive operation than conversion from rbm format to a block 
number. So the code is written to reduce the number of conversions from 
block to rbm as much as possible. I'm not sure that this change is 
actually getting us anything useful here,

Steve.

>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/bmap.c       |   2 +-
>   fs/gfs2/incore.h     |  30 +--------
>   fs/gfs2/rgrp.c       | 156 ++++++++++++++++++++++++++-----------------
>   fs/gfs2/trace_gfs2.h |  10 +--
>   fs/gfs2/trans.h      |   2 +-
>   5 files changed, 103 insertions(+), 97 deletions(-)
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 03128ed1f34e..c192906bb5f6 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1503,7 +1503,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
>   
>   			/* Must be done with the rgrp glock held: */
>   			if (gfs2_rs_active(&ip->i_res) &&
> -			    rgd == ip->i_res.rs_rbm.rgd)
> +			    rgd == ip->i_res.rs_rgd)
>   				gfs2_rs_deltree(&ip->i_res);
>   		}
>   
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index a1771d8a93be..0ed28fbc73b4 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -124,31 +124,6 @@ struct gfs2_rgrpd {
>   	struct rb_root rd_rstree;       /* multi-block reservation tree */
>   };
>   
> -struct gfs2_rbm {
> -	struct gfs2_rgrpd *rgd;
> -	u32 offset;		/* The offset is bitmap relative */
> -	int bii;		/* Bitmap index */
> -};
> -
> -static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
> -{
> -	return rbm->rgd->rd_bits + rbm->bii;
> -}
> -
> -static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
> -{
> -	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
> -	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
> -		rbm->offset;
> -}
> -
> -static inline bool gfs2_rbm_eq(const struct gfs2_rbm *rbm1,
> -			       const struct gfs2_rbm *rbm2)
> -{
> -	return (rbm1->rgd == rbm2->rgd) && (rbm1->bii == rbm2->bii) &&
> -	       (rbm1->offset == rbm2->offset);
> -}
> -
>   enum gfs2_state_bits {
>   	BH_Pinned = BH_PrivateStart,
>   	BH_Escaped = BH_PrivateStart + 1,
> @@ -309,8 +284,9 @@ struct gfs2_qadata { /* quota allocation data */
>   */
>   
>   struct gfs2_blkreserv {
> -	struct rb_node rs_node;       /* link to other block reservations */
> -	struct gfs2_rbm rs_rbm;       /* Start of reservation */
> +	struct rb_node rs_node;       /* node within rd_rstree */
> +	struct gfs2_rgrpd *rs_rgd;
> +	u64 rs_start;		      /* start of reservation */
>   	u32 rs_free;                  /* how many blocks are still free */
>   };
>   
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 070ad493a4ec..ee6ea7d8cf44 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -49,6 +49,24 @@
>   #define LBITSKIP00 (0x0000000000000000UL)
>   #endif
>   
> +struct gfs2_rbm {
> +	struct gfs2_rgrpd *rgd;
> +	u32 offset;		/* The offset is bitmap relative */
> +	int bii;		/* Bitmap index */
> +};
> +
> +static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
> +{
> +	return rbm->rgd->rd_bits + rbm->bii;
> +}
> +
> +static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
> +{
> +	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
> +	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
> +		rbm->offset;
> +}
> +
>   /*
>    * These routines are used by the resource group routines (rgrp.c)
>    * to keep track of block allocation.  Each block is represented by two
> @@ -184,7 +202,7 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
>   
>   /**
>    * rs_cmp - multi-block reservation range compare
> - * @blk: absolute file system block number of the new reservation
> + * @start: start of the new reservation
>    * @len: number of blocks in the new reservation
>    * @rs: existing reservation to compare against
>    *
> @@ -192,13 +210,11 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
>    *         -1 if the block range is before the start of the reservation
>    *          0 if the block range overlaps with the reservation
>    */
> -static inline int rs_cmp(u64 blk, u32 len, struct gfs2_blkreserv *rs)
> +static inline int rs_cmp(u64 start, u32 len, struct gfs2_blkreserv *rs)
>   {
> -	u64 startblk = gfs2_rbm_to_block(&rs->rs_rbm);
> -
> -	if (blk >= startblk + rs->rs_free)
> +	if (start >= rs->rs_start + rs->rs_free)
>   		return 1;
> -	if (blk + len - 1 < startblk)
> +	if (rs->rs_start >= start + len)
>   		return -1;
>   	return 0;
>   }
> @@ -269,12 +285,11 @@ static u32 gfs2_bitfit(const u8 *buf, const unsigned int len,
>   
>   static int gfs2_rbm_from_block(struct gfs2_rbm *rbm, u64 block)
>   {
> -	u64 rblock = block - rbm->rgd->rd_data0;
> +	u64 rblock;
>   
> -	if (WARN_ON_ONCE(rblock > UINT_MAX))
> -		return -EINVAL;
> -	if (block >= rbm->rgd->rd_data0 + rbm->rgd->rd_data)
> +	if (!rgrp_contains_block(rbm->rgd, block))
>   		return -E2BIG;
> +	rblock = block - rbm->rgd->rd_data0;
>   
>   	rbm->bii = 0;
>   	rbm->offset = (u32)(rblock);
> @@ -316,13 +331,28 @@ static bool gfs2_rbm_incr(struct gfs2_rbm *rbm)
>   	return false;
>   }
>   
> +static struct gfs2_bitmap *gfs2_block_to_bitmap(struct gfs2_rgrpd *rgd,
> +						u64 block)
> +{
> +	unsigned int delta = (sizeof(struct gfs2_rgrp) -
> +			      sizeof(struct gfs2_meta_header)) * GFS2_NBBY;
> +	unsigned int rblock, bii;
> +
> +	if (!rgrp_contains_block(rgd, block))
> +		return NULL;
> +	rblock = block - rgd->rd_data0;
> +	bii = (rblock + delta) / rgd->rd_sbd->sd_blocks_per_bitmap;
> +	return rgd->rd_bits + bii;
> +}
> +
>   /**
>    * gfs2_unaligned_extlen - Look for free blocks which are not byte aligned
>    * @rbm: Position to search (value/result)
>    * @n_unaligned: Number of unaligned blocks to check
>    * @len: Decremented for each block found (terminate on zero)
>    *
> - * Returns: true if a non-free block is encountered
> + * Returns: true if a non-free block is encountered or the end of the resource
> + *	    group is reached.
>    */
>   
>   static bool gfs2_unaligned_extlen(struct gfs2_rbm *rbm, u32 n_unaligned, u32 *len)
> @@ -618,10 +648,10 @@ static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs)
>   {
>   	struct gfs2_inode *ip = container_of(rs, struct gfs2_inode, i_res);
>   
> -	gfs2_print_dbg(seq, "  B: n:%llu s:%llu b:%u f:%u\n",
> +	gfs2_print_dbg(seq, "  B: n:%llu s:%llu f:%u\n",
>   		       (unsigned long long)ip->i_no_addr,
> -		       (unsigned long long)gfs2_rbm_to_block(&rs->rs_rbm),
> -		       rs->rs_rbm.offset, rs->rs_free);
> +		       (unsigned long long)rs->rs_start,
> +		       rs->rs_free);
>   }
>   
>   /**
> @@ -636,24 +666,26 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
>   	if (!gfs2_rs_active(rs))
>   		return;
>   
> -	rgd = rs->rs_rbm.rgd;
> +	rgd = rs->rs_rgd;
>   	trace_gfs2_rs(rs, TRACE_RS_TREEDEL);
>   	rb_erase(&rs->rs_node, &rgd->rd_rstree);
>   	RB_CLEAR_NODE(&rs->rs_node);
>   
>   	if (rs->rs_free) {
> -		struct gfs2_bitmap *bi = rbm_bi(&rs->rs_rbm);
> +		struct gfs2_bitmap *bi;
>   
>   		/* return reserved blocks to the rgrp */
> -		BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free);
> -		rs->rs_rbm.rgd->rd_reserved -= rs->rs_free;
> +		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
> +		rs->rs_rgd->rd_reserved -= rs->rs_free;
>   		/* The rgrp extent failure point is likely not to increase;
>   		   it will only do so if the freed blocks are somehow
>   		   contiguous with a span of free blocks that follows. Still,
>   		   it will force the number to be recalculated later. */
>   		rgd->rd_extfail_pt += rs->rs_free;
>   		rs->rs_free = 0;
> -		clear_bit(GBF_FULL, &bi->bi_flags);
> +		bi = gfs2_block_to_bitmap(rgd, rs->rs_start);
> +		if (bi)
> +			clear_bit(GBF_FULL, &bi->bi_flags);
>   	}
>   }
>   
> @@ -666,7 +698,7 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
>   {
>   	struct gfs2_rgrpd *rgd;
>   
> -	rgd = rs->rs_rbm.rgd;
> +	rgd = rs->rs_rgd;
>   	if (rgd) {
>   		spin_lock(&rgd->rd_rsspin);
>   		__rs_deltree(rs);
> @@ -1481,8 +1513,7 @@ static void rs_insert(struct gfs2_inode *ip)
>   	struct rb_node **newn, *parent = NULL;
>   	int rc;
>   	struct gfs2_blkreserv *rs = &ip->i_res;
> -	struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
> -	u64 fsblock = gfs2_rbm_to_block(&rs->rs_rbm);
> +	struct gfs2_rgrpd *rgd = rs->rs_rgd;
>   
>   	BUG_ON(gfs2_rs_active(rs));
>   
> @@ -1493,7 +1524,7 @@ static void rs_insert(struct gfs2_inode *ip)
>   			rb_entry(*newn, struct gfs2_blkreserv, rs_node);
>   
>   		parent = *newn;
> -		rc = rs_cmp(fsblock, rs->rs_free, cur);
> +		rc = rs_cmp(rs->rs_start, rs->rs_free, cur);
>   		if (rc > 0)
>   			newn = &((*newn)->rb_right);
>   		else if (rc < 0)
> @@ -1581,7 +1612,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
>   
>   	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, ip, true);
>   	if (ret == 0) {
> -		rs->rs_rbm = rbm;
> +		rs->rs_start = gfs2_rbm_to_block(&rbm);
>   		rs->rs_free = extlen;
>   		rs_insert(ip);
>   	} else {
> @@ -1626,7 +1657,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
>   
>   	if (n) {
>   		while ((rs_cmp(block, length, rs) == 0) && (&ip->i_res != rs)) {
> -			block = gfs2_rbm_to_block(&rs->rs_rbm) + rs->rs_free;
> +			block = rs->rs_start + rs->rs_free;
>   			n = n->rb_right;
>   			if (n == NULL)
>   				break;
> @@ -1967,7 +1998,7 @@ static bool gfs2_rgrp_used_recently(const struct gfs2_blkreserv *rs,
>   	u64 tdiff;
>   
>   	tdiff = ktime_to_ns(ktime_sub(ktime_get_real(),
> -                            rs->rs_rbm.rgd->rd_gl->gl_dstamp));
> +                            rs->rs_rgd->rd_gl->gl_dstamp));
>   
>   	return tdiff > (msecs * 1000 * 1000);
>   }
> @@ -2045,45 +2076,45 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   	if (gfs2_assert_warn(sdp, ap->target))
>   		return -EINVAL;
>   	if (gfs2_rs_active(rs)) {
> -		begin = rs->rs_rbm.rgd;
> -	} else if (rs->rs_rbm.rgd &&
> -		   rgrp_contains_block(rs->rs_rbm.rgd, ip->i_goal)) {
> -		begin = rs->rs_rbm.rgd;
> +		begin = rs->rs_rgd;
> +	} else if (rs->rs_rgd &&
> +		   rgrp_contains_block(rs->rs_rgd, ip->i_goal)) {
> +		begin = rs->rs_rgd;
>   	} else {
>   		check_and_update_goal(ip);
> -		rs->rs_rbm.rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
> +		rs->rs_rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
>   	}
>   	if (S_ISDIR(ip->i_inode.i_mode) && (ap->aflags & GFS2_AF_ORLOV))
>   		skip = gfs2_orlov_skip(ip);
> -	if (rs->rs_rbm.rgd == NULL)
> +	if (rs->rs_rgd == NULL)
>   		return -EBADSLT;
>   
>   	while (loops < 3) {
>   		rg_locked = 1;
>   
> -		if (!gfs2_glock_is_locked_by_me(rs->rs_rbm.rgd->rd_gl)) {
> +		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
>   			rg_locked = 0;
>   			if (skip && skip--)
>   				goto next_rgrp;
>   			if (!gfs2_rs_active(rs)) {
>   				if (loops == 0 &&
> -				    !fast_to_acquire(rs->rs_rbm.rgd))
> +				    !fast_to_acquire(rs->rs_rgd))
>   					goto next_rgrp;
>   				if ((loops < 2) &&
>   				    gfs2_rgrp_used_recently(rs, 1000) &&
> -				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
> +				    gfs2_rgrp_congested(rs->rs_rgd, loops))
>   					goto next_rgrp;
>   			}
> -			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
> +			error = gfs2_glock_nq_init(rs->rs_rgd->rd_gl,
>   						   LM_ST_EXCLUSIVE, flags,
>   						   &ip->i_rgd_gh);
>   			if (unlikely(error))
>   				return error;
>   			if (!gfs2_rs_active(rs) && (loops < 2) &&
> -			    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
> +			    gfs2_rgrp_congested(rs->rs_rgd, loops))
>   				goto skip_rgrp;
>   			if (sdp->sd_args.ar_rgrplvb) {
> -				error = update_rgrp_lvb(rs->rs_rbm.rgd);
> +				error = update_rgrp_lvb(rs->rs_rgd);
>   				if (unlikely(error)) {
>   					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
>   					return error;
> @@ -2092,24 +2123,24 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   		}
>   
>   		/* Skip unusable resource groups */
> -		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
> +		if ((rs->rs_rgd->rd_flags & (GFS2_RGF_NOALLOC |
>   						 GFS2_RDF_ERROR)) ||
> -		    (loops == 0 && ap->target > rs->rs_rbm.rgd->rd_extfail_pt))
> +		    (loops == 0 && ap->target > rs->rs_rgd->rd_extfail_pt))
>   			goto skip_rgrp;
>   
>   		if (sdp->sd_args.ar_rgrplvb)
> -			gfs2_rgrp_bh_get(rs->rs_rbm.rgd);
> +			gfs2_rgrp_bh_get(rs->rs_rgd);
>   
>   		/* Get a reservation if we don't already have one */
>   		if (!gfs2_rs_active(rs))
> -			rg_mblk_search(rs->rs_rbm.rgd, ip, ap);
> +			rg_mblk_search(rs->rs_rgd, ip, ap);
>   
>   		/* Skip rgrps when we can't get a reservation on first pass */
>   		if (!gfs2_rs_active(rs) && (loops < 1))
>   			goto check_rgrp;
>   
>   		/* If rgrp has enough free space, use it */
> -		free_blocks = rgd_free(rs->rs_rbm.rgd, rs);
> +		free_blocks = rgd_free(rs->rs_rgd, rs);
>   		if (free_blocks >= ap->target ||
>   		    (loops == 2 && ap->min_target &&
>   		     free_blocks >= ap->min_target)) {
> @@ -2118,8 +2149,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   		}
>   check_rgrp:
>   		/* Check for unlinked inodes which can be reclaimed */
> -		if (rs->rs_rbm.rgd->rd_flags & GFS2_RDF_CHECK)
> -			try_rgrp_unlink(rs->rs_rbm.rgd, &last_unlinked,
> +		if (rs->rs_rgd->rd_flags & GFS2_RDF_CHECK)
> +			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
>   					ip->i_no_addr);
>   skip_rgrp:
>   		/* Drop reservation, if we couldn't use reserved rgrp */
> @@ -2131,7 +2162,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
>   			gfs2_glock_dq_uninit(&ip->i_rgd_gh);
>   next_rgrp:
>   		/* Find the next rgrp, and continue looking */
> -		if (gfs2_select_rgrp(&rs->rs_rbm.rgd, begin))
> +		if (gfs2_select_rgrp(&rs->rs_rgd, begin))
>   			continue;
>   		if (skip)
>   			continue;
> @@ -2307,20 +2338,21 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
>   {
>   	struct gfs2_blkreserv *rs = &ip->i_res;
>   	struct gfs2_rgrpd *rgd = rbm->rgd;
> -	unsigned rlen;
> -	u64 block;
> -	int ret;
>   
>   	spin_lock(&rgd->rd_rsspin);
>   	if (gfs2_rs_active(rs)) {
> -		if (gfs2_rbm_eq(&rs->rs_rbm, rbm)) {
> -			block = gfs2_rbm_to_block(rbm);
> -			ret = gfs2_rbm_from_block(&rs->rs_rbm, block + len);
> +		u64 start = gfs2_rbm_to_block(rbm);
> +
> +		if (rs->rs_start == start) {
> +			unsigned int rlen;
> +
> +			rs->rs_start += len;
>   			rlen = min(rs->rs_free, len);
>   			rs->rs_free -= rlen;
>   			rgd->rd_reserved -= rlen;
>   			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
> -			if (rs->rs_free && !ret)
> +			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
> +			    rs->rs_free)
>   				goto out;
>   			/* We used up our block reservation, so we should
>   			   reserve more blocks next time. */
> @@ -2349,15 +2381,13 @@ static void gfs2_set_alloc_start(struct gfs2_rbm *rbm,
>   	u64 goal;
>   
>   	if (gfs2_rs_active(&ip->i_res)) {
> -		*rbm = ip->i_res.rs_rbm;
> -		return;
> +		goal = ip->i_res.rs_start;
> +	} else {
> +		if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
> +			goal = ip->i_goal;
> +		else
> +			goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
>   	}
> -
> -	if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
> -		goal = ip->i_goal;
> -	else
> -		goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
> -
>   	BUG_ON(gfs2_rbm_from_block(rbm, goal));
>   }
>   
> @@ -2377,7 +2407,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
>   {
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
>   	struct buffer_head *dibh;
> -	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rbm.rgd, };
> +	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
>   	unsigned int ndata;
>   	u64 block; /* block, within the file system scope */
>   	int error;
> @@ -2612,7 +2642,7 @@ void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
>   			return;
>   		rgd = gfs2_blk2rgrpd(sdp, block, 1);
>   	} else {
> -		rgd = ip->i_res.rs_rbm.rgd;
> +		rgd = ip->i_res.rs_rgd;
>   		if (!rgd || !rgrp_contains_block(rgd, block))
>   			rgd = gfs2_blk2rgrpd(sdp, block, 1);
>   	}
> diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
> index e0025258107a..7586c7629497 100644
> --- a/fs/gfs2/trace_gfs2.h
> +++ b/fs/gfs2/trace_gfs2.h
> @@ -602,13 +602,13 @@ TRACE_EVENT(gfs2_rs,
>   	),
>   
>   	TP_fast_assign(
> -		__entry->dev		= rs->rs_rbm.rgd->rd_sbd->sd_vfs->s_dev;
> -		__entry->rd_addr	= rs->rs_rbm.rgd->rd_addr;
> -		__entry->rd_free_clone	= rs->rs_rbm.rgd->rd_free_clone;
> -		__entry->rd_reserved	= rs->rs_rbm.rgd->rd_reserved;
> +		__entry->dev		= rs->rs_rgd->rd_sbd->sd_vfs->s_dev;
> +		__entry->rd_addr	= rs->rs_rgd->rd_addr;
> +		__entry->rd_free_clone	= rs->rs_rgd->rd_free_clone;
> +		__entry->rd_reserved	= rs->rs_rgd->rd_reserved;
>   		__entry->inum		= container_of(rs, struct gfs2_inode,
>   						       i_res)->i_no_addr;
> -		__entry->start		= gfs2_rbm_to_block(&rs->rs_rbm);
> +		__entry->start		= rs->rs_start;
>   		__entry->free		= rs->rs_free;
>   		__entry->func		= func;
>   	),
> diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h
> index ad70087d0597..a27078f98e7a 100644
> --- a/fs/gfs2/trans.h
> +++ b/fs/gfs2/trans.h
> @@ -30,7 +30,7 @@ struct gfs2_glock;
>    * block, or all of the blocks in the rg, whichever is smaller */
>   static inline unsigned int gfs2_rg_blocks(const struct gfs2_inode *ip, unsigned requested)
>   {
> -	struct gfs2_rgrpd *rgd = ip->i_res.rs_rbm.rgd;
> +	struct gfs2_rgrpd *rgd = ip->i_res.rs_rgd;
>   
>   	if (requested < rgd->rd_length)
>   		return requested + 1;


* [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking
  2018-10-08 10:33   ` Steven Whitehouse
@ 2018-10-08 12:56     ` Andreas Gruenbacher
  0 siblings, 0 replies; 19+ messages in thread
From: Andreas Gruenbacher @ 2018-10-08 12:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 8 Oct 2018 at 12:33, Steven Whitehouse <swhiteho@redhat.com> wrote:
> Hi,
>
>
> On 05/10/18 20:18, Andreas Gruenbacher wrote:
> > From: Bob Peterson <rpeterso@redhat.com>
> >
> > Prepare for treating resource group glocks as exclusive among nodes but
> > shared among all tasks running on a node: introduce another layer of
> > node-specific locking that the local tasks can use to coordinate their
> > accesses.
> >
> > This patch only introduces the local locking changes necessary so that
> > future patches can introduce resource group glock sharing.  We replace
> > the resource group spinlock with a mutex; whether that leads to
> > noticeable additional contention on the resource group mutex remains to
> > be seen.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > ---
> >   fs/gfs2/incore.h |  3 +-
> >   fs/gfs2/lops.c   |  5 ++-
> >   fs/gfs2/rgrp.c   | 97 +++++++++++++++++++++++++++++++++++-------------
> >   fs/gfs2/rgrp.h   |  4 ++
> >   4 files changed, 81 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> > index 932e63924f7e..2fa47b476eef 100644
> > --- a/fs/gfs2/incore.h
> > +++ b/fs/gfs2/incore.h
> > @@ -23,6 +23,7 @@
> >   #include <linux/percpu.h>
> >   #include <linux/lockref.h>
> >   #include <linux/rhashtable.h>
> > +#include <linux/mutex.h>
> >
> >   #define DIO_WAIT    0x00000010
> >   #define DIO_METADATA        0x00000020
> > @@ -120,7 +121,7 @@ struct gfs2_rgrpd {
> >   #define GFS2_RDF_ERROR              0x40000000 /* error in rg */
> >   #define GFS2_RDF_PREFERRED  0x80000000 /* This rgrp is preferred */
> >   #define GFS2_RDF_MASK               0xf0000000 /* mask for internal flags */
> > -     spinlock_t rd_rsspin;           /* protects reservation related vars */
> > +     struct mutex rd_lock;
>
> I can see why we might need to have additional local rgrp locking, but
> why do we need to make the sd_rsspin into a mutex? I'm wondering if
> these should not be separate locks still?

Separate locks make sense if we're taking only one of the locks often
enough to offset the overhead. I'm not sure that is the case today,
or whether it will remain the case. The future optimization I've
described in patch 08/11 will have an influence on this as well.
We'll hopefully figure out whether a separate spin lock for updating
the reservation tree and/or the resource group statistics still makes
sense; luckily, this code is relatively simple.

Thanks,
Andreas

> Steve.
>
> >       struct rb_root rd_rstree;       /* multi-block reservation tree */
> >   };
> >
> > diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> > index 4c7069b8f3c1..a9e858e01c97 100644
> > --- a/fs/gfs2/lops.c
> > +++ b/fs/gfs2/lops.c
> > @@ -76,8 +76,9 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
> >       unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
> >       struct gfs2_bitmap *bi = rgd->rd_bits + index;
> >
> > +     rgrp_lock_local(rgd);
> >       if (bi->bi_clone == NULL)
> > -             return;
> > +             goto out;
> >       if (sdp->sd_args.ar_discard)
> >               gfs2_rgrp_send_discards(sdp, rgd->rd_data0, bd->bd_bh, bi, 1, NULL);
> >       memcpy(bi->bi_clone + bi->bi_offset,
> > @@ -85,6 +86,8 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
> >       clear_bit(GBF_FULL, &bi->bi_flags);
> >       rgd->rd_free_clone = rgd->rd_free;
> >       rgd->rd_extfail_pt = rgd->rd_free;
> > +out:
> > +     rgrp_unlock_local(rgd);
> >   }
> >
> >   /**
> > diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> > index 8a6b41f3667c..a89be4782c15 100644
> > --- a/fs/gfs2/rgrp.c
> > +++ b/fs/gfs2/rgrp.c
> > @@ -702,10 +702,10 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
> >
> >       rgd = rs->rs_rgd;
> >       if (rgd) {
> > -             spin_lock(&rgd->rd_rsspin);
> > +             rgrp_lock_local(rgd);
> >               __rs_deltree(rs);
> >               BUG_ON(rs->rs_free);
> > -             spin_unlock(&rgd->rd_rsspin);
> > +             rgrp_unlock_local(rgd);
> >       }
> >   }
> >
> > @@ -737,12 +737,12 @@ static void return_all_reservations(struct gfs2_rgrpd *rgd)
> >       struct rb_node *n;
> >       struct gfs2_blkreserv *rs;
> >
> > -     spin_lock(&rgd->rd_rsspin);
> > +     rgrp_lock_local(rgd);
> >       while ((n = rb_first(&rgd->rd_rstree))) {
> >               rs = rb_entry(n, struct gfs2_blkreserv, rs_node);
> >               __rs_deltree(rs);
> >       }
> > -     spin_unlock(&rgd->rd_rsspin);
> > +     rgrp_unlock_local(rgd);
> >   }
> >
> >   void gfs2_clear_rgrpd(struct gfs2_sbd *sdp)
> > @@ -948,7 +948,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
> >       rgd->rd_data0 = be64_to_cpu(buf.ri_data0);
> >       rgd->rd_data = be32_to_cpu(buf.ri_data);
> >       rgd->rd_bitbytes = be32_to_cpu(buf.ri_bitbytes);
> > -     spin_lock_init(&rgd->rd_rsspin);
> > +     mutex_init(&rgd->rd_lock);
> >
> >       error = compute_bitstructs(rgd);
> >       if (error)
> > @@ -1469,9 +1469,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
> >                       /* Trim each bitmap in the rgrp */
> >                       for (x = 0; x < rgd->rd_length; x++) {
> >                               struct gfs2_bitmap *bi = rgd->rd_bits + x;
> > +                             rgrp_lock_local(rgd);
> >                               ret = gfs2_rgrp_send_discards(sdp,
> >                                               rgd->rd_data0, NULL, bi, minlen,
> >                                               &amt);
> > +                             rgrp_unlock_local(rgd);
> >                               if (ret) {
> >                                       gfs2_glock_dq_uninit(&gh);
> >                                       goto out;
> > @@ -1483,9 +1485,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
> >                       ret = gfs2_trans_begin(sdp, RES_RG_HDR, 0);
> >                       if (ret == 0) {
> >                               bh = rgd->rd_bits[0].bi_bh;
> > +                             rgrp_lock_local(rgd);
> >                               rgd->rd_flags |= GFS2_RGF_TRIMMED;
> >                               gfs2_trans_add_meta(rgd->rd_gl, bh);
> >                               gfs2_rgrp_out(rgd, bh->b_data);
> > +                             rgrp_unlock_local(rgd);
> >                               gfs2_trans_end(sdp);
> >                       }
> >               }
> > @@ -1519,7 +1523,6 @@ static void rs_insert(struct gfs2_inode *ip)
> >
> >       BUG_ON(gfs2_rs_active(rs));
> >
> > -     spin_lock(&rgd->rd_rsspin);
> >       newn = &rgd->rd_rstree.rb_node;
> >       while (*newn) {
> >               struct gfs2_blkreserv *cur =
> > @@ -1532,7 +1535,6 @@ static void rs_insert(struct gfs2_inode *ip)
> >               else if (rc < 0)
> >                       newn = &((*newn)->rb_left);
> >               else {
> > -                     spin_unlock(&rgd->rd_rsspin);
> >                       WARN_ON(1);
> >                       return;
> >               }
> > @@ -1540,7 +1542,6 @@ static void rs_insert(struct gfs2_inode *ip)
> >
> >       rb_link_node(&rs->rs_node, parent, newn);
> >       rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
> > -     spin_unlock(&rgd->rd_rsspin);
> >       trace_gfs2_rs(rs, TRACE_RS_INSERT);
> >   }
> >
> > @@ -1632,7 +1633,6 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
> >       struct rb_node *n;
> >       int rc;
> >
> > -     spin_lock(&rgd->rd_rsspin);
> >       n = rgd->rd_rstree.rb_node;
> >       while (n) {
> >               rs = rb_entry(n, struct gfs2_blkreserv, rs_node);
> > @@ -1655,7 +1655,6 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
> >               }
> >       }
> >
> > -     spin_unlock(&rgd->rd_rsspin);
> >       return block;
> >   }
> >
> > @@ -1857,7 +1856,22 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
> >       struct gfs2_rbm rbm = { .rgd = rgd, .bii = 0, .offset = 0 };
> >
> >       while (1) {
> > +             /*
> > +              * We must be careful to avoid deadlock here:
> > +              * All transactions expect: sd_log_flush_lock followed by rgrp
> > +              * ex (if needed), but try_rgrp_unlink takes sd_log_flush_lock
> > +              * outside a transaction and therefore must not have the rgrp
> > +              * ex already held. To avoid deadlock, we drop the rgrp ex lock
> > +              * before taking the log_flush_lock, then reacquire it to
> > +              * protect our call to gfs2_rbm_find.
> > +              *
> > +              * Also note that rgrp_unlock_local must come AFTER the caller does
> > +              * gfs2_rs_deltree because rgrp ex needs to be held before
> > +              * making reservations.
> > +              */
> > +             rgrp_unlock_local(rgd);
> >               down_write(&sdp->sd_log_flush_lock);
> > +             rgrp_lock_local(rgd);
> >               error = gfs2_rbm_find(&rbm, GFS2_BLKST_UNLINKED, NULL, NULL,
> >                                     true);
> >               up_write(&sdp->sd_log_flush_lock);
> > @@ -2055,7 +2069,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
> >       struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> >       struct gfs2_rgrpd *begin = NULL;
> >       struct gfs2_blkreserv *rs = &ip->i_res;
> > -     int error = 0, rg_locked, flags = 0;
> > +     int error = 0, flags = 0;
> > +     bool rg_locked;
> >       u64 last_unlinked = NO_BLOCK;
> >       int loops = 0;
> >       u32 free_blocks, skip = 0;
> > @@ -2081,10 +2096,10 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
> >               return -EBADSLT;
> >
> >       while (loops < 3) {
> > -             rg_locked = 1;
> > -
> > -             if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
> > -                     rg_locked = 0;
> > +             rg_locked = gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl);
> > +             if (rg_locked) {
> > +                     rgrp_lock_local(rs->rs_rgd);
> > +             } else {
> >                       if (skip && skip--)
> >                               goto next_rgrp;
> >                       if (!gfs2_rs_active(rs)) {
> > @@ -2101,12 +2116,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
> >                                                  &ip->i_rgd_gh);
> >                       if (unlikely(error))
> >                               return error;
> > +                     rgrp_lock_local(rs->rs_rgd);
> >                       if (!gfs2_rs_active(rs) && (loops < 2) &&
> >                           gfs2_rgrp_congested(rs->rs_rgd, loops))
> >                               goto skip_rgrp;
> >                       if (sdp->sd_args.ar_rgrplvb) {
> >                               error = update_rgrp_lvb(rs->rs_rgd);
> >                               if (unlikely(error)) {
> > +                                     rgrp_unlock_local(rs->rs_rgd);
> >                                       gfs2_glock_dq_uninit(&ip->i_rgd_gh);
> >                                       return error;
> >                               }
> > @@ -2140,9 +2157,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
> >                       rs->rs_reserved = ap->target;
> >                       if (rs->rs_reserved > free_blocks)
> >                               rs->rs_reserved = free_blocks;
> > -                     spin_lock(&rs->rs_rgd->rd_rsspin);
> >                       rgd->rd_reserved += rs->rs_reserved;
> > -                     spin_unlock(&rs->rs_rgd->rd_rsspin);
> > +                     rgrp_unlock_local(rs->rs_rgd);
> >                       return 0;
> >               }
> >   check_rgrp:
> > @@ -2151,6 +2167,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
> >                       try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
> >                                       ip->i_no_addr);
> >   skip_rgrp:
> > +             rgrp_unlock_local(rs->rs_rgd);
> > +
> >               /* Drop reservation, if we couldn't use reserved rgrp */
> >               if (gfs2_rs_active(rs))
> >                       gfs2_rs_deltree(rs);
> > @@ -2199,10 +2217,10 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
> >       if (rs->rs_reserved) {
> >               struct gfs2_rgrpd *rgd = rs->rs_rgd;
> >
> > -             spin_lock(&rgd->rd_rsspin);
> > +             rgrp_lock_local(rgd);
> >               BUG_ON(rgd->rd_reserved < rs->rs_reserved);
> >               rgd->rd_reserved -= rs->rs_reserved;
> > -             spin_unlock(&rs->rs_rgd->rd_rsspin);
> > +             rgrp_unlock_local(rgd);
> >               rs->rs_reserved = 0;
> >       }
> >       if (gfs2_holder_initialized(&ip->i_rgd_gh))
> > @@ -2304,12 +2322,12 @@ void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
> >                              be32_to_cpu(rgl->rl_free),
> >                              be32_to_cpu(rgl->rl_dinodes));
> >       }
> > -     spin_lock(&rgd->rd_rsspin);
> > +     rgrp_lock_local(rgd);
> >       for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
> >               trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
> >               dump_rs(seq, trs);
> >       }
> > -     spin_unlock(&rgd->rd_rsspin);
> > +     rgrp_unlock_local(rgd);
> >   }
> >
> >   static void gfs2_rgrp_error(struct gfs2_rgrpd *rgd)
> > @@ -2339,7 +2357,6 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
> >       struct gfs2_blkreserv *rs = &ip->i_res;
> >       struct gfs2_rgrpd *rgd = rbm->rgd;
> >
> > -     spin_lock(&rgd->rd_rsspin);
> >       BUG_ON(rs->rs_reserved < len);
> >       rgd->rd_reserved -= len;
> >       rs->rs_reserved -= len;
> > @@ -2355,15 +2372,13 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
> >                       trace_gfs2_rs(rs, TRACE_RS_CLAIM);
> >                       if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
> >                           rs->rs_free)
> > -                             goto out;
> > +                             return;
> >                       /* We used up our block reservation, so we should
> >                          reserve more blocks next time. */
> >                       atomic_add(RGRP_RSRV_ADDBLKS, &ip->i_sizehint);
> >               }
> >               __rs_deltree(rs);
> >       }
> > -out:
> > -     spin_unlock(&rgd->rd_rsspin);
> >   }
> >
> >   /**
> > @@ -2416,6 +2431,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
> >
> >       BUG_ON(ip->i_res.rs_reserved < *nblocks);
> >
> > +     rgrp_lock_local(rbm.rgd);
> >       gfs2_set_alloc_start(&rbm, ip, dinode);
> >       error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
> >
> > @@ -2460,6 +2476,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
> >       }
> >
> >       rbm.rgd->rd_free -= *nblocks;
> > +     rbm.rgd->rd_free_clone -= *nblocks;
> >       if (dinode) {
> >               rbm.rgd->rd_dinodes++;
> >               *generation = rbm.rgd->rd_igeneration++;
> > @@ -2469,6 +2486,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
> >
> >       gfs2_trans_add_meta(rbm.rgd->rd_gl, rbm.rgd->rd_bits[0].bi_bh);
> >       gfs2_rgrp_out(rbm.rgd, rbm.rgd->rd_bits[0].bi_bh->b_data);
> > +     rgrp_unlock_local(rbm.rgd);
> >
> >       gfs2_statfs_change(sdp, 0, -(s64)*nblocks, dinode ? 1 : 0);
> >       if (dinode)
> > @@ -2476,13 +2494,13 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
> >
> >       gfs2_quota_change(ip, *nblocks, ip->i_inode.i_uid, ip->i_inode.i_gid);
> >
> > -     rbm.rgd->rd_free_clone -= *nblocks;
> >       trace_gfs2_block_alloc(ip, rbm.rgd, block, *nblocks,
> >                              dinode ? GFS2_BLKST_DINODE : GFS2_BLKST_USED);
> >       *bn = block;
> >       return 0;
> >
> >   rgrp_error:
> > +     rgrp_unlock_local(rbm.rgd);
> >       gfs2_rgrp_error(rbm.rgd);
> >       return -EIO;
> >   }
> > @@ -2502,12 +2520,14 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
> >   {
> >       struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> >
> > +     rgrp_lock_local(rgd);
> >       rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
> >       trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
> >       rgd->rd_free += blen;
> >       rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
> >       gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
> >       gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
> > +     rgrp_unlock_local(rgd);
> >
> >       /* Directories keep their data in the metadata address space */
> >       if (meta || ip->i_depth)
> > @@ -2543,17 +2563,20 @@ void gfs2_unlink_di(struct inode *inode)
> >       rgd = gfs2_blk2rgrpd(sdp, blkno, true);
> >       if (!rgd)
> >               return;
> > +     rgrp_lock_local(rgd);
> >       rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
> >       trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
> >       gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
> >       gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
> >       be32_add_cpu(&rgd->rd_rgl->rl_unlinked, 1);
> > +     rgrp_unlock_local(rgd);
> >   }
> >
> >   void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
> >   {
> >       struct gfs2_sbd *sdp = rgd->rd_sbd;
> >
> > +     rgrp_lock_local(rgd);
> >       rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
> >       if (!rgd->rd_dinodes)
> >               gfs2_consist_rgrpd(rgd);
> > @@ -2562,6 +2585,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
> >
> >       gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
> >       gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
> > +     rgrp_unlock_local(rgd);
> >       be32_add_cpu(&rgd->rd_rgl->rl_unlinked, -1);
> >
> >       gfs2_statfs_change(sdp, 0, +1, -1);
> > @@ -2576,6 +2600,10 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
> >    * @no_addr: The block number to check
> >    * @type: The block type we are looking for
> >    *
> > + * The inode glock of @no_addr must be held.  The @type to check for is either
> > + * GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; checking for type GFS2_BLKST_FREE
> > + * or GFS2_BLKST_USED would make no sense.
> > + *
> >    * Returns: 0 if the block type matches the expected type
> >    *          -ESTALE if it doesn't match
> >    *          or -ve errno if something went wrong while checking
> > @@ -2601,6 +2629,12 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
> >       if (WARN_ON_ONCE(error))
> >               goto fail;
> >
> > +     /*
> > +      * No need to take the local resource group lock here; the inode glock
> > +      * of @no_addr provides the necessary synchronization in case the block
> > +      * is an inode.  (In case the block is not an inode, the block type
> > +      * will not match the @type we are looking for.)
> > +      */
> >       if (gfs2_testbit(&rbm, false) != type)
> >               error = -ESTALE;
> >
> > @@ -2723,3 +2757,14 @@ void gfs2_rlist_free(struct gfs2_rgrp_list *rlist)
> >       }
> >   }
> >
> > +void rgrp_lock_local(struct gfs2_rgrpd *rgd)
> > +{
> > +     BUG_ON(!gfs2_glock_is_held_excl(rgd->rd_gl) &&
> > +            !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
> > +     mutex_lock(&rgd->rd_lock);
> > +}
> > +
> > +void rgrp_unlock_local(struct gfs2_rgrpd *rgd)
> > +{
> > +     mutex_unlock(&rgd->rd_lock);
> > +}
> > diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
> > index b596c3d17988..33e52dab76ef 100644
> > --- a/fs/gfs2/rgrp.h
> > +++ b/fs/gfs2/rgrp.h
> > @@ -92,4 +92,8 @@ static inline int rgrp_contains_block(struct gfs2_rgrpd *rgd, u64 block)
> >   }
> >
> >   extern void check_and_update_goal(struct gfs2_inode *ip);
> > +
> > +extern void rgrp_lock_local(struct gfs2_rgrpd *rgd);
> > +extern void rgrp_unlock_local(struct gfs2_rgrpd *rgd);
> > +
> >   #endif /* __RGRP_DOT_H__ */
>


Thread overview: 19+ messages
2018-10-05 19:18 [Cluster-devel] [PATCH 00/11] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
2018-10-05 19:18 ` [Cluster-devel] [PATCH 01/11] gfs2: Always check the result of gfs2_rbm_from_block Andreas Gruenbacher
2018-10-08 10:12   ` Steven Whitehouse
2018-10-05 19:18 ` [Cluster-devel] [PATCH 02/11] gfs2: Move rs_{sizehint, rgd_gh} fields into the inode Andreas Gruenbacher
2018-10-08 10:34   ` Steven Whitehouse
2018-10-05 19:18 ` [Cluster-devel] [PATCH 03/11] gfs2: Remove unused RGRP_RSRV_MINBYTES definition Andreas Gruenbacher
2018-10-05 19:18 ` [Cluster-devel] [PATCH 04/11] gfs2: Rename bitmap.bi_{len => bytes} Andreas Gruenbacher
2018-10-05 19:18 ` [Cluster-devel] [PATCH 05/11] gfs2: Fix some minor typos Andreas Gruenbacher
2018-10-05 19:18 ` [Cluster-devel] [PATCH 06/11] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
2018-10-08 10:39   ` Steven Whitehouse
2018-10-05 19:18 ` [Cluster-devel] [PATCH 07/11] gfs2: Fix marking bitmaps non-full Andreas Gruenbacher
2018-10-08 10:23   ` Steven Whitehouse
2018-10-05 19:18 ` [Cluster-devel] [PATCH 08/11] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
2018-10-05 19:18 ` [Cluster-devel] [PATCH 09/11] gfs2: Remove unnecessary gfs2_rlist_alloc parameter Andreas Gruenbacher
2018-10-05 19:18 ` [Cluster-devel] [PATCH 10/11] gfs2: Pass resource group to rgblk_free Andreas Gruenbacher
2018-10-08 10:28   ` Steven Whitehouse
2018-10-05 19:18 ` [Cluster-devel] [PATCH 11/11] gfs2: Add local resource group locking Andreas Gruenbacher
2018-10-08 10:33   ` Steven Whitehouse
2018-10-08 12:56     ` Andreas Gruenbacher
