[Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing
@ 2020-10-14  9:58 Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 01/17] gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add Andreas Gruenbacher
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Here's an update of my resource group glock sharing patches.

The key patch of this patch queue is "gfs2: Add per-reservation reserved
block accounting": with resource group glock sharing, multiple
allocations can be going on in parallel.  This requires tracking of the
actual space reserved for each allocation to prevent the allocator from
handing out more blocks than available.

Before this commit, the filesystem would report to be out of space when
the number of tentatively reserved blocks would exceed the free space
available.  In the most extreme workload that only creates empty inodes,
that would be the case when only 1/32 of the filesystem is full.

With the above mentioned commit, the filesystem will continue to
allocate blocks in such situations, but only at a very slow rate because
the tentatively reserved blocks will still get in the way.  Commit
"gfs2: Limit the maximum amount of reserved space" changes that to
shrink the amount of tentatively reserved space to no more than half the
free space when no more blocks can be reserved.  

Thanks,
Andreas

Andreas Gruenbacher (14):
  gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add
  gfs2: Only use struct gfs2_rbm for bitmap manipulations
  gfs2: Get rid of unnecessary variable in gfs2_alloc_blocks
  gfs2: Minor gfs2_inplace_reserve cleanup
  Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks
    reserved"
  gfs2: Don't clear GBF_FULL flags in rs_deltree
  gfs2: Set GBF_FULL flags when reading resource groups
  gfs2: When looking for blocks to allocate, don't clamp at free blocks
  gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt
  gfs2: Only pass reservation down to gfs2_rbm_find
  gfs: Don't search for unreserved space twice
  gfs2: Add per-reservation reserved block accounting
  gfs2: Limit the maximum amount of reserved space
  gfs2: Rename rs_free to rs_wanted

Bob Peterson (3):
  gfs2: Add local resource group locking
  gfs2: Allow node-wide exclusive glock sharing
  gfs2: Use resource group glock sharing

 fs/gfs2/bmap.c       |   4 +-
 fs/gfs2/file.c       |   4 +-
 fs/gfs2/glock.c      |  22 +-
 fs/gfs2/glock.h      |   6 +
 fs/gfs2/incore.h     |  36 +---
 fs/gfs2/inode.c      |   6 +-
 fs/gfs2/lops.c       |   7 +-
 fs/gfs2/rgrp.c       | 489 ++++++++++++++++++++++++++-----------------
 fs/gfs2/rgrp.h       |   4 +
 fs/gfs2/super.c      |   3 +-
 fs/gfs2/trace_gfs2.h |  33 ++-
 fs/gfs2/trans.h      |   2 +-
 fs/gfs2/xattr.c      |   6 +-
 13 files changed, 376 insertions(+), 246 deletions(-)

-- 
2.26.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 01/17] gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 02/17] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Change gfs2_rbm_incr to advance an rbm by a given number of blocks.  Use that
in gfs2_reservation_check_and_update to save a gfs2_rbm_to_block ->
gfs2_rbm_from_block round trip.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 55 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 074f228ea839..073a9c9cb782 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -277,29 +277,38 @@ static int gfs2_rbm_from_block(struct gfs2_rbm *rbm, u64 block)
 }
 
 /**
- * gfs2_rbm_incr - increment an rbm structure
+ * gfs2_rbm_add - add a number of blocks to an rbm
  * @rbm: The rbm with rgd already set correctly
+ * @blocks: The number of blocks to add to rpm
  *
- * This function takes an existing rbm structure and increments it to the next
- * viable block offset.
- *
- * Returns: If incrementing the offset would cause the rbm to go past the
- *          end of the rgrp, true is returned, otherwise false.
+ * This function takes an existing rbm structure and adds a number of blocks to
+ * it.
  *
+ * Returns: True if the new rbm would point past the end of the rgrp.
  */
 
-static bool gfs2_rbm_incr(struct gfs2_rbm *rbm)
+static bool gfs2_rbm_add(struct gfs2_rbm *rbm, u32 blocks)
 {
-	if (rbm->offset + 1 < rbm_bi(rbm)->bi_blocks) { /* in the same bitmap */
-		rbm->offset++;
+	struct gfs2_rgrpd *rgd = rbm->rgd;
+	struct gfs2_bitmap *bi = rgd->rd_bits + rbm->bii;
+
+	if (rbm->offset + blocks < bi->bi_blocks) {
+		rbm->offset += blocks;
 		return false;
 	}
-	if (rbm->bii == rbm->rgd->rd_length - 1) /* at the last bitmap */
-		return true;
+	blocks -= bi->bi_blocks - rbm->offset;
 
-	rbm->offset = 0;
-	rbm->bii++;
-	return false;
+	for(;;) {
+		bi++;
+		if (bi == rgd->rd_bits + rgd->rd_length)
+			return true;
+		if (blocks < bi->bi_blocks) {
+			rbm->offset = blocks;
+			rbm->bii = bi - rgd->rd_bits;
+			return false;
+		}
+		blocks -= bi->bi_blocks;
+	}
 }
 
 /**
@@ -323,7 +332,7 @@ static bool gfs2_unaligned_extlen(struct gfs2_rbm *rbm, u32 n_unaligned, u32 *le
 		(*len)--;
 		if (*len == 0)
 			return true;
-		if (gfs2_rbm_incr(rbm))
+		if (gfs2_rbm_add(rbm, 1))
 			return true;
 	}
 
@@ -1636,7 +1645,6 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 	u64 block = gfs2_rbm_to_block(rbm);
 	u32 extlen = 1;
 	u64 nblock;
-	int ret;
 
 	/*
 	 * If we have a minimum extent length, then skip over any extent
@@ -1661,12 +1669,15 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 			maxext->len = extlen;
 			maxext->rbm = *rbm;
 		}
-fail:
-		nblock = block + extlen;
+	} else {
+		u64 len = nblock - block;
+		if (len >= (u64)1 << 32)
+			return -E2BIG;
+		extlen = len;
 	}
-	ret = gfs2_rbm_from_block(rbm, nblock);
-	if (ret < 0)
-		return ret;
+fail:
+	if (gfs2_rbm_add(rbm, extlen))
+		return -E2BIG;
 	return 1;
 }
 
@@ -2202,7 +2213,7 @@ static void rgblk_free(struct gfs2_sbd *sdp, struct gfs2_rgrpd *rgd,
 			bi_prev = bi;
 		}
 		gfs2_setbit(&rbm, false, new_state);
-		gfs2_rbm_incr(&rbm);
+		gfs2_rbm_add(&rbm, 1);
 	}
 }
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 02/17] gfs2: Only use struct gfs2_rbm for bitmap manipulations
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 01/17] gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 03/17] gfs2: Get rid of unnecessary variable in gfs2_alloc_blocks Andreas Gruenbacher
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

GFS2 uses struct gfs2_rbm to represent a filesystem block number as a
bit position within a resource group.  This representation is used in
the bitmap manipulation code to prevent excessive conversions between
block numbers and bit positions, but also in struct gfs2_blkreserv which
is part of struct gfs2_inode, to mark the start of a reservation.  In
the inode, the bit position representation makes less sense: first, the
start position is used as a block number about as often as a bit
position; second, the bit position representation makes the code
unnecessarily complicated and difficult to read.

Therefore, change struct gfs2_blkreserv to represent the start of a
reservation as a block number instead of a bit position.  (This requires
keeping track of the resource group in gfs2_blkreserv separately.) With
that change, various things can be slightly simplified, and struct
gfs2_rbm can be moved to rgrp.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c       |   2 +-
 fs/gfs2/incore.h     |  30 +--------
 fs/gfs2/rgrp.c       | 154 +++++++++++++++++++++++++------------------
 fs/gfs2/trace_gfs2.h |  10 +--
 fs/gfs2/trans.h      |   2 +-
 5 files changed, 101 insertions(+), 97 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0f69fbd4af66..ff64fa85d3ac 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1547,7 +1547,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 
 			/* Must be done with the rgrp glock held: */
 			if (gfs2_rs_active(&ip->i_res) &&
-			    rgd == ip->i_res.rs_rbm.rgd)
+			    rgd == ip->i_res.rs_rgd)
 				gfs2_rs_deltree(&ip->i_res);
 		}
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index ca2ec02436ec..fb487e90eac7 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -121,31 +121,6 @@ struct gfs2_rgrpd {
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
 };
 
-struct gfs2_rbm {
-	struct gfs2_rgrpd *rgd;
-	u32 offset;		/* The offset is bitmap relative */
-	int bii;		/* Bitmap index */
-};
-
-static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
-{
-	return rbm->rgd->rd_bits + rbm->bii;
-}
-
-static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
-{
-	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
-	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
-		rbm->offset;
-}
-
-static inline bool gfs2_rbm_eq(const struct gfs2_rbm *rbm1,
-			       const struct gfs2_rbm *rbm2)
-{
-	return (rbm1->rgd == rbm2->rgd) && (rbm1->bii == rbm2->bii) &&
-	       (rbm1->offset == rbm2->offset);
-}
-
 enum gfs2_state_bits {
 	BH_Pinned = BH_PrivateStart,
 	BH_Escaped = BH_PrivateStart + 1,
@@ -308,8 +283,9 @@ struct gfs2_qadata { /* quota allocation data */
 */
 
 struct gfs2_blkreserv {
-	struct rb_node rs_node;       /* link to other block reservations */
-	struct gfs2_rbm rs_rbm;       /* Start of reservation */
+	struct rb_node rs_node;       /* node within rd_rstree */
+	struct gfs2_rgrpd *rs_rgd;
+	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
 };
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 073a9c9cb782..f044430f5c52 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -36,6 +36,24 @@
 #define BFITNOENT ((u32)~0)
 #define NO_BLOCK ((u64)~0)
 
+struct gfs2_rbm {
+	struct gfs2_rgrpd *rgd;
+	u32 offset;		/* The offset is bitmap relative */
+	int bii;		/* Bitmap index */
+};
+
+static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
+{
+	return rbm->rgd->rd_bits + rbm->bii;
+}
+
+static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
+{
+	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
+	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
+		rbm->offset;
+}
+
 /*
  * These routines are used by the resource group routines (rgrp.c)
  * to keep track of block allocation.  Each block is represented by two
@@ -175,7 +193,7 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
 
 /**
  * rs_cmp - multi-block reservation range compare
- * @blk: absolute file system block number of the new reservation
+ * @start: start of the new reservation
  * @len: number of blocks in the new reservation
  * @rs: existing reservation to compare against
  *
@@ -183,13 +201,11 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
  *         -1 if the block range is before the start of the reservation
  *          0 if the block range overlaps with the reservation
  */
-static inline int rs_cmp(u64 blk, u32 len, struct gfs2_blkreserv *rs)
+static inline int rs_cmp(u64 start, u32 len, struct gfs2_blkreserv *rs)
 {
-	u64 startblk = gfs2_rbm_to_block(&rs->rs_rbm);
-
-	if (blk >= startblk + rs->rs_free)
+	if (start >= rs->rs_start + rs->rs_free)
 		return 1;
-	if (blk + len - 1 < startblk)
+	if (rs->rs_start >= start + len)
 		return -1;
 	return 0;
 }
@@ -311,13 +327,28 @@ static bool gfs2_rbm_add(struct gfs2_rbm *rbm, u32 blocks)
 	}
 }
 
+static struct gfs2_bitmap *gfs2_block_to_bitmap(struct gfs2_rgrpd *rgd,
+						u64 block)
+{
+	unsigned int delta = (sizeof(struct gfs2_rgrp) -
+			      sizeof(struct gfs2_meta_header)) * GFS2_NBBY;
+	unsigned int rblock, bii;
+
+	if (!rgrp_contains_block(rgd, block))
+		return NULL;
+	rblock = block - rgd->rd_data0;
+	bii = (rblock + delta) / rgd->rd_sbd->sd_blocks_per_bitmap;
+	return rgd->rd_bits + bii;
+}
+
 /**
  * gfs2_unaligned_extlen - Look for free blocks which are not byte aligned
  * @rbm: Position to search (value/result)
  * @n_unaligned: Number of unaligned blocks to check
  * @len: Decremented for each block found (terminate on zero)
  *
- * Returns: true if a non-free block is encountered
+ * Returns: true if a non-free block is encountered or the end of the resource
+ *	    group is reached.
  */
 
 static bool gfs2_unaligned_extlen(struct gfs2_rbm *rbm, u32 n_unaligned, u32 *len)
@@ -604,10 +635,11 @@ static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs,
 {
 	struct gfs2_inode *ip = container_of(rs, struct gfs2_inode, i_res);
 
-	gfs2_print_dbg(seq, "%s  B: n:%llu s:%llu b:%u f:%u\n", fs_id_buf,
+	gfs2_print_dbg(seq, "%s  B: n:%llu s:%llu f:%u\n",
+		       fs_id_buf,
 		       (unsigned long long)ip->i_no_addr,
-		       (unsigned long long)gfs2_rbm_to_block(&rs->rs_rbm),
-		       rs->rs_rbm.offset, rs->rs_free);
+		       (unsigned long long)rs->rs_start,
+		       rs->rs_free);
 }
 
 /**
@@ -622,30 +654,28 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	if (!gfs2_rs_active(rs))
 		return;
 
-	rgd = rs->rs_rbm.rgd;
+	rgd = rs->rs_rgd;
 	trace_gfs2_rs(rs, TRACE_RS_TREEDEL);
 	rb_erase(&rs->rs_node, &rgd->rd_rstree);
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
-		u64 last_block = gfs2_rbm_to_block(&rs->rs_rbm) +
-				 rs->rs_free - 1;
-		struct gfs2_rbm last_rbm = { .rgd = rs->rs_rbm.rgd, };
+		u64 last_block = rs->rs_start + rs->rs_free - 1;
 		struct gfs2_bitmap *start, *last;
 
 		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free);
-		rs->rs_rbm.rgd->rd_reserved -= rs->rs_free;
+		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
+		rs->rs_rgd->rd_reserved -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
 		   it will force the number to be recalculated later. */
 		rgd->rd_extfail_pt += rs->rs_free;
 		rs->rs_free = 0;
-		if (gfs2_rbm_from_block(&last_rbm, last_block))
+		start = gfs2_block_to_bitmap(rgd, rs->rs_start);
+		last = gfs2_block_to_bitmap(rgd, last_block);
+		if (!start || !last)
 			return;
-		start = rbm_bi(&rs->rs_rbm);
-		last = rbm_bi(&last_rbm);
 		do
 			clear_bit(GBF_FULL, &start->bi_flags);
 		while (start++ != last);
@@ -661,7 +691,7 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
 {
 	struct gfs2_rgrpd *rgd;
 
-	rgd = rs->rs_rbm.rgd;
+	rgd = rs->rs_rgd;
 	if (rgd) {
 		spin_lock(&rgd->rd_rsspin);
 		__rs_deltree(rs);
@@ -1464,8 +1494,7 @@ static void rs_insert(struct gfs2_inode *ip)
 	struct rb_node **newn, *parent = NULL;
 	int rc;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
-	u64 fsblock = gfs2_rbm_to_block(&rs->rs_rbm);
+	struct gfs2_rgrpd *rgd = rs->rs_rgd;
 
 	BUG_ON(gfs2_rs_active(rs));
 
@@ -1476,7 +1505,7 @@ static void rs_insert(struct gfs2_inode *ip)
 			rb_entry(*newn, struct gfs2_blkreserv, rs_node);
 
 		parent = *newn;
-		rc = rs_cmp(fsblock, rs->rs_free, cur);
+		rc = rs_cmp(rs->rs_start, rs->rs_free, cur);
 		if (rc > 0)
 			newn = &((*newn)->rb_right);
 		else if (rc < 0)
@@ -1564,7 +1593,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 
 	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, ip, true);
 	if (ret == 0) {
-		rs->rs_rbm = rbm;
+		rs->rs_start = gfs2_rbm_to_block(&rbm);
 		rs->rs_free = extlen;
 		rs_insert(ip);
 	} else {
@@ -1609,7 +1638,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 
 	if (n) {
 		while ((rs_cmp(block, length, rs) == 0) && (&ip->i_res != rs)) {
-			block = gfs2_rbm_to_block(&rs->rs_rbm) + rs->rs_free;
+			block = rs->rs_start + rs->rs_free;
 			n = n->rb_right;
 			if (n == NULL)
 				break;
@@ -1946,7 +1975,7 @@ static bool gfs2_rgrp_used_recently(const struct gfs2_blkreserv *rs,
 	u64 tdiff;
 
 	tdiff = ktime_to_ns(ktime_sub(ktime_get_real(),
-                            rs->rs_rbm.rgd->rd_gl->gl_dstamp));
+                            rs->rs_rgd->rd_gl->gl_dstamp));
 
 	return tdiff > (msecs * 1000 * 1000);
 }
@@ -2024,45 +2053,45 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	if (gfs2_assert_warn(sdp, ap->target))
 		return -EINVAL;
 	if (gfs2_rs_active(rs)) {
-		begin = rs->rs_rbm.rgd;
-	} else if (rs->rs_rbm.rgd &&
-		   rgrp_contains_block(rs->rs_rbm.rgd, ip->i_goal)) {
-		begin = rs->rs_rbm.rgd;
+		begin = rs->rs_rgd;
+	} else if (rs->rs_rgd &&
+		   rgrp_contains_block(rs->rs_rgd, ip->i_goal)) {
+		begin = rs->rs_rgd;
 	} else {
 		check_and_update_goal(ip);
-		rs->rs_rbm.rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
+		rs->rs_rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
 	}
 	if (S_ISDIR(ip->i_inode.i_mode) && (ap->aflags & GFS2_AF_ORLOV))
 		skip = gfs2_orlov_skip(ip);
-	if (rs->rs_rbm.rgd == NULL)
+	if (rs->rs_rgd == NULL)
 		return -EBADSLT;
 
 	while (loops < 3) {
 		rg_locked = 1;
 
-		if (!gfs2_glock_is_locked_by_me(rs->rs_rbm.rgd->rd_gl)) {
+		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
 				if (loops == 0 &&
-				    !fast_to_acquire(rs->rs_rbm.rgd))
+				    !fast_to_acquire(rs->rs_rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
 				    gfs2_rgrp_used_recently(rs, 1000) &&
-				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+				    gfs2_rgrp_congested(rs->rs_rgd, loops))
 					goto next_rgrp;
 			}
-			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
+			error = gfs2_glock_nq_init(rs->rs_rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+			    gfs2_rgrp_congested(rs->rs_rgd, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
-				error = update_rgrp_lvb(rs->rs_rbm.rgd);
+				error = update_rgrp_lvb(rs->rs_rgd);
 				if (unlikely(error)) {
 					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
@@ -2071,24 +2100,24 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		}
 
 		/* Skip unusable resource groups */
-		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
+		if ((rs->rs_rgd->rd_flags & (GFS2_RGF_NOALLOC |
 						 GFS2_RDF_ERROR)) ||
-		    (loops == 0 && ap->target > rs->rs_rbm.rgd->rd_extfail_pt))
+		    (loops == 0 && ap->target > rs->rs_rgd->rd_extfail_pt))
 			goto skip_rgrp;
 
 		if (sdp->sd_args.ar_rgrplvb)
-			gfs2_rgrp_bh_get(rs->rs_rbm.rgd);
+			gfs2_rgrp_bh_get(rs->rs_rgd);
 
 		/* Get a reservation if we don't already have one */
 		if (!gfs2_rs_active(rs))
-			rg_mblk_search(rs->rs_rbm.rgd, ip, ap);
+			rg_mblk_search(rs->rs_rgd, ip, ap);
 
 		/* Skip rgrps when we can't get a reservation on first pass */
 		if (!gfs2_rs_active(rs) && (loops < 1))
 			goto check_rgrp;
 
 		/* If rgrp has enough free space, use it */
-		free_blocks = rgd_free(rs->rs_rbm.rgd, rs);
+		free_blocks = rgd_free(rs->rs_rgd, rs);
 		if (free_blocks >= ap->target ||
 		    (loops == 2 && ap->min_target &&
 		     free_blocks >= ap->min_target)) {
@@ -2097,8 +2126,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		}
 check_rgrp:
 		/* Check for unlinked inodes which can be reclaimed */
-		if (rs->rs_rbm.rgd->rd_flags & GFS2_RDF_CHECK)
-			try_rgrp_unlink(rs->rs_rbm.rgd, &last_unlinked,
+		if (rs->rs_rgd->rd_flags & GFS2_RDF_CHECK)
+			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
 		/* Drop reservation, if we couldn't use reserved rgrp */
@@ -2110,7 +2139,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 next_rgrp:
 		/* Find the next rgrp, and continue looking */
-		if (gfs2_select_rgrp(&rs->rs_rbm.rgd, begin))
+		if (gfs2_select_rgrp(&rs->rs_rgd, begin))
 			continue;
 		if (skip)
 			continue;
@@ -2284,20 +2313,21 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 {
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	struct gfs2_rgrpd *rgd = rbm->rgd;
-	unsigned rlen;
-	u64 block;
-	int ret;
 
 	spin_lock(&rgd->rd_rsspin);
 	if (gfs2_rs_active(rs)) {
-		if (gfs2_rbm_eq(&rs->rs_rbm, rbm)) {
-			block = gfs2_rbm_to_block(rbm);
-			ret = gfs2_rbm_from_block(&rs->rs_rbm, block + len);
+		u64 start = gfs2_rbm_to_block(rbm);
+
+		if (rs->rs_start == start) {
+			unsigned int rlen;
+
+			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
 			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
-			if (rs->rs_free && !ret)
+			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
+			    rs->rs_free)
 				goto out;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
@@ -2326,15 +2356,13 @@ static void gfs2_set_alloc_start(struct gfs2_rbm *rbm,
 	u64 goal;
 
 	if (gfs2_rs_active(&ip->i_res)) {
-		*rbm = ip->i_res.rs_rbm;
-		return;
+		goal = ip->i_res.rs_start;
+	} else {
+		if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
+			goal = ip->i_goal;
+		else
+			goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
 	}
-
-	if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
-		goal = ip->i_goal;
-	else
-		goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
-
 	if (WARN_ON_ONCE(gfs2_rbm_from_block(rbm, goal))) {
 		rbm->bii = 0;
 		rbm->offset = 0;
@@ -2357,7 +2385,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct buffer_head *dibh;
-	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rbm.rgd, };
+	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
 	unsigned int ndata;
 	u64 block; /* block, within the file system scope */
 	int error;
@@ -2589,7 +2617,7 @@ void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
 			return;
 		rgd = gfs2_blk2rgrpd(sdp, block, 1);
 	} else {
-		rgd = ip->i_res.rs_rbm.rgd;
+		rgd = ip->i_res.rs_rgd;
 		if (!rgd || !rgrp_contains_block(rgd, block))
 			rgd = gfs2_blk2rgrpd(sdp, block, 1);
 	}
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index e0025258107a..7586c7629497 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -602,13 +602,13 @@ TRACE_EVENT(gfs2_rs,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= rs->rs_rbm.rgd->rd_sbd->sd_vfs->s_dev;
-		__entry->rd_addr	= rs->rs_rbm.rgd->rd_addr;
-		__entry->rd_free_clone	= rs->rs_rbm.rgd->rd_free_clone;
-		__entry->rd_reserved	= rs->rs_rbm.rgd->rd_reserved;
+		__entry->dev		= rs->rs_rgd->rd_sbd->sd_vfs->s_dev;
+		__entry->rd_addr	= rs->rs_rgd->rd_addr;
+		__entry->rd_free_clone	= rs->rs_rgd->rd_free_clone;
+		__entry->rd_reserved	= rs->rs_rgd->rd_reserved;
 		__entry->inum		= container_of(rs, struct gfs2_inode,
 						       i_res)->i_no_addr;
-		__entry->start		= gfs2_rbm_to_block(&rs->rs_rbm);
+		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
 		__entry->func		= func;
 	),
diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h
index 83199ce5a5c5..b812c4958dc4 100644
--- a/fs/gfs2/trans.h
+++ b/fs/gfs2/trans.h
@@ -27,7 +27,7 @@ struct gfs2_glock;
  * block, or all of the blocks in the rg, whichever is smaller */
 static inline unsigned int gfs2_rg_blocks(const struct gfs2_inode *ip, unsigned requested)
 {
-	struct gfs2_rgrpd *rgd = ip->i_res.rs_rbm.rgd;
+	struct gfs2_rgrpd *rgd = ip->i_res.rs_rgd;
 
 	if (requested < rgd->rd_length)
 		return requested + 1;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 03/17] gfs2: Get rid of unnecessary variable in gfs2_alloc_blocks
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 01/17] gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 02/17] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 04/17] gfs2: Minor gfs2_inplace_reserve cleanup Andreas Gruenbacher
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Variable ndata is only used inside "if (!dinode)", so it can be replaced
entirely with *nblocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index f044430f5c52..af3e15baea89 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2386,7 +2386,6 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct buffer_head *dibh;
 	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
-	unsigned int ndata;
 	u64 block; /* block, within the file system scope */
 	int error;
 
@@ -2412,12 +2411,8 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	rbm.rgd->rd_last_alloc = block - rbm.rgd->rd_data0;
 	if (gfs2_rs_active(&ip->i_res))
 		gfs2_adjust_reservation(ip, &rbm, *nblocks);
-	ndata = *nblocks;
-	if (dinode)
-		ndata--;
-
 	if (!dinode) {
-		ip->i_goal = block + ndata - 1;
+		ip->i_goal = block + *nblocks - 1;
 		error = gfs2_meta_inode_buffer(ip, &dibh);
 		if (error == 0) {
 			struct gfs2_dinode *di =
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 04/17] gfs2: Minor gfs2_inplace_reserve cleanup
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 03/17] gfs2: Get rid of unnecessary variable in gfs2_alloc_blocks Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 05/17] Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks reserved" Andreas Gruenbacher
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Clean up the reservation size computation logic in gfs2_inplace_reserve a
little.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index af3e15baea89..6c246bf7eada 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2045,12 +2045,13 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	int error = 0, rg_locked, flags = 0;
 	u64 last_unlinked = NO_BLOCK;
+	u32 target = ap->target;
 	int loops = 0;
 	u32 free_blocks, skip = 0;
 
 	if (sdp->sd_args.ar_rgrplvb)
 		flags |= GL_SKIP;
-	if (gfs2_assert_warn(sdp, ap->target))
+	if (gfs2_assert_warn(sdp, target))
 		return -EINVAL;
 	if (gfs2_rs_active(rs)) {
 		begin = rs->rs_rgd;
@@ -2102,7 +2103,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		/* Skip unusable resource groups */
 		if ((rs->rs_rgd->rd_flags & (GFS2_RGF_NOALLOC |
 						 GFS2_RDF_ERROR)) ||
-		    (loops == 0 && ap->target > rs->rs_rgd->rd_extfail_pt))
+		    (loops == 0 && target > rs->rs_rgd->rd_extfail_pt))
 			goto skip_rgrp;
 
 		if (sdp->sd_args.ar_rgrplvb)
@@ -2118,9 +2119,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 		/* If rgrp has enough free space, use it */
 		free_blocks = rgd_free(rs->rs_rgd, rs);
-		if (free_blocks >= ap->target ||
-		    (loops == 2 && ap->min_target &&
-		     free_blocks >= ap->min_target)) {
+		if (free_blocks >= target) {
 			ap->allowed = free_blocks;
 			return 0;
 		}
@@ -2156,9 +2155,12 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 				return error;
 		}
 		/* Flushing the log may release space */
-		if (loops == 2)
+		if (loops == 2) {
+			if (ap->min_target)
+				target = ap->min_target;
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 				       GFS2_LFC_INPLACE_RESERVE);
+		}
 	}
 
 	return -ENOSPC;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 05/17] Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks reserved"
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (3 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 04/17] gfs2: Minor gfs2_inplace_reserve cleanup Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 06/17] gfs2: Don't clear GBF_FULL flags in rs_deltree Andreas Gruenbacher
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This reverts commit e79e0e1428188b24c3b57309ffa54a33c4ae40c4.

It turns out that we're only setting the GBF_FULL flag of a bitmap if we've
been scanning from the beginning of the bitmap until the end and we haven't
found a single free block, and that blocks are only marked as allocated when
they are allocated, no when they are merely reserved.  This means that
independent of any reservations, we can skip skip bitmaps with the GBF_FULL
flag set in gfs2_rbm_find.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 6c246bf7eada..dbb0bcf48078 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1751,8 +1751,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 
 	while(1) {
 		bi = rbm_bi(rbm);
-		if ((ip == NULL || !gfs2_rs_active(&ip->i_res)) &&
-		    test_bit(GBF_FULL, &bi->bi_flags) &&
+		if (test_bit(GBF_FULL, &bi->bi_flags) &&
 		    (state == GFS2_BLKST_FREE))
 			goto next_bitmap;
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 06/17] gfs2: Don't clear GBF_FULL flags in rs_deltree
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (4 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 05/17] Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks reserved" Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 07/17] gfs2: Set GBF_FULL flags when reading resource groups Andreas Gruenbacher
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Removing a reservation doesn't make any actual space available, so don't clear
the GBF_FULL flags in that case.  Otherwise, we'll only spend time scanning the
bitmaps again unnecessarily.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index dbb0bcf48078..fcca1cd58ab3 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -327,20 +327,6 @@ static bool gfs2_rbm_add(struct gfs2_rbm *rbm, u32 blocks)
 	}
 }
 
-static struct gfs2_bitmap *gfs2_block_to_bitmap(struct gfs2_rgrpd *rgd,
-						u64 block)
-{
-	unsigned int delta = (sizeof(struct gfs2_rgrp) -
-			      sizeof(struct gfs2_meta_header)) * GFS2_NBBY;
-	unsigned int rblock, bii;
-
-	if (!rgrp_contains_block(rgd, block))
-		return NULL;
-	rblock = block - rgd->rd_data0;
-	bii = (rblock + delta) / rgd->rd_sbd->sd_blocks_per_bitmap;
-	return rgd->rd_bits + bii;
-}
-
 /**
  * gfs2_unaligned_extlen - Look for free blocks which are not byte aligned
  * @rbm: Position to search (value/result)
@@ -660,25 +646,16 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
-		u64 last_block = rs->rs_start + rs->rs_free - 1;
-		struct gfs2_bitmap *start, *last;
-
 		/* return reserved blocks to the rgrp */
 		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
 		rs->rs_rgd->rd_reserved -= rs->rs_free;
+
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
 		   it will force the number to be recalculated later. */
 		rgd->rd_extfail_pt += rs->rs_free;
 		rs->rs_free = 0;
-		start = gfs2_block_to_bitmap(rgd, rs->rs_start);
-		last = gfs2_block_to_bitmap(rgd, last_block);
-		if (!start || !last)
-			return;
-		do
-			clear_bit(GBF_FULL, &start->bi_flags);
-		while (start++ != last);
 	}
 }
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 07/17] gfs2: Set GBF_FULL flags when reading resource groups
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (5 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 06/17] gfs2: Don't clear GBF_FULL flags in rs_deltree Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 08/17] gfs2: When looking for blocks to allocate, don't clamp at free blocks Andreas Gruenbacher
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When reading a resource group from disk or when receiving the resource group
statistics from a Lock Value Block (LVB), set/clear the GBF_FULL flags of the
bitmaps in that resource group according to whether the resource group is full
or not.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index fcca1cd58ab3..d96e76e91769 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1165,6 +1165,23 @@ static u32 count_unlinked(struct gfs2_rgrpd *rgd)
 	return count;
 }
 
+static void rgrp_set_bitmap_flags(struct gfs2_rgrpd *rgd)
+{
+	struct gfs2_bitmap *bi;
+	int x;
+
+	if (rgd->rd_free) {
+		for (x = 0; x < rgd->rd_length; x++) {
+			bi = rgd->rd_bits + x;
+			clear_bit(GBF_FULL, &bi->bi_flags);
+		}
+	} else {
+		for (x = 0; x < rgd->rd_length; x++) {
+			bi = rgd->rd_bits + x;
+			set_bit(GBF_FULL, &bi->bi_flags);
+		}
+	}
+}
 
 /**
  * gfs2_rgrp_bh_get - Read in a RG's header and bitmaps
@@ -1208,9 +1225,8 @@ static int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd)
 	}
 
 	if (!(rgd->rd_flags & GFS2_RDF_UPTODATE)) {
-		for (x = 0; x < length; x++)
-			clear_bit(GBF_FULL, &rgd->rd_bits[x].bi_flags);
 		gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
+		rgrp_set_bitmap_flags(rgd);
 		rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
 		rgd->rd_free_clone = rgd->rd_free;
 		/* max out the rgrp allocation failure point */
@@ -1260,6 +1276,7 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 	if (rgd->rd_rgl->rl_unlinked == 0)
 		rgd->rd_flags &= ~GFS2_RDF_CHECK;
 	rgd->rd_free = be32_to_cpu(rgd->rd_rgl->rl_free);
+	rgrp_set_bitmap_flags(rgd);
 	rgd->rd_free_clone = rgd->rd_free;
 	rgd->rd_dinodes = be32_to_cpu(rgd->rd_rgl->rl_dinodes);
 	rgd->rd_igeneration = be64_to_cpu(rgd->rd_rgl->rl_igeneration);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 08/17] gfs2: When looking for blocks to allocate, don't clamp at free blocks
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (6 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 07/17] gfs2: Set GBF_FULL flags when reading resource groups Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 09/17] gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt Andreas Gruenbacher
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When looking for blocks to allocate, don't limit the nuber of blocks we are
looking for by the number of available blocks: this defeats the following check
whether enough blocks are still available and it will lead to nonsensical
"0-block allocations" when a resource group is out of free blocks.

Fixes: ff7f4cb46116 ("GFS2: Consolidate free block searching functions")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index d96e76e91769..dba15d25ff96 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1571,7 +1571,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 		extlen = 1;
 	else {
 		extlen = max_t(u32, atomic_read(&ip->i_sizehint), ap->target);
-		extlen = clamp(extlen, (u32)RGRP_RSRV_MINBLKS, free_blocks);
+		extlen = max_t(u32, extlen, RGRP_RSRV_MINBLKS);
 	}
 	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
 		return;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 09/17] gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (7 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 08/17] gfs2: When looking for blocks to allocate, don't clamp at free blocks Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 10/17] gfs2: Only pass reservation down to gfs2_rbm_find Andreas Gruenbacher
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Pass a non-NULL minext to gfs2_rbm_find even for single-block allocations.  In
gfs2_rbm_find, set rgd->rd_extfail_pt to 1 also when a single-block allocation
fails in a resource group: there is no reason for treating that case
differently.  In gfs2_reservation_check_and_update, only check how many free
blocks we have if more than one block is requested; we already know there's at
least one free block.

In addition, when allocating N blocks fails in gfs2_rbm_find, we need to set
rd_extfail_pt to N - 1 rather than N:  rd_extfail_pt defines the biggest
allocation that might still succeed.

Finally, reset rd_extfail_pt when updating the resource group statistics in
update_rgrp_lvb, as already done in gfs2_rgrp_bh_get.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index dba15d25ff96..05c97f8a078a 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1278,6 +1278,8 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 	rgd->rd_free = be32_to_cpu(rgd->rd_rgl->rl_free);
 	rgrp_set_bitmap_flags(rgd);
 	rgd->rd_free_clone = rgd->rd_free;
+	/* max out the rgrp allocation failure point */
+	rgd->rd_extfail_pt = rgd->rd_free;
 	rgd->rd_dinodes = be32_to_cpu(rgd->rd_rgl->rl_dinodes);
 	rgd->rd_igeneration = be64_to_cpu(rgd->rd_rgl->rl_igeneration);
 	return 0;
@@ -1673,7 +1675,7 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 	 * If we have a minimum extent length, then skip over any extent
 	 * which is less than the min extent length in size.
 	 */
-	if (minext) {
+	if (minext > 1) {
 		extlen = gfs2_free_extlen(rbm, minext);
 		if (extlen <= maxext->len)
 			goto fail;
@@ -1708,7 +1710,7 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
  * gfs2_rbm_find - Look for blocks of a particular state
  * @rbm: Value/result starting position and final position
  * @state: The state which we want to find
- * @minext: Pointer to the requested extent length (NULL for a single block)
+ * @minext: Pointer to the requested extent length
  *          This is updated to be the actual reservation size.
  * @ip: If set, check for reservations
  * @nowrap: Stop looking at the end of the rgrp, rather than wrapping
@@ -1764,8 +1766,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 		if (ip == NULL)
 			return 0;
 
-		ret = gfs2_reservation_check_and_update(rbm, ip,
-							minext ? *minext : 0,
+		ret = gfs2_reservation_check_and_update(rbm, ip, *minext,
 							&maxext);
 		if (ret == 0)
 			return 0;
@@ -1797,7 +1798,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 			break;
 	}
 
-	if (minext == NULL || state != GFS2_BLKST_FREE)
+	if (state != GFS2_BLKST_FREE)
 		return -ENOSPC;
 
 	/* If the extent was too small, and it's smaller than the smallest
@@ -1805,7 +1806,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 	   useless to search this rgrp again for this amount or more. */
 	if (wrapped && (scan_from_start || rbm->bii > last_bii) &&
 	    *minext < rbm->rgd->rd_extfail_pt)
-		rbm->rgd->rd_extfail_pt = *minext;
+		rbm->rgd->rd_extfail_pt = *minext - 1;
 
 	/* If the maximum extent we found is big enough to fulfill the
 	   minimum requirements, use it anyway. */
@@ -2382,14 +2383,15 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	struct buffer_head *dibh;
 	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
 	u64 block; /* block, within the file system scope */
+	u32 minext = 1;
 	int error;
 
 	gfs2_set_alloc_start(&rbm, ip, dinode);
-	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
+	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, ip, false);
 
 	if (error == -ENOSPC) {
 		gfs2_set_alloc_start(&rbm, ip, dinode);
-		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, NULL, false);
+		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, NULL, false);
 	}
 
 	/* Since all blocks are reserved in advance, this shouldn't happen */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 10/17] gfs2: Only pass reservation down to gfs2_rbm_find
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (8 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 09/17] gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 11/17] gfs: Don't search for unreserved space twice Andreas Gruenbacher
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Only pass the current reservation down to gfs2_rbm_find rather than the entire
inode; we don't need any of the other information.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 05c97f8a078a..227a24eb83bf 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -79,7 +79,7 @@ static const char valid_change[16] = {
 };
 
 static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
-			 const struct gfs2_inode *ip, bool nowrap);
+			 struct gfs2_blkreserv *rs, bool nowrap);
 
 
 /**
@@ -1587,7 +1587,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 	if (WARN_ON(gfs2_rbm_from_block(&rbm, goal)))
 		return;
 
-	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, ip, true);
+	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, &ip->i_res, true);
 	if (ret == 0) {
 		rs->rs_start = gfs2_rbm_to_block(&rbm);
 		rs->rs_free = extlen;
@@ -1603,7 +1603,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
  * @rgd: The resource group
  * @block: The starting block
  * @length: The required length
- * @ip: Ignore any reservations for this inode
+ * @ignore_rs: Reservation to ignore
  *
  * If the block does not appear in any reservation, then return the
  * block number unchanged. If it does appear in the reservation, then
@@ -1613,7 +1613,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 
 static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 				      u32 length,
-				      const struct gfs2_inode *ip)
+				      struct gfs2_blkreserv *ignore_rs)
 {
 	struct gfs2_blkreserv *rs;
 	struct rb_node *n;
@@ -1633,7 +1633,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 	}
 
 	if (n) {
-		while ((rs_cmp(block, length, rs) == 0) && (&ip->i_res != rs)) {
+		while (rs_cmp(block, length, rs) == 0 && rs != ignore_rs) {
 			block = rs->rs_start + rs->rs_free;
 			n = n->rb_right;
 			if (n == NULL)
@@ -1649,7 +1649,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 /**
  * gfs2_reservation_check_and_update - Check for reservations during block alloc
  * @rbm: The current position in the resource group
- * @ip: The inode for which we are searching for blocks
+ * @rs: Our own reservation
  * @minext: The minimum extent length
  * @maxext: A pointer to the maximum extent structure
  *
@@ -1663,7 +1663,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
  */
 
 static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
-					     const struct gfs2_inode *ip,
+					     struct gfs2_blkreserv *rs,
 					     u32 minext,
 					     struct gfs2_extent *maxext)
 {
@@ -1685,7 +1685,7 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 	 * Check the extent which has been found against the reservations
 	 * and skip if parts of it are already reserved
 	 */
-	nblock = gfs2_next_unreserved_block(rbm->rgd, block, extlen, ip);
+	nblock = gfs2_next_unreserved_block(rbm->rgd, block, extlen, rs);
 	if (nblock == block) {
 		if (!minext || extlen >= minext)
 			return 0;
@@ -1712,7 +1712,7 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
  * @state: The state which we want to find
  * @minext: Pointer to the requested extent length
  *          This is updated to be the actual reservation size.
- * @ip: If set, check for reservations
+ * @rs: Our own reservation (NULL to skip checking reservations)
  * @nowrap: Stop looking at the end of the rgrp, rather than wrapping
  *          around until we've reached the starting point.
  *
@@ -1726,7 +1726,7 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
  */
 
 static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
-			 const struct gfs2_inode *ip, bool nowrap)
+			 struct gfs2_blkreserv *rs, bool nowrap)
 {
 	bool scan_from_start = rbm->bii == 0 && rbm->offset == 0;
 	struct buffer_head *bh;
@@ -1763,10 +1763,10 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 			goto next_bitmap;
 		}
 		rbm->offset = offset;
-		if (ip == NULL)
+		if (!rs)
 			return 0;
 
-		ret = gfs2_reservation_check_and_update(rbm, ip, *minext,
+		ret = gfs2_reservation_check_and_update(rbm, rs, *minext,
 							&maxext);
 		if (ret == 0)
 			return 0;
@@ -2387,7 +2387,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	int error;
 
 	gfs2_set_alloc_start(&rbm, ip, dinode);
-	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, ip, false);
+	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, &ip->i_res, false);
 
 	if (error == -ENOSPC) {
 		gfs2_set_alloc_start(&rbm, ip, dinode);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 11/17] gfs: Don't search for unreserved space twice
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (9 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 10/17] gfs2: Only pass reservation down to gfs2_rbm_find Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 12/17] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

If gfs2_inplace_reserve has chosen a resource group but it couldn't make a
reservation there, there are too many other reservations in that resource
group.  In that case, don't even try to respect existing reservations in
gfs2_alloc_blocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 227a24eb83bf..c4a19798d3aa 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2384,11 +2384,12 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
 	u64 block; /* block, within the file system scope */
 	u32 minext = 1;
-	int error;
-
-	gfs2_set_alloc_start(&rbm, ip, dinode);
-	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, &ip->i_res, false);
+	int error = -ENOSPC;
 
+	if (gfs2_rs_active(&ip->i_res)) {
+		gfs2_set_alloc_start(&rbm, ip, dinode);
+		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, &ip->i_res, false);
+	}
 	if (error == -ENOSPC) {
 		gfs2_set_alloc_start(&rbm, ip, dinode);
 		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, NULL, false);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 12/17] gfs2: Add per-reservation reserved block accounting
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (10 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 11/17] gfs: Don't search for unreserved space twice Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 13/17] gfs2: Add local resource group locking Andreas Gruenbacher
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Add a rs_reserved field to struct gfs2_blkreserv to keep track of the
number of blocks reserved by this particular reservation.  When making a
reservation with gfs2_inplace_reserve, this field is set to somewhere
between ap->min_target and ap->target depending on the number of free
blocks in the resource group.  When allocating blocks with
gfs2_alloc_blocks, rs_reserved is decremented accordingly.  Eventually,
any reserved but not consumed blocks are returned to the resource group
by gfs2_inplace_release.

The reservation tree (rd_rstree) is unaffected by this change: the
reservations it tracks are still advisory, and the sizes of those
reservations (rs_free) are still determined by the tentative allocation
sizes (i_sizehint).  Since rd_reserved now tracks the number of reserved
blocks rather than the number of tentatively reserved blocks, we may end
up with slightly different allocation patterns, though. The
rd_extfail_pt optimization will still cause ill-suited resource groups
to be skipped quickly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c       |   4 +-
 fs/gfs2/incore.h     |   1 +
 fs/gfs2/lops.c       |   1 +
 fs/gfs2/rgrp.c       | 110 ++++++++++++++++++++++---------------------
 fs/gfs2/trace_gfs2.h |   8 +++-
 5 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index b39b339feddc..7246aaf86862 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1115,8 +1115,8 @@ static long __gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t
 			goto out_qunlock;
 
 		/* check if the selected rgrp limits our max_blks further */
-		if (ap.allowed && ap.allowed < max_blks)
-			max_blks = ap.allowed;
+		if (ip->i_res.rs_reserved < max_blks)
+			max_blks = ip->i_res.rs_reserved;
 
 		/* Almost done. Calculate bytes that can be written using
 		 * max_blks. We also recompute max_bytes, data_blocks and
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index fb487e90eac7..6f0790d3f71d 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -287,6 +287,7 @@ struct gfs2_blkreserv {
 	struct gfs2_rgrpd *rs_rgd;
 	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
+	u32 rs_reserved;              /* number of reserved blocks */
 };
 
 /*
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index ed1da4323967..f621a021e21b 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -83,6 +83,7 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	       bd->bd_bh->b_data + bi->bi_offset, bi->bi_bytes);
 	clear_bit(GBF_FULL, &bi->bi_flags);
 	rgd->rd_free_clone = rgd->rd_free;
+	BUG_ON(rgd->rd_free_clone < rgd->rd_reserved);
 	rgd->rd_extfail_pt = rgd->rd_free;
 }
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index c4a19798d3aa..2f6eca015fab 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -646,10 +646,6 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
-		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
-		rs->rs_rgd->rd_reserved -= rs->rs_free;
-
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
@@ -1229,6 +1225,7 @@ static int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd)
 		rgrp_set_bitmap_flags(rgd);
 		rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
 		rgd->rd_free_clone = rgd->rd_free;
+		BUG_ON(rgd->rd_reserved);
 		/* max out the rgrp allocation failure point */
 		rgd->rd_extfail_pt = rgd->rd_free;
 	}
@@ -1278,6 +1275,7 @@ static int update_rgrp_lvb(struct gfs2_rgrpd *rgd)
 	rgd->rd_free = be32_to_cpu(rgd->rd_rgl->rl_free);
 	rgrp_set_bitmap_flags(rgd);
 	rgd->rd_free_clone = rgd->rd_free;
+	BUG_ON(rgd->rd_reserved);
 	/* max out the rgrp allocation failure point */
 	rgd->rd_extfail_pt = rgd->rd_free;
 	rgd->rd_dinodes = be32_to_cpu(rgd->rd_rgl->rl_dinodes);
@@ -1515,41 +1513,10 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
-
-	/* Do our rgrp accounting for the reservation */
-	rgd->rd_reserved += rs->rs_free; /* blocks reserved */
 	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
 
-/**
- * rgd_free - return the number of free blocks we can allocate.
- * @rgd: the resource group
- *
- * This function returns the number of free blocks for an rgrp.
- * That's the clone-free blocks (blocks that are free, not including those
- * still being used for unlinked files that haven't been deleted.)
- *
- * It also subtracts any blocks reserved by someone else, but does not
- * include free blocks that are still part of our current reservation,
- * because obviously we can (and will) allocate them.
- */
-static inline u32 rgd_free(struct gfs2_rgrpd *rgd, struct gfs2_blkreserv *rs)
-{
-	u32 tot_reserved, tot_free;
-
-	if (WARN_ON_ONCE(rgd->rd_reserved < rs->rs_free))
-		return 0;
-	tot_reserved = rgd->rd_reserved - rs->rs_free;
-
-	if (rgd->rd_free_clone < tot_reserved)
-		tot_reserved = 0;
-
-	tot_free = rgd->rd_free_clone - tot_reserved;
-
-	return tot_free;
-}
-
 /**
  * rg_mblk_search - find a group of multiple free blocks to form a reservation
  * @rgd: the resource group descriptor
@@ -1565,7 +1532,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 	u64 goal;
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	u32 extlen;
-	u32 free_blocks = rgd_free(rgd, rs);
+	u32 blocks_available;
 	int ret;
 	struct inode *inode = &ip->i_inode;
 
@@ -1575,8 +1542,16 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 		extlen = max_t(u32, atomic_read(&ip->i_sizehint), ap->target);
 		extlen = max_t(u32, extlen, RGRP_RSRV_MINBLKS);
 	}
-	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
+
+	spin_lock(&rgd->rd_rsspin);
+	blocks_available = rgd->rd_free_clone - rgd->rd_reserved;
+	if (rgd == rs->rs_rgd)
+		blocks_available += rs->rs_reserved;
+	if (blocks_available < extlen) {
+		spin_unlock(&rgd->rd_rsspin);
 		return;
+	}
+	spin_unlock(&rgd->rd_rsspin);
 
 	/* Find bitmap block that contains bits for goal block */
 	if (rgrp_contains_block(rgd, ip->i_goal))
@@ -2024,8 +1999,7 @@ static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
  * We try our best to find an rgrp that has at least ap->target blocks
  * available. After a couple of passes (loops == 2), the prospects of finding
  * such an rgrp diminish. At this stage, we return the first rgrp that has
- * at least ap->min_target blocks available. Either way, we set ap->allowed to
- * the number of blocks available in the chosen rgrp.
+ * at least ap->min_target blocks available.
  *
  * Returns: 0 on success,
  *          -ENOMEM if a suitable rgrp can't be found
@@ -2041,7 +2015,9 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	u64 last_unlinked = NO_BLOCK;
 	u32 target = ap->target;
 	int loops = 0;
-	u32 free_blocks, skip = 0;
+	u32 blocks_available, skip = 0;
+
+	BUG_ON(rs->rs_reserved);
 
 	if (sdp->sd_args.ar_rgrplvb)
 		flags |= GL_SKIP;
@@ -2062,6 +2038,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		return -EBADSLT;
 
 	while (loops < 3) {
+		struct gfs2_rgrpd *rgd;
+
 		rg_locked = 1;
 
 		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
@@ -2112,11 +2090,19 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			goto check_rgrp;
 
 		/* If rgrp has enough free space, use it */
-		free_blocks = rgd_free(rs->rs_rgd, rs);
-		if (free_blocks >= target) {
-			ap->allowed = free_blocks;
-			return 0;
+		rgd = rs->rs_rgd;
+		spin_lock(&rgd->rd_rsspin);
+		blocks_available = rgd->rd_free_clone - rgd->rd_reserved;
+		if (blocks_available < target) {
+			spin_unlock(&rgd->rd_rsspin);
+			goto check_rgrp;
 		}
+		rs->rs_reserved = ap->target;
+		if (rs->rs_reserved > blocks_available)
+			rs->rs_reserved = blocks_available;
+		rgd->rd_reserved += rs->rs_reserved;
+		spin_unlock(&rgd->rd_rsspin);
+		return 0;
 check_rgrp:
 		/* Check for unlinked inodes which can be reclaimed */
 		if (rs->rs_rgd->rd_flags & GFS2_RDF_CHECK)
@@ -2169,6 +2155,17 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 void gfs2_inplace_release(struct gfs2_inode *ip)
 {
+	struct gfs2_blkreserv *rs = &ip->i_res;
+
+	if (rs->rs_reserved) {
+		struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+		spin_lock(&rgd->rd_rsspin);
+		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
+		rgd->rd_reserved -= rs->rs_reserved;
+		spin_unlock(&rgd->rd_rsspin);
+		rs->rs_reserved = 0;
+	}
 	if (gfs2_holder_initialized(&ip->i_rgd_gh))
 		gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 }
@@ -2310,7 +2307,8 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	struct gfs2_rgrpd *rgd = rbm->rgd;
 
-	spin_lock(&rgd->rd_rsspin);
+	BUG_ON(rs->rs_reserved < len);
+	rs->rs_reserved -= len;
 	if (gfs2_rs_active(rs)) {
 		u64 start = gfs2_rbm_to_block(rbm);
 
@@ -2320,19 +2318,16 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
-			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
 			    rs->rs_free)
-				goto out;
+				return;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
 			atomic_add(RGRP_RSRV_ADDBLKS, &ip->i_sizehint);
 		}
 		__rs_deltree(rs);
 	}
-out:
-	spin_unlock(&rgd->rd_rsspin);
 }
 
 /**
@@ -2386,6 +2381,8 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	u32 minext = 1;
 	int error = -ENOSPC;
 
+	BUG_ON(ip->i_res.rs_reserved < *nblocks);
+
 	if (gfs2_rs_active(&ip->i_res)) {
 		gfs2_set_alloc_start(&rbm, ip, dinode);
 		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, &ip->i_res, false);
@@ -2407,8 +2404,6 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	gfs2_alloc_extent(&rbm, dinode, nblocks);
 	block = gfs2_rbm_to_block(&rbm);
 	rbm.rgd->rd_last_alloc = block - rbm.rgd->rd_data0;
-	if (gfs2_rs_active(&ip->i_res))
-		gfs2_adjust_reservation(ip, &rbm, *nblocks);
 	if (!dinode) {
 		ip->i_goal = block + *nblocks - 1;
 		error = gfs2_meta_inode_buffer(ip, &dibh);
@@ -2421,12 +2416,20 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 			brelse(dibh);
 		}
 	}
-	if (rbm.rgd->rd_free < *nblocks) {
+	spin_lock(&rbm.rgd->rd_rsspin);
+	gfs2_adjust_reservation(ip, &rbm, *nblocks);
+	if (rbm.rgd->rd_reserved < *nblocks) {
 		fs_warn(sdp, "nblocks=%u\n", *nblocks);
+		spin_unlock(&rbm.rgd->rd_rsspin);
 		goto rgrp_error;
 	}
-
+	BUG_ON(rbm.rgd->rd_reserved < *nblocks);
+	BUG_ON(rbm.rgd->rd_free_clone < *nblocks);
+	BUG_ON(rbm.rgd->rd_free < *nblocks);
+	rbm.rgd->rd_reserved -= *nblocks;
+	rbm.rgd->rd_free_clone -= *nblocks;
 	rbm.rgd->rd_free -= *nblocks;
+	spin_unlock(&rbm.rgd->rd_rsspin);
 	if (dinode) {
 		rbm.rgd->rd_dinodes++;
 		*generation = rbm.rgd->rd_igeneration++;
@@ -2443,7 +2446,6 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	gfs2_quota_change(ip, *nblocks, ip->i_inode.i_uid, ip->i_inode.i_gid);
 
-	rbm.rgd->rd_free_clone -= *nblocks;
 	trace_gfs2_block_alloc(ip, rbm.rgd, block, *nblocks,
 			       dinode ? GFS2_BLKST_DINODE : GFS2_BLKST_USED);
 	*bn = block;
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 7586c7629497..282fcb1a242f 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -598,6 +598,7 @@ TRACE_EVENT(gfs2_rs,
 		__field(	u64,	inum			)
 		__field(	u64,	start			)
 		__field(	u32,	free			)
+		__field(	u32,	reserved		)
 		__field(	u8,	func			)
 	),
 
@@ -610,17 +611,20 @@ TRACE_EVENT(gfs2_rs,
 						       i_res)->i_no_addr;
 		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
+		__entry->reserved	= rs->rs_reserved;
 		__entry->func		= func;
 	),
 
-	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu",
+	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu r:%lu",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->inum,
 		  (unsigned long long)__entry->start,
 		  (unsigned long long)__entry->rd_addr,
 		  (unsigned long)__entry->rd_free_clone,
 		  (unsigned long)__entry->rd_reserved,
-		  rs_func_name(__entry->func), (unsigned long)__entry->free)
+		  rs_func_name(__entry->func),
+		  (unsigned long)__entry->free,
+		  (unsigned long)__entry->reserved)
 );
 
 #endif /* _TRACE_GFS2_H */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 13/17] gfs2: Add local resource group locking
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (11 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 12/17] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 14/17] gfs2: Allow node-wide exclusive glock sharing Andreas Gruenbacher
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

Prepare for treating resource group glocks as exclusive among nodes but
shared among all tasks running on a node: introduce another layer of
node-specific locking that the local tasks can use to coordinate their
accesses.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/incore.h |  2 ++
 fs/gfs2/lops.c   |  6 +++++-
 fs/gfs2/rgrp.c   | 53 ++++++++++++++++++++++++++++++++++++++++++------
 fs/gfs2/rgrp.h   |  4 ++++
 4 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 6f0790d3f71d..295c22441ade 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -20,6 +20,7 @@
 #include <linux/percpu.h>
 #include <linux/lockref.h>
 #include <linux/rhashtable.h>
+#include <linux/mutex.h>
 
 #define DIO_WAIT	0x00000010
 #define DIO_METADATA	0x00000020
@@ -118,6 +119,7 @@ struct gfs2_rgrpd {
 #define GFS2_RDF_PREFERRED	0x80000000 /* This rgrp is preferred */
 #define GFS2_RDF_MASK		0xf0000000 /* mask for internal flags */
 	spinlock_t rd_rsspin;           /* protects reservation related vars */
+	struct mutex rd_mutex;
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
 };
 
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index f621a021e21b..d84e9dca22ae 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -75,8 +75,9 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
 	struct gfs2_bitmap *bi = rgd->rd_bits + index;
 
+	rgrp_lock_local(rgd);
 	if (bi->bi_clone == NULL)
-		return;
+		goto out;
 	if (sdp->sd_args.ar_discard)
 		gfs2_rgrp_send_discards(sdp, rgd->rd_data0, bd->bd_bh, bi, 1, NULL);
 	memcpy(bi->bi_clone + bi->bi_offset,
@@ -85,6 +86,9 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	rgd->rd_free_clone = rgd->rd_free;
 	BUG_ON(rgd->rd_free_clone < rgd->rd_reserved);
 	rgd->rd_extfail_pt = rgd->rd_free;
+
+out:
+	rgrp_unlock_local(rgd);
 }
 
 /**
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 2f6eca015fab..7eb3465dfbc9 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -917,6 +917,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 	rgd->rd_data = be32_to_cpu(buf.ri_data);
 	rgd->rd_bitbytes = be32_to_cpu(buf.ri_bitbytes);
 	spin_lock_init(&rgd->rd_rsspin);
+	mutex_init(&rgd->rd_mutex);
 
 	error = compute_bitstructs(rgd);
 	if (error)
@@ -1442,9 +1443,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 			/* Trim each bitmap in the rgrp */
 			for (x = 0; x < rgd->rd_length; x++) {
 				struct gfs2_bitmap *bi = rgd->rd_bits + x;
+				rgrp_lock_local(rgd);
 				ret = gfs2_rgrp_send_discards(sdp,
 						rgd->rd_data0, NULL, bi, minlen,
 						&amt);
+				rgrp_unlock_local(rgd);
 				if (ret) {
 					gfs2_glock_dq_uninit(&gh);
 					goto out;
@@ -1456,9 +1459,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 			ret = gfs2_trans_begin(sdp, RES_RG_HDR, 0);
 			if (ret == 0) {
 				bh = rgd->rd_bits[0].bi_bh;
+				rgrp_lock_local(rgd);
 				rgd->rd_flags |= GFS2_RGF_TRIMMED;
 				gfs2_trans_add_meta(rgd->rd_gl, bh);
 				gfs2_rgrp_out(rgd, bh->b_data);
+				rgrp_unlock_local(rgd);
 				gfs2_trans_end(sdp);
 			}
 		}
@@ -2011,7 +2016,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrpd *begin = NULL;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	int error = 0, rg_locked, flags = 0;
+	int error = 0, flags = 0;
+	bool rg_locked;
 	u64 last_unlinked = NO_BLOCK;
 	u32 target = ap->target;
 	int loops = 0;
@@ -2040,10 +2046,10 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	while (loops < 3) {
 		struct gfs2_rgrpd *rgd;
 
-		rg_locked = 1;
-
-		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
-			rg_locked = 0;
+		rg_locked = gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl);
+		if (rg_locked) {
+			rgrp_lock_local(rs->rs_rgd);
+		} else {
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
@@ -2060,12 +2066,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
+			rgrp_lock_local(rs->rs_rgd);
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
 			    gfs2_rgrp_congested(rs->rs_rgd, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rgd);
 				if (unlikely(error)) {
+					rgrp_unlock_local(rs->rs_rgd);
 					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
 				}
@@ -2102,6 +2110,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rs->rs_reserved = blocks_available;
 		rgd->rd_reserved += rs->rs_reserved;
 		spin_unlock(&rgd->rd_rsspin);
+		rgrp_unlock_local(rs->rs_rgd);
 		return 0;
 check_rgrp:
 		/* Check for unlinked inodes which can be reclaimed */
@@ -2109,6 +2118,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
+		rgrp_unlock_local(rs->rs_rgd);
+
 		/* Drop reservation, if we couldn't use reserved rgrp */
 		if (gfs2_rs_active(rs))
 			gfs2_rs_deltree(rs);
@@ -2256,6 +2267,7 @@ void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl,
 
 	if (rgd == NULL)
 		return;
+	spin_lock(&rgd->rd_rsspin);
 	gfs2_print_dbg(seq, "%s R: n:%llu f:%02x b:%u/%u i:%u r:%u e:%u\n",
 		       fs_id_buf,
 		       (unsigned long long)rgd->rd_addr, rgd->rd_flags,
@@ -2269,7 +2281,6 @@ void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl,
 			       be32_to_cpu(rgl->rl_free),
 			       be32_to_cpu(rgl->rl_dinodes));
 	}
-	spin_lock(&rgd->rd_rsspin);
 	for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
 		trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
 		dump_rs(seq, trs, fs_id_buf);
@@ -2383,6 +2394,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	BUG_ON(ip->i_res.rs_reserved < *nblocks);
 
+	rgrp_lock_local(rbm.rgd);
 	if (gfs2_rs_active(&ip->i_res)) {
 		gfs2_set_alloc_start(&rbm, ip, dinode);
 		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &minext, &ip->i_res, false);
@@ -2439,6 +2451,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	gfs2_trans_add_meta(rbm.rgd->rd_gl, rbm.rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rbm.rgd, rbm.rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rbm.rgd);
 
 	gfs2_statfs_change(sdp, 0, -(s64)*nblocks, dinode ? 1 : 0);
 	if (dinode)
@@ -2452,6 +2465,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	return 0;
 
 rgrp_error:
+	rgrp_unlock_local(rbm.rgd);
 	gfs2_rgrp_error(rbm.rgd);
 	return -EIO;
 }
@@ -2471,12 +2485,14 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
 	trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
 	rgd->rd_free += blen;
 	rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rgd);
 
 	/* Directories keep their data in the metadata address space */
 	if (meta || ip->i_depth)
@@ -2512,17 +2528,20 @@ void gfs2_unlink_di(struct inode *inode)
 	rgd = gfs2_blk2rgrpd(sdp, blkno, true);
 	if (!rgd)
 		return;
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
 	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, 1);
+	rgrp_unlock_local(rgd);
 }
 
 void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
 
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
 	if (!rgd->rd_dinodes)
 		gfs2_consist_rgrpd(rgd);
@@ -2531,6 +2550,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rgd);
 	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, -1);
 
 	gfs2_statfs_change(sdp, 0, +1, -1);
@@ -2545,6 +2565,10 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
  * @no_addr: The block number to check
  * @type: The block type we are looking for
  *
+ * The inode glock of @no_addr must be held.  The @type to check for is either
+ * GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; checking for type GFS2_BLKST_FREE
+ * or GFS2_BLKST_USED would make no sense.
+ *
  * Returns: 0 if the block type matches the expected type
  *          -ESTALE if it doesn't match
  *          or -ve errno if something went wrong while checking
@@ -2570,6 +2594,12 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
 	if (WARN_ON_ONCE(error))
 		goto fail;
 
+	/*
+	 * No need to take the local resource group lock here; the inode glock
+	 * of @no_addr provides the necessary synchronization in case the block
+	 * is an inode.  (In case the block is not an inode, the block type
+	 * will not match the @type we are looking for.)
+	 */
 	if (gfs2_testbit(&rbm, false) != type)
 		error = -ESTALE;
 
@@ -2692,3 +2722,14 @@ void gfs2_rlist_free(struct gfs2_rgrp_list *rlist)
 	}
 }
 
+void rgrp_lock_local(struct gfs2_rgrpd *rgd)
+{
+	BUG_ON(!gfs2_glock_is_held_excl(rgd->rd_gl) &&
+	       !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
+	mutex_lock(&rgd->rd_mutex);
+}
+
+void rgrp_unlock_local(struct gfs2_rgrpd *rgd)
+{
+	mutex_unlock(&rgd->rd_mutex);
+}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index a1d7e14fc55b..4c2e79639938 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -88,4 +88,8 @@ static inline int rgrp_contains_block(struct gfs2_rgrpd *rgd, u64 block)
 }
 
 extern void check_and_update_goal(struct gfs2_inode *ip);
+
+extern void rgrp_lock_local(struct gfs2_rgrpd *rgd);
+extern void rgrp_unlock_local(struct gfs2_rgrpd *rgd);
+
 #endif /* __RGRP_DOT_H__ */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 14/17] gfs2: Allow node-wide exclusive glock sharing
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (12 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 13/17] gfs2: Add local resource group locking Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 15/17] gfs2: Use resource group " Andreas Gruenbacher
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes who are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
From the point of view of other nodes, the lock is still held
exclusively.

A future patch will start using this flag to improve performance with
rgrp sharing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 22 +++++++++++++++++++---
 fs/gfs2/glock.h |  6 ++++++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 19d4db4c44e7..154b87e62c20 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -313,9 +313,23 @@ void gfs2_glock_put(struct gfs2_glock *gl)
 static inline int may_grant(const struct gfs2_glock *gl, const struct gfs2_holder *gh)
 {
 	const struct gfs2_holder *gh_head = list_first_entry(&gl->gl_holders, const struct gfs2_holder, gh_list);
-	if ((gh->gh_state == LM_ST_EXCLUSIVE ||
-	     gh_head->gh_state == LM_ST_EXCLUSIVE) && gh != gh_head)
-		return 0;
+
+	if (gh != gh_head) {
+		/**
+		 * Here we make a special exception to grant holders who agree
+		 * to share the EX lock with other holders who also have the
+		 * bit set. If the original holder has the LM_FLAG_NODE_SCOPE bit
+		 * is set, we grant more holders with the bit set.
+		 */
+		if (gh_head->gh_state == LM_ST_EXCLUSIVE &&
+		    (gh_head->gh_flags & LM_FLAG_NODE_SCOPE) &&
+		    gh->gh_state == LM_ST_EXCLUSIVE &&
+		    (gh->gh_flags & LM_FLAG_NODE_SCOPE))
+			return 1;
+		if ((gh->gh_state == LM_ST_EXCLUSIVE ||
+		     gh_head->gh_state == LM_ST_EXCLUSIVE))
+			return 0;
+	}
 	if (gl->gl_state == gh->gh_state)
 		return 1;
 	if (gh->gh_flags & GL_EXACT)
@@ -2037,6 +2051,8 @@ static const char *hflags2str(char *buf, u16 flags, unsigned long iflags)
 		*p++ = 'A';
 	if (flags & LM_FLAG_PRIORITY)
 		*p++ = 'p';
+	if (flags & LM_FLAG_NODE_SCOPE)
+		*p++ = 'n';
 	if (flags & GL_ASYNC)
 		*p++ = 'a';
 	if (flags & GL_EXACT)
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 53813364517b..31a8f2f649b5 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -75,6 +75,11 @@ enum {
  * request and directly join the other shared lock.  A shared lock request
  * without the priority flag might be forced to wait until the deferred
  * requested had acquired and released the lock.
+ *
+ * LM_FLAG_NODE_SCOPE
+ * This holder agrees to share the lock within this node. In other words,
+ * the glock is held in EX mode according to DLM, but local holders on the
+ * same node can share it.
  */
 
 #define LM_FLAG_TRY		0x0001
@@ -82,6 +87,7 @@ enum {
 #define LM_FLAG_NOEXP		0x0004
 #define LM_FLAG_ANY		0x0008
 #define LM_FLAG_PRIORITY	0x0010
+#define LM_FLAG_NODE_SCOPE	0x0020
 #define GL_ASYNC		0x0040
 #define GL_EXACT		0x0080
 #define GL_SKIP			0x0100
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 15/17] gfs2: Use resource group glock sharing
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (13 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 14/17] gfs2: Allow node-wide exclusive glock sharing Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 16/17] gfs2: Limit the maximum amount of reserved space Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 17/17] gfs2: Rename rs_free to rs_wanted Andreas Gruenbacher
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

This patch takes advantage of the new glock holder sharing feature for
resource groups.  We have already introduced local resource group
locking in a previous patch, so competing accesses of local processes
are already under control.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/inode.c |  6 +++---
 fs/gfs2/rgrp.c  | 10 +++++-----
 fs/gfs2/super.c |  3 ++-
 fs/gfs2/xattr.c |  6 ++++--
 5 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index ff64fa85d3ac..ad034370a4f7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1541,7 +1541,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 				goto out;
 			}
 			ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
-						 0, rd_gh);
+						 LM_FLAG_NODE_SCOPE, rd_gh);
 			if (ret)
 				goto out;
 
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 6774865f5b5b..8b767ac6cec9 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1143,7 +1143,7 @@ static int gfs2_unlink(struct inode *dir, struct dentry *dentry)
 	if (!rgd)
 		goto out_inodes;
 
-	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2);
+	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, LM_FLAG_NODE_SCOPE, ghs + 2);
 
 
 	error = gfs2_glock_nq(ghs); /* parent */
@@ -1448,8 +1448,8 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			error = -ENOENT;
 			goto out_gunlock;
 		}
-		error = gfs2_glock_nq_init(nrgd->rd_gl, LM_ST_EXCLUSIVE, 0,
-					   &rd_gh);
+		error = gfs2_glock_nq_init(nrgd->rd_gl, LM_ST_EXCLUSIVE,
+					   LM_FLAG_NODE_SCOPE, &rd_gh);
 		if (error)
 			goto out_gunlock;
 	}
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7eb3465dfbc9..751bd31cfa5d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1435,7 +1435,8 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 
 	while (1) {
 
-		ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &gh);
+		ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+					 LM_FLAG_NODE_SCOPE, &gh);
 		if (ret)
 			goto out;
 
@@ -2016,7 +2017,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrpd *begin = NULL;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	int error = 0, flags = 0;
+	int error = 0, flags = LM_FLAG_NODE_SCOPE;
 	bool rg_locked;
 	u64 last_unlinked = NO_BLOCK;
 	u32 target = ap->target;
@@ -2697,9 +2698,8 @@ void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist)
 				      sizeof(struct gfs2_holder),
 				      GFP_NOFS | __GFP_NOFAIL);
 	for (x = 0; x < rlist->rl_rgrps; x++)
-		gfs2_holder_init(rlist->rl_rgd[x]->rd_gl,
-				LM_ST_EXCLUSIVE, 0,
-				&rlist->rl_ghs[x]);
+		gfs2_holder_init(rlist->rl_rgd[x]->rd_gl, LM_ST_EXCLUSIVE,
+				 LM_FLAG_NODE_SCOPE, &rlist->rl_ghs[x]);
 }
 
 /**
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 3d9daac44e1c..0bff8f8f88a6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1229,7 +1229,8 @@ static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
 		goto out_qs;
 	}
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &gh);
+	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+				   LM_FLAG_NODE_SCOPE, &gh);
 	if (error)
 		goto out_qs;
 
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 9d7667bc4292..d332bf31f0ae 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -259,7 +259,8 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
 		return -EIO;
 	}
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
+	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+				   LM_FLAG_NODE_SCOPE, &rg_gh);
 	if (error)
 		return error;
 
@@ -1385,7 +1386,8 @@ static int ea_dealloc_block(struct gfs2_inode *ip)
 		return -EIO;
 	}
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &gh);
+	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+				   LM_FLAG_NODE_SCOPE, &gh);
 	if (error)
 		return error;
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 16/17] gfs2: Limit the maximum amount of reserved space
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (14 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 15/17] gfs2: Use resource group " Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  2020-10-14  9:58 ` [Cluster-devel] [RFC 17/17] gfs2: Rename rs_free to rs_wanted Andreas Gruenbacher
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When allocating blocks for an inode, gfs2 tries to keep future allocations
efficient by "reserving" the blocks adjacent to the allocated blocks for future
use by this inode.  These "reservations" are node-local, and they can be stolen
by other nodes or even other processes on the same node when necessary.

Stealing from those reservations is very inefficient because it involves
scanning the bitmaps, repeatedly.  This slows down workloads that create many
small files, for example.  Fix this by discarding some of those
semi-reservations as soon as they cover more than half of the remaining space.

Fixes xfstests generic/488, generic/531.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/incore.h     |  1 +
 fs/gfs2/rgrp.c       | 30 ++++++++++++++++++++++++++++++
 fs/gfs2/trace_gfs2.h | 13 ++++++++++---
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 295c22441ade..4f68456da677 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -104,6 +104,7 @@ struct gfs2_rgrpd {
 	u32 rd_bitbytes;		/* number of bytes in data bitmaps */
 	u32 rd_free;
 	u32 rd_reserved;                /* number of blocks reserved */
+	u32 rd_wanted;
 	u32 rd_free_clone;
 	u32 rd_dinodes;
 	u64 rd_igeneration;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 751bd31cfa5d..1ed09e45738f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -646,6 +646,7 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
+		rgd->rd_wanted -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
@@ -1519,6 +1520,7 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
+	rgd->rd_wanted += rs->rs_free;
 	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
@@ -1997,6 +1999,26 @@ static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
 	return 0;
 }
 
+static bool gfs2_trim_wanted_blocks(struct gfs2_rgrpd *rgd)
+{
+	u32 max_wanted = (rgd->rd_free_clone - rgd->rd_reserved) / 2;
+	struct gfs2_blkreserv *rs, *next;
+	bool trimmed = false;
+
+	if (rgd->rd_wanted <= max_wanted)
+		goto out;
+	rbtree_postorder_for_each_entry_safe(rs, next, &rgd->rd_rstree, rs_node) {
+		if (rs->rs_reserved)
+			continue;
+		__rs_deltree(rs);
+		if (rgd->rd_wanted <= max_wanted)
+			break;
+	}
+
+out:
+	return trimmed;
+}
+
 /**
  * gfs2_inplace_reserve - Reserve space in the filesystem
  * @ip: the inode to reserve space for
@@ -2046,6 +2068,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 	while (loops < 3) {
 		struct gfs2_rgrpd *rgd;
+		bool retry = false;
 
 		rg_locked = gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl);
 		if (rg_locked) {
@@ -2110,7 +2133,13 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		if (rs->rs_reserved > blocks_available)
 			rs->rs_reserved = blocks_available;
 		rgd->rd_reserved += rs->rs_reserved;
+		if (!gfs2_rs_active(rs)) {
+			if (gfs2_trim_wanted_blocks(rgd))
+				retry = true;
+		}
 		spin_unlock(&rgd->rd_rsspin);
+		if (retry)
+			rg_mblk_search(rs->rs_rgd, ip, ap);
 		rgrp_unlock_local(rs->rs_rgd);
 		return 0;
 check_rgrp:
@@ -2330,6 +2359,7 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
+			rgd->rd_wanted -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
 			    rs->rs_free)
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 282fcb1a242f..4fd1614274fd 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -560,6 +560,7 @@ TRACE_EVENT(gfs2_block_alloc,
 		__field(        u64,	rd_addr			)
 		__field(        u32,	rd_free_clone		)
 		__field(	u32,	rd_reserved		)
+		__field(	u32,	rd_wanted		)
 	),
 
 	TP_fast_assign(
@@ -571,16 +572,19 @@ TRACE_EVENT(gfs2_block_alloc,
 		__entry->rd_addr	= rgd->rd_addr;
 		__entry->rd_free_clone	= rgd->rd_free_clone;
 		__entry->rd_reserved	= rgd->rd_reserved;
+		__entry->rd_wanted	= rgd->rd_wanted;
 	),
 
-	TP_printk("%u,%u bmap %llu alloc %llu/%lu %s rg:%llu rf:%u rr:%lu",
+	TP_printk("%u,%u bmap %llu alloc %llu/%lu %s rg:%llu rf:%u rr:%lu rw:%lu",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->inum,
 		  (unsigned long long)__entry->start,
 		  (unsigned long)__entry->len,
 		  block_state_name(__entry->block_state),
 		  (unsigned long long)__entry->rd_addr,
-		  __entry->rd_free_clone, (unsigned long)__entry->rd_reserved)
+		  __entry->rd_free_clone,
+		  (unsigned long)__entry->rd_reserved,
+		  (unsigned long)__entry->rd_wanted)
 );
 
 /* Keep track of multi-block reservations as they are allocated/freed */
@@ -595,6 +599,7 @@ TRACE_EVENT(gfs2_rs,
 		__field(	u64,	rd_addr			)
 		__field(	u32,	rd_free_clone		)
 		__field(	u32,	rd_reserved		)
+		__field(	u32,	rd_wanted		)
 		__field(	u64,	inum			)
 		__field(	u64,	start			)
 		__field(	u32,	free			)
@@ -607,6 +612,7 @@ TRACE_EVENT(gfs2_rs,
 		__entry->rd_addr	= rs->rs_rgd->rd_addr;
 		__entry->rd_free_clone	= rs->rs_rgd->rd_free_clone;
 		__entry->rd_reserved	= rs->rs_rgd->rd_reserved;
+		__entry->rd_wanted	= rs->rs_rgd->rd_wanted;
 		__entry->inum		= container_of(rs, struct gfs2_inode,
 						       i_res)->i_no_addr;
 		__entry->start		= rs->rs_start;
@@ -615,13 +621,14 @@ TRACE_EVENT(gfs2_rs,
 		__entry->func		= func;
 	),
 
-	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu r:%lu",
+	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu rw:%lu %s f:%lu r:%lu",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->inum,
 		  (unsigned long long)__entry->start,
 		  (unsigned long long)__entry->rd_addr,
 		  (unsigned long)__entry->rd_free_clone,
 		  (unsigned long)__entry->rd_reserved,
+		  (unsigned long)__entry->rd_wanted,
 		  rs_func_name(__entry->func),
 		  (unsigned long)__entry->free,
 		  (unsigned long)__entry->reserved)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Cluster-devel] [RFC 17/17] gfs2: Rename rs_free to rs_wanted
  2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
                   ` (15 preceding siblings ...)
  2020-10-14  9:58 ` [Cluster-devel] [RFC 16/17] gfs2: Limit the maximum amount of reserved space Andreas Gruenbacher
@ 2020-10-14  9:58 ` Andreas Gruenbacher
  16 siblings, 0 replies; 18+ messages in thread
From: Andreas Gruenbacher @ 2020-10-14  9:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This should make it a bit more clear that the rs_start and rs_wanted fields in
struct gfs2_blkreserv only define a semi-reservation that other nodes and other
processes may still steal blocks from.  In contrast, rs_reserved indicates the
number of blocks actually reserved.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/incore.h     |  6 +++---
 fs/gfs2/rgrp.c       | 28 ++++++++++++++--------------
 fs/gfs2/trace_gfs2.h |  6 +++---
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4f68456da677..03e2126765b7 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -288,9 +288,9 @@ struct gfs2_qadata { /* quota allocation data */
 struct gfs2_blkreserv {
 	struct rb_node rs_node;       /* node within rd_rstree */
 	struct gfs2_rgrpd *rs_rgd;
-	u64 rs_start;		      /* start of reservation */
-	u32 rs_free;                  /* how many blocks are still free */
-	u32 rs_reserved;              /* number of reserved blocks */
+	u64 rs_start;		      /* start of semi-reservation */
+	u32 rs_wanted;                /* size of semi-reservation */
+	u32 rs_reserved;              /* blocks actually reserved */
 };
 
 /*
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1ed09e45738f..176b5e769b79 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -203,7 +203,7 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
  */
 static inline int rs_cmp(u64 start, u32 len, struct gfs2_blkreserv *rs)
 {
-	if (start >= rs->rs_start + rs->rs_free)
+	if (start >= rs->rs_start + rs->rs_wanted)
 		return 1;
 	if (rs->rs_start >= start + len)
 		return -1;
@@ -625,7 +625,7 @@ static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs,
 		       fs_id_buf,
 		       (unsigned long long)ip->i_no_addr,
 		       (unsigned long long)rs->rs_start,
-		       rs->rs_free);
+		       rs->rs_wanted);
 }
 
 /**
@@ -645,14 +645,14 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	rb_erase(&rs->rs_node, &rgd->rd_rstree);
 	RB_CLEAR_NODE(&rs->rs_node);
 
-	if (rs->rs_free) {
-		rgd->rd_wanted -= rs->rs_free;
+	if (rs->rs_wanted) {
+		rgd->rd_wanted -= rs->rs_wanted;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
 		   it will force the number to be recalculated later. */
-		rgd->rd_extfail_pt += rs->rs_free;
-		rs->rs_free = 0;
+		rgd->rd_extfail_pt += rs->rs_wanted;
+		rs->rs_wanted = 0;
 	}
 }
 
@@ -669,7 +669,7 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
 	if (rgd) {
 		spin_lock(&rgd->rd_rsspin);
 		__rs_deltree(rs);
-		BUG_ON(rs->rs_free);
+		BUG_ON(rs->rs_wanted);
 		spin_unlock(&rgd->rd_rsspin);
 	}
 }
@@ -1506,7 +1506,7 @@ static void rs_insert(struct gfs2_inode *ip)
 			rb_entry(*newn, struct gfs2_blkreserv, rs_node);
 
 		parent = *newn;
-		rc = rs_cmp(rs->rs_start, rs->rs_free, cur);
+		rc = rs_cmp(rs->rs_start, rs->rs_wanted, cur);
 		if (rc > 0)
 			newn = &((*newn)->rb_right);
 		else if (rc < 0)
@@ -1520,7 +1520,7 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
-	rgd->rd_wanted += rs->rs_free;
+	rgd->rd_wanted += rs->rs_wanted;
 	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
@@ -1573,7 +1573,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, &ip->i_res, true);
 	if (ret == 0) {
 		rs->rs_start = gfs2_rbm_to_block(&rbm);
-		rs->rs_free = extlen;
+		rs->rs_wanted = extlen;
 		rs_insert(ip);
 	} else {
 		if (goal == rgd->rd_last_alloc + rgd->rd_data0)
@@ -1617,7 +1617,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 
 	if (n) {
 		while (rs_cmp(block, length, rs) == 0 && rs != ignore_rs) {
-			block = rs->rs_start + rs->rs_free;
+			block = rs->rs_start + rs->rs_wanted;
 			n = n->rb_right;
 			if (n == NULL)
 				break;
@@ -2357,12 +2357,12 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			unsigned int rlen;
 
 			rs->rs_start += len;
-			rlen = min(rs->rs_free, len);
-			rs->rs_free -= rlen;
+			rlen = min(rs->rs_wanted, len);
+			rs->rs_wanted -= rlen;
 			rgd->rd_wanted -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
-			    rs->rs_free)
+			    rs->rs_wanted)
 				return;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 4fd1614274fd..d751535d2604 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -602,7 +602,7 @@ TRACE_EVENT(gfs2_rs,
 		__field(	u32,	rd_wanted		)
 		__field(	u64,	inum			)
 		__field(	u64,	start			)
-		__field(	u32,	free			)
+		__field(	u32,	wanted			)
 		__field(	u32,	reserved		)
 		__field(	u8,	func			)
 	),
@@ -616,7 +616,7 @@ TRACE_EVENT(gfs2_rs,
 		__entry->inum		= container_of(rs, struct gfs2_inode,
 						       i_res)->i_no_addr;
 		__entry->start		= rs->rs_start;
-		__entry->free		= rs->rs_free;
+		__entry->wanted		= rs->rs_wanted;
 		__entry->reserved	= rs->rs_reserved;
 		__entry->func		= func;
 	),
@@ -630,7 +630,7 @@ TRACE_EVENT(gfs2_rs,
 		  (unsigned long)__entry->rd_reserved,
 		  (unsigned long)__entry->rd_wanted,
 		  rs_func_name(__entry->func),
-		  (unsigned long)__entry->free,
+		  (unsigned long)__entry->wanted,
 		  (unsigned long)__entry->reserved)
 );
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-10-14  9:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14  9:58 [Cluster-devel] [RFC 00/17] gfs2: resource group glock sharing Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 01/17] gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 02/17] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 03/17] gfs2: Get rid of unnecessary variable in gfs2_alloc_blocks Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 04/17] gfs2: Minor gfs2_inplace_reserve cleanup Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 05/17] Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks reserved" Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 06/17] gfs2: Don't clear GBF_FULL flags in rs_deltree Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 07/17] gfs2: Set GBF_FULL flags when reading resource groups Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 08/17] gfs2: When looking for blocks to allocate, don't clamp at free blocks Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 09/17] gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 10/17] gfs2: Only pass reservation down to gfs2_rbm_find Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 11/17] gfs: Don't search for unreserved space twice Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 12/17] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 13/17] gfs2: Add local resource group locking Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 14/17] gfs2: Allow node-wide exclusive glock sharing Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 15/17] gfs2: Use resource group " Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 16/17] gfs2: Limit the maximum amount of reserved space Andreas Gruenbacher
2020-10-14  9:58 ` [Cluster-devel] [RFC 17/17] gfs2: Rename rs_free to rs_wanted Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.