* [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing
@ 2018-12-01 11:10 Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 1/6] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This is an update of the resource group glock sharing patch queue by Bob
and me.  Some of the cleanups and fixes posted previously [*] have
been integrated upstream in the meantime.

Patch "gfs2: Add local resource group locking" is now built on top of
the existing rgrp locking scheme; I'll get back to cleaning that up
later.

Steve was questioning the performance impact of patch "gfs2: Only use
struct gfs2_rbm for bitmap manipulations".  I still think it's a
non-issue, but some analysis will be useful.

This updated patch queue has passed initial testing.

Thanks,
Andreas

[*] https://www.redhat.com/archives/cluster-devel/2018-October/msg00006.html

Andreas Gruenbacher (3):
  gfs2: Only use struct gfs2_rbm for bitmap manipulations
  gfs2: Clean up gfs2_adjust_reservation
  gfs2: Add per-reservation reserved block accounting

Bob Peterson (3):
  gfs2: Add local resource group locking
  gfs2: Allow node-wide exclusive glock sharing
  gfs2: Introduce resource group sharing

 fs/gfs2/bmap.c       |   4 +-
 fs/gfs2/file.c       |   4 +-
 fs/gfs2/glock.c      |  26 +++-
 fs/gfs2/glock.h      |   6 +
 fs/gfs2/incore.h     |  33 +----
 fs/gfs2/inode.c      |   5 +-
 fs/gfs2/lops.c       |   5 +-
 fs/gfs2/rgrp.c       | 322 ++++++++++++++++++++++++++++---------------
 fs/gfs2/rgrp.h       |   4 +
 fs/gfs2/super.c      |   3 +-
 fs/gfs2/trace_gfs2.h |  18 ++-
 fs/gfs2/trans.h      |   2 +-
 fs/gfs2/xattr.c      |   6 +-
 13 files changed, 279 insertions(+), 159 deletions(-)

-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 1/6] gfs2: Only use struct gfs2_rbm for bitmap manipulations
  2018-12-01 11:10 [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
@ 2018-12-01 11:10 ` Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation Andreas Gruenbacher
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

GFS2 uses struct gfs2_rbm to represent a filesystem block number as a
bit position within a resource group.  This representation is used in
the bitmap manipulation code to prevent excessive conversions between
block numbers and bit positions, but also in struct gfs2_blkreserv which
is part of struct gfs2_inode, to mark the start of a reservation.  In
the inode, the bit position representation makes less sense: first, the
start position is used as a block number about as often as a bit
position; second, the bit position representation makes the code
unnecessarily complicated and difficult to read.

Therefore, change struct gfs2_blkreserv to represent the start of a
reservation as a block number instead of a bit position.  (This requires
keeping track of the resource group in gfs2_blkreserv separately.)  With
that change, various things can be slightly simplified, and struct
gfs2_rbm can be moved to rgrp.c.
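
As a rough before/after sketch (hypothetical lines, not taken verbatim
from the patch): code that needs the start of a reservation as a block
number no longer has to convert from the bit position representation:

	/* before: start stored as a bit position (struct gfs2_rbm) */
	struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
	u64 start = gfs2_rbm_to_block(&rs->rs_rbm);

	/* after: start stored directly as a block number */
	struct gfs2_rgrpd *rgd = rs->rs_rgd;
	u64 start = rs->rs_start;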

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/bmap.c       |   2 +-
 fs/gfs2/incore.h     |  30 +--------
 fs/gfs2/rgrp.c       | 153 +++++++++++++++++++++++++------------------
 fs/gfs2/trace_gfs2.h |  10 +--
 fs/gfs2/trans.h      |   2 +-
 5 files changed, 100 insertions(+), 97 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 9a4a15d646ebb..7851540bd8916 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1497,7 +1497,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 
 			/* Must be done with the rgrp glock held: */
 			if (gfs2_rs_active(&ip->i_res) &&
-			    rgd == ip->i_res.rs_rbm.rgd)
+			    rgd == ip->i_res.rs_rgd)
 				gfs2_rs_deltree(&ip->i_res);
 		}
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 888b62cfd6d1a..aaab2af6a2d98 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -124,31 +124,6 @@ struct gfs2_rgrpd {
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
 };
 
-struct gfs2_rbm {
-	struct gfs2_rgrpd *rgd;
-	u32 offset;		/* The offset is bitmap relative */
-	int bii;		/* Bitmap index */
-};
-
-static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
-{
-	return rbm->rgd->rd_bits + rbm->bii;
-}
-
-static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
-{
-	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
-	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
-		rbm->offset;
-}
-
-static inline bool gfs2_rbm_eq(const struct gfs2_rbm *rbm1,
-			       const struct gfs2_rbm *rbm2)
-{
-	return (rbm1->rgd == rbm2->rgd) && (rbm1->bii == rbm2->bii) &&
-	       (rbm1->offset == rbm2->offset);
-}
-
 enum gfs2_state_bits {
 	BH_Pinned = BH_PrivateStart,
 	BH_Escaped = BH_PrivateStart + 1,
@@ -309,8 +284,9 @@ struct gfs2_qadata { /* quota allocation data */
 */
 
 struct gfs2_blkreserv {
-	struct rb_node rs_node;       /* link to other block reservations */
-	struct gfs2_rbm rs_rbm;       /* Start of reservation */
+	struct rb_node rs_node;       /* node within rd_rstree */
+	struct gfs2_rgrpd *rs_rgd;
+	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
 };
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index b08a530433adf..bbcf8b8b2597f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -49,6 +49,24 @@
 #define LBITSKIP00 (0x0000000000000000UL)
 #endif
 
+struct gfs2_rbm {
+	struct gfs2_rgrpd *rgd;
+	u32 offset;		/* The offset is bitmap relative */
+	int bii;		/* Bitmap index */
+};
+
+static inline struct gfs2_bitmap *rbm_bi(const struct gfs2_rbm *rbm)
+{
+	return rbm->rgd->rd_bits + rbm->bii;
+}
+
+static inline u64 gfs2_rbm_to_block(const struct gfs2_rbm *rbm)
+{
+	BUG_ON(rbm->offset >= rbm->rgd->rd_data);
+	return rbm->rgd->rd_data0 + (rbm_bi(rbm)->bi_start * GFS2_NBBY) +
+		rbm->offset;
+}
+
 /*
  * These routines are used by the resource group routines (rgrp.c)
  * to keep track of block allocation.  Each block is represented by two
@@ -188,7 +206,7 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
 
 /**
  * rs_cmp - multi-block reservation range compare
- * @blk: absolute file system block number of the new reservation
+ * @start: start of the new reservation
  * @len: number of blocks in the new reservation
  * @rs: existing reservation to compare against
  *
@@ -196,13 +214,11 @@ static inline u64 gfs2_bit_search(const __le64 *ptr, u64 mask, u8 state)
  *         -1 if the block range is before the start of the reservation
  *          0 if the block range overlaps with the reservation
  */
-static inline int rs_cmp(u64 blk, u32 len, struct gfs2_blkreserv *rs)
+static inline int rs_cmp(u64 start, u32 len, struct gfs2_blkreserv *rs)
 {
-	u64 startblk = gfs2_rbm_to_block(&rs->rs_rbm);
-
-	if (blk >= startblk + rs->rs_free)
+	if (start >= rs->rs_start + rs->rs_free)
 		return 1;
-	if (blk + len - 1 < startblk)
+	if (rs->rs_start >= start + len)
 		return -1;
 	return 0;
 }
@@ -315,13 +331,28 @@ static bool gfs2_rbm_incr(struct gfs2_rbm *rbm)
 	return false;
 }
 
+static struct gfs2_bitmap *gfs2_block_to_bitmap(struct gfs2_rgrpd *rgd,
+						u64 block)
+{
+	unsigned int delta = (sizeof(struct gfs2_rgrp) -
+			      sizeof(struct gfs2_meta_header)) * GFS2_NBBY;
+	unsigned int rblock, bii;
+
+	if (!rgrp_contains_block(rgd, block))
+		return NULL;
+	rblock = block - rgd->rd_data0;
+	bii = (rblock + delta) / rgd->rd_sbd->sd_blocks_per_bitmap;
+	return rgd->rd_bits + bii;
+}
+
 /**
  * gfs2_unaligned_extlen - Look for free blocks which are not byte aligned
  * @rbm: Position to search (value/result)
  * @n_unaligned: Number of unaligned blocks to check
  * @len: Decremented for each block found (terminate on zero)
  *
- * Returns: true if a non-free block is encountered
+ * Returns: true if a non-free block is encountered or the end of the resource
+ *	    group is reached.
  */
 
 static bool gfs2_unaligned_extlen(struct gfs2_rbm *rbm, u32 n_unaligned, u32 *len)
@@ -617,10 +648,10 @@ static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs)
 {
 	struct gfs2_inode *ip = container_of(rs, struct gfs2_inode, i_res);
 
-	gfs2_print_dbg(seq, "  B: n:%llu s:%llu b:%u f:%u\n",
+	gfs2_print_dbg(seq, "  B: n:%llu s:%llu f:%u\n",
 		       (unsigned long long)ip->i_no_addr,
-		       (unsigned long long)gfs2_rbm_to_block(&rs->rs_rbm),
-		       rs->rs_rbm.offset, rs->rs_free);
+		       (unsigned long long)rs->rs_start,
+		       rs->rs_free);
 }
 
 /**
@@ -635,30 +666,28 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	if (!gfs2_rs_active(rs))
 		return;
 
-	rgd = rs->rs_rbm.rgd;
+	rgd = rs->rs_rgd;
 	trace_gfs2_rs(rs, TRACE_RS_TREEDEL);
 	rb_erase(&rs->rs_node, &rgd->rd_rstree);
 	RB_CLEAR_NODE(&rs->rs_node);
 
 	if (rs->rs_free) {
-		u64 last_block = gfs2_rbm_to_block(&rs->rs_rbm) +
-				 rs->rs_free - 1;
-		struct gfs2_rbm last_rbm = { .rgd = rs->rs_rbm.rgd, };
+		u64 last_block = rs->rs_start + rs->rs_free - 1;
 		struct gfs2_bitmap *start, *last;
 
 		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free);
-		rs->rs_rbm.rgd->rd_reserved -= rs->rs_free;
+		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
+		rs->rs_rgd->rd_reserved -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
 		   it will force the number to be recalculated later. */
 		rgd->rd_extfail_pt += rs->rs_free;
 		rs->rs_free = 0;
-		if (gfs2_rbm_from_block(&last_rbm, last_block))
+		start = gfs2_block_to_bitmap(rgd, rs->rs_start);
+		last = gfs2_block_to_bitmap(rgd, last_block);
+		if (!start || !last)
 			return;
-		start = rbm_bi(&rs->rs_rbm);
-		last = rbm_bi(&last_rbm);
 		do
 			clear_bit(GBF_FULL, &start->bi_flags);
 		while (start++ != last);
@@ -674,7 +703,7 @@ void gfs2_rs_deltree(struct gfs2_blkreserv *rs)
 {
 	struct gfs2_rgrpd *rgd;
 
-	rgd = rs->rs_rbm.rgd;
+	rgd = rs->rs_rgd;
 	if (rgd) {
 		spin_lock(&rgd->rd_rsspin);
 		__rs_deltree(rs);
@@ -1492,8 +1521,7 @@ static void rs_insert(struct gfs2_inode *ip)
 	struct rb_node **newn, *parent = NULL;
 	int rc;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	struct gfs2_rgrpd *rgd = rs->rs_rbm.rgd;
-	u64 fsblock = gfs2_rbm_to_block(&rs->rs_rbm);
+	struct gfs2_rgrpd *rgd = rs->rs_rgd;
 
 	BUG_ON(gfs2_rs_active(rs));
 
@@ -1504,7 +1532,7 @@ static void rs_insert(struct gfs2_inode *ip)
 			rb_entry(*newn, struct gfs2_blkreserv, rs_node);
 
 		parent = *newn;
-		rc = rs_cmp(fsblock, rs->rs_free, cur);
+		rc = rs_cmp(rs->rs_start, rs->rs_free, cur);
 		if (rc > 0)
 			newn = &((*newn)->rb_right);
 		else if (rc < 0)
@@ -1592,7 +1620,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 
 	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, ip, true);
 	if (ret == 0) {
-		rs->rs_rbm = rbm;
+		rs->rs_start = gfs2_rbm_to_block(&rbm);
 		rs->rs_free = extlen;
 		rs_insert(ip);
 	} else {
@@ -1637,7 +1665,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 
 	if (n) {
 		while ((rs_cmp(block, length, rs) == 0) && (&ip->i_res != rs)) {
-			block = gfs2_rbm_to_block(&rs->rs_rbm) + rs->rs_free;
+			block = rs->rs_start + rs->rs_free;
 			n = n->rb_right;
 			if (n == NULL)
 				break;
@@ -1978,7 +2006,7 @@ static bool gfs2_rgrp_used_recently(const struct gfs2_blkreserv *rs,
 	u64 tdiff;
 
 	tdiff = ktime_to_ns(ktime_sub(ktime_get_real(),
-                            rs->rs_rbm.rgd->rd_gl->gl_dstamp));
+                            rs->rs_rgd->rd_gl->gl_dstamp));
 
 	return tdiff > (msecs * 1000 * 1000);
 }
@@ -2056,45 +2084,45 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	if (gfs2_assert_warn(sdp, ap->target))
 		return -EINVAL;
 	if (gfs2_rs_active(rs)) {
-		begin = rs->rs_rbm.rgd;
-	} else if (rs->rs_rbm.rgd &&
-		   rgrp_contains_block(rs->rs_rbm.rgd, ip->i_goal)) {
-		begin = rs->rs_rbm.rgd;
+		begin = rs->rs_rgd;
+	} else if (rs->rs_rgd &&
+		   rgrp_contains_block(rs->rs_rgd, ip->i_goal)) {
+		begin = rs->rs_rgd;
 	} else {
 		check_and_update_goal(ip);
-		rs->rs_rbm.rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
+		rs->rs_rgd = begin = gfs2_blk2rgrpd(sdp, ip->i_goal, 1);
 	}
 	if (S_ISDIR(ip->i_inode.i_mode) && (ap->aflags & GFS2_AF_ORLOV))
 		skip = gfs2_orlov_skip(ip);
-	if (rs->rs_rbm.rgd == NULL)
+	if (rs->rs_rgd == NULL)
 		return -EBADSLT;
 
 	while (loops < 3) {
 		rg_locked = 1;
 
-		if (!gfs2_glock_is_locked_by_me(rs->rs_rbm.rgd->rd_gl)) {
+		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
 			rg_locked = 0;
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
 				if (loops == 0 &&
-				    !fast_to_acquire(rs->rs_rbm.rgd))
+				    !fast_to_acquire(rs->rs_rgd))
 					goto next_rgrp;
 				if ((loops < 2) &&
 				    gfs2_rgrp_used_recently(rs, 1000) &&
-				    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+				    gfs2_rgrp_congested(rs->rs_rgd, loops))
 					goto next_rgrp;
 			}
-			error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
+			error = gfs2_glock_nq_init(rs->rs_rgd->rd_gl,
 						   LM_ST_EXCLUSIVE, flags,
 						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
-			    gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+			    gfs2_rgrp_congested(rs->rs_rgd, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
-				error = update_rgrp_lvb(rs->rs_rbm.rgd);
+				error = update_rgrp_lvb(rs->rs_rgd);
 				if (unlikely(error)) {
 					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
@@ -2103,24 +2131,24 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		}
 
 		/* Skip unusable resource groups */
-		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
+		if ((rs->rs_rgd->rd_flags & (GFS2_RGF_NOALLOC |
 						 GFS2_RDF_ERROR)) ||
-		    (loops == 0 && ap->target > rs->rs_rbm.rgd->rd_extfail_pt))
+		    (loops == 0 && ap->target > rs->rs_rgd->rd_extfail_pt))
 			goto skip_rgrp;
 
 		if (sdp->sd_args.ar_rgrplvb)
-			gfs2_rgrp_bh_get(rs->rs_rbm.rgd);
+			gfs2_rgrp_bh_get(rs->rs_rgd);
 
 		/* Get a reservation if we don't already have one */
 		if (!gfs2_rs_active(rs))
-			rg_mblk_search(rs->rs_rbm.rgd, ip, ap);
+			rg_mblk_search(rs->rs_rgd, ip, ap);
 
 		/* Skip rgrps when we can't get a reservation on first pass */
 		if (!gfs2_rs_active(rs) && (loops < 1))
 			goto check_rgrp;
 
 		/* If rgrp has enough free space, use it */
-		free_blocks = rgd_free(rs->rs_rbm.rgd, rs);
+		free_blocks = rgd_free(rs->rs_rgd, rs);
 		if (free_blocks >= ap->target ||
 		    (loops == 2 && ap->min_target &&
 		     free_blocks >= ap->min_target)) {
@@ -2129,8 +2157,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		}
 check_rgrp:
 		/* Check for unlinked inodes which can be reclaimed */
-		if (rs->rs_rbm.rgd->rd_flags & GFS2_RDF_CHECK)
-			try_rgrp_unlink(rs->rs_rbm.rgd, &last_unlinked,
+		if (rs->rs_rgd->rd_flags & GFS2_RDF_CHECK)
+			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
 		/* Drop reservation, if we couldn't use reserved rgrp */
@@ -2142,7 +2170,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 next_rgrp:
 		/* Find the next rgrp, and continue looking */
-		if (gfs2_select_rgrp(&rs->rs_rbm.rgd, begin))
+		if (gfs2_select_rgrp(&rs->rs_rgd, begin))
 			continue;
 		if (skip)
 			continue;
@@ -2310,20 +2338,21 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 {
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	struct gfs2_rgrpd *rgd = rbm->rgd;
-	unsigned rlen;
-	u64 block;
-	int ret;
 
 	spin_lock(&rgd->rd_rsspin);
 	if (gfs2_rs_active(rs)) {
-		if (gfs2_rbm_eq(&rs->rs_rbm, rbm)) {
-			block = gfs2_rbm_to_block(rbm);
-			ret = gfs2_rbm_from_block(&rs->rs_rbm, block + len);
+		u64 start = gfs2_rbm_to_block(rbm);
+
+		if (rs->rs_start == start) {
+			unsigned int rlen;
+
+			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
 			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
-			if (rs->rs_free && !ret)
+			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
+			    rs->rs_free)
 				goto out;
 			/* We used up our block reservation, so we should
 			   reserve more blocks next time. */
@@ -2352,15 +2381,13 @@ static void gfs2_set_alloc_start(struct gfs2_rbm *rbm,
 	u64 goal;
 
 	if (gfs2_rs_active(&ip->i_res)) {
-		*rbm = ip->i_res.rs_rbm;
-		return;
+		goal = ip->i_res.rs_start;
+	} else {
+		if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
+			goal = ip->i_goal;
+		else
+			goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
 	}
-
-	if (!dinode && rgrp_contains_block(rbm->rgd, ip->i_goal))
-		goal = ip->i_goal;
-	else
-		goal = rbm->rgd->rd_last_alloc + rbm->rgd->rd_data0;
-
 	if (WARN_ON_ONCE(gfs2_rbm_from_block(rbm, goal))) {
 		rbm->bii = 0;
 		rbm->offset = 0;
@@ -2383,7 +2410,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct buffer_head *dibh;
-	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rbm.rgd, };
+	struct gfs2_rbm rbm = { .rgd = ip->i_res.rs_rgd, };
 	unsigned int ndata;
 	u64 block; /* block, within the file system scope */
 	int error;
@@ -2615,7 +2642,7 @@ void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
 			return;
 		rgd = gfs2_blk2rgrpd(sdp, block, 1);
 	} else {
-		rgd = ip->i_res.rs_rbm.rgd;
+		rgd = ip->i_res.rs_rgd;
 		if (!rgd || !rgrp_contains_block(rgd, block))
 			rgd = gfs2_blk2rgrpd(sdp, block, 1);
 	}
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index e0025258107a8..7586c7629497f 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -602,13 +602,13 @@ TRACE_EVENT(gfs2_rs,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= rs->rs_rbm.rgd->rd_sbd->sd_vfs->s_dev;
-		__entry->rd_addr	= rs->rs_rbm.rgd->rd_addr;
-		__entry->rd_free_clone	= rs->rs_rbm.rgd->rd_free_clone;
-		__entry->rd_reserved	= rs->rs_rbm.rgd->rd_reserved;
+		__entry->dev		= rs->rs_rgd->rd_sbd->sd_vfs->s_dev;
+		__entry->rd_addr	= rs->rs_rgd->rd_addr;
+		__entry->rd_free_clone	= rs->rs_rgd->rd_free_clone;
+		__entry->rd_reserved	= rs->rs_rgd->rd_reserved;
 		__entry->inum		= container_of(rs, struct gfs2_inode,
 						       i_res)->i_no_addr;
-		__entry->start		= gfs2_rbm_to_block(&rs->rs_rbm);
+		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
 		__entry->func		= func;
 	),
diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h
index ad70087d05971..a27078f98e7a9 100644
--- a/fs/gfs2/trans.h
+++ b/fs/gfs2/trans.h
@@ -30,7 +30,7 @@ struct gfs2_glock;
  * block, or all of the blocks in the rg, whichever is smaller */
 static inline unsigned int gfs2_rg_blocks(const struct gfs2_inode *ip, unsigned requested)
 {
-	struct gfs2_rgrpd *rgd = ip->i_res.rs_rbm.rgd;
+	struct gfs2_rgrpd *rgd = ip->i_res.rs_rgd;
 
 	if (requested < rgd->rd_length)
 		return requested + 1;
-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation
  2018-12-01 11:10 [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 1/6] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
@ 2018-12-01 11:10 ` Andreas Gruenbacher
  2018-12-03 14:32   ` Bob Peterson
  2018-12-04 22:02   ` Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 3/6] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Unconditionally call gfs2_adjust_reservation in gfs2_alloc_blocks.  Move
the code for updating rd_free and rd_free_clone from gfs2_alloc_blocks
into gfs2_adjust_reservation.
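
In rough terms, the new call site looks like this (simplified sketch;
see the hunks below):

	error = gfs2_adjust_reservation(ip, &rbm, *nblocks);
	if (error)
		goto rgrp_error;

with the rd_free / rd_free_clone sanity check and accounting now done
inside gfs2_adjust_reservation under rd_rsspin, which is why the
function gains an int return value.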

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/rgrp.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index bbcf8b8b2597f..733e21cd4cf25 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2333,13 +2333,23 @@ static void gfs2_rgrp_error(struct gfs2_rgrpd *rgd)
  * then it is removed.
  */
 
-static void gfs2_adjust_reservation(struct gfs2_inode *ip,
-				    const struct gfs2_rbm *rbm, unsigned len)
+static int gfs2_adjust_reservation(struct gfs2_inode *ip,
+				   const struct gfs2_rbm *rbm, unsigned len)
 {
+	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_blkreserv *rs = &ip->i_res;
 	struct gfs2_rgrpd *rgd = rbm->rgd;
+	int error = 0;
 
 	spin_lock(&rgd->rd_rsspin);
+	if (unlikely(rgd->rd_free < len || rgd->rd_free_clone < len)) {
+		fs_warn(sdp, "rgrp free block accounting error (%u %u %u)\n",
+			rgd->rd_free, rgd->rd_free_clone, len);
+		error = -EIO;
+		goto out;
+	}
+	rgd->rd_free -= len;
+	rgd->rd_free_clone -= len;
 	if (gfs2_rs_active(rs)) {
 		u64 start = gfs2_rbm_to_block(rbm);
 
@@ -2362,6 +2372,7 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 	}
 out:
 	spin_unlock(&rgd->rd_rsspin);
+	return error;
 }
 
 /**
@@ -2435,8 +2446,9 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	gfs2_alloc_extent(&rbm, dinode, nblocks);
 	block = gfs2_rbm_to_block(&rbm);
 	rbm.rgd->rd_last_alloc = block - rbm.rgd->rd_data0;
-	if (gfs2_rs_active(&ip->i_res))
-		gfs2_adjust_reservation(ip, &rbm, *nblocks);
+	error = gfs2_adjust_reservation(ip, &rbm, *nblocks);
+	if (error)
+		goto rgrp_error;
 	ndata = *nblocks;
 	if (dinode)
 		ndata--;
@@ -2453,12 +2465,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 			brelse(dibh);
 		}
 	}
-	if (rbm.rgd->rd_free < *nblocks) {
-		fs_warn(sdp, "nblocks=%u\n", *nblocks);
-		goto rgrp_error;
-	}
 
-	rbm.rgd->rd_free -= *nblocks;
 	if (dinode) {
 		rbm.rgd->rd_dinodes++;
 		*generation = rbm.rgd->rd_igeneration++;
@@ -2475,7 +2482,6 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	gfs2_quota_change(ip, *nblocks, ip->i_inode.i_uid, ip->i_inode.i_gid);
 
-	rbm.rgd->rd_free_clone -= *nblocks;
 	trace_gfs2_block_alloc(ip, rbm.rgd, block, *nblocks,
 			       dinode ? GFS2_BLKST_DINODE : GFS2_BLKST_USED);
 	*bn = block;
-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 3/6] gfs2: Add per-reservation reserved block accounting
  2018-12-01 11:10 [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 1/6] gfs2: Only use struct gfs2_rbm for bitmap manipulations Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation Andreas Gruenbacher
@ 2018-12-01 11:10 ` Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 4/6] gfs2: Add local resource group locking Andreas Gruenbacher
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Add a rs_reserved field to struct gfs2_blkreserv to keep track of the
number of blocks reserved by this particular reservation.  When making a
reservation with gfs2_inplace_reserve, this field is set to somewhere
between ap->min_target and ap->target depending on the number of free
blocks in the resource group.  When allocating blocks with
gfs2_alloc_blocks, rs_reserved is decremented accordingly.  Eventually,
any reserved but not consumed blocks are returned to the resource group
by gfs2_inplace_release (via gfs2_adjust_reservation).

The reservation tree (rd_rstree) is unaffected by this change: the
reservations it tracks are still advisory, and the sizes of those
reservations (rs_free) are still determined by the tentative allocation
sizes (i_sizehint).  Since rd_reserved now tracks the number of reserved
blocks rather than the number of tentatively reserved blocks, we may
end up with slightly different allocation patterns, though. The
rd_extfail_pt optimization will still cause ill-suited resource groups
to be skipped quickly.

We expect to augment this with a patch that will reserve an extent of
blocks rather than just reserving a number of blocks in
gfs2_inplace_reserve.  gfs2_alloc_blocks will then be able to consume
that reserved extent before scanning for additional available blocks;
this should eliminate double bitmap scanning in most cases.
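
The intended lifecycle, as a simplified sketch of the hunks below
(locking details and error handling omitted):

	/* gfs2_inplace_reserve: claim between min_target and target */
	rs->rs_reserved = min(ap->target, free_blocks);
	rgd->rd_reserved += rs->rs_reserved;	/* under rd_rsspin */

	/* gfs2_alloc_blocks -> gfs2_adjust_reservation: consume */
	rgd->rd_reserved -= len;
	rs->rs_reserved -= len;

	/* gfs2_inplace_release: return unused blocks to the rgrp */
	rgd->rd_reserved -= rs->rs_reserved;
	rs->rs_reserved = 0;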

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c       |  4 +--
 fs/gfs2/incore.h     |  1 +
 fs/gfs2/rgrp.c       | 73 ++++++++++++++++++++++++++------------------
 fs/gfs2/trace_gfs2.h |  8 +++--
 4 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 45a17b770d97d..0106abe8c6ee0 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1022,8 +1022,8 @@ static long __gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t
 			goto out_qunlock;
 
 		/* check if the selected rgrp limits our max_blks further */
-		if (ap.allowed && ap.allowed < max_blks)
-			max_blks = ap.allowed;
+		if (ip->i_res.rs_reserved < max_blks)
+			max_blks = ip->i_res.rs_reserved;
 
 		/* Almost done. Calculate bytes that can be written using
 		 * max_blks. We also recompute max_bytes, data_blocks and
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index aaab2af6a2d98..ca25043fc26df 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -288,6 +288,7 @@ struct gfs2_blkreserv {
 	struct gfs2_rgrpd *rs_rgd;
 	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
+	u32 rs_reserved;              /* number of reserved blocks */
 };
 
 /*
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 733e21cd4cf25..1f427459a584d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -675,9 +675,6 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 		u64 last_block = rs->rs_start + rs->rs_free - 1;
 		struct gfs2_bitmap *start, *last;
 
-		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
-		rs->rs_rgd->rd_reserved -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
@@ -1546,39 +1543,31 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
-
-	/* Do our rgrp accounting for the reservation */
-	rgd->rd_reserved += rs->rs_free; /* blocks reserved */
 	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
 
 /**
- * rgd_free - return the number of free blocks we can allocate.
+ * rgd_free - compute the number of blocks we can allocate
  * @rgd: the resource group
  *
- * This function returns the number of free blocks for an rgrp.
- * That's the clone-free blocks (blocks that are free, not including those
- * still being used for unlinked files that haven't been deleted.)
- *
- * It also subtracts any blocks reserved by someone else, but does not
- * include free blocks that are still part of our current reservation,
- * because obviously we can (and will) allocate them.
+ * Compute the number of blocks we can allocate in @rgd.  That's the clone-free
+ * blocks (blocks that are free, not including those still being used for
+ * unlinked files that haven't been deleted) minus the blocks currently
+ * reserved by any reservations other than @rs.
  */
 static inline u32 rgd_free(struct gfs2_rgrpd *rgd, struct gfs2_blkreserv *rs)
 {
-	u32 tot_reserved, tot_free;
-
-	if (WARN_ON_ONCE(rgd->rd_reserved < rs->rs_free))
-		return 0;
-	tot_reserved = rgd->rd_reserved - rs->rs_free;
-
-	if (rgd->rd_free_clone < tot_reserved)
-		tot_reserved = 0;
+	u32 free = 0;
 
-	tot_free = rgd->rd_free_clone - tot_reserved;
-
-	return tot_free;
+	spin_lock(&rgd->rd_rsspin);
+	if (!WARN_ON_ONCE(rgd->rd_free_clone < rgd->rd_reserved)) {
+		free = rgd->rd_free_clone - rgd->rd_reserved;
+		if (rgd == rs->rs_rgd)
+			free += rs->rs_reserved;
+	}
+	spin_unlock(&rgd->rd_rsspin);
+	return free;
 }
 
 /**
@@ -1606,7 +1595,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 		extlen = max_t(u32, atomic_read(&ip->i_sizehint), ap->target);
 		extlen = clamp(extlen, (u32)RGRP_RSRV_MINBLKS, free_blocks);
 	}
-	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
+	if (free_blocks < extlen)
 		return;
 
 	/* Find bitmap block that contains bits for goal block */
@@ -2061,8 +2050,7 @@ static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
 * We try our best to find an rgrp that has at least ap->target blocks
  * available. After a couple of passes (loops == 2), the prospects of finding
  * such an rgrp diminish. At this stage, we return the first rgrp that has
- * at least ap->min_target blocks available. Either way, we set ap->allowed to
- * the number of blocks available in the chosen rgrp.
+ * at least ap->min_target blocks available.
  *
  * Returns: 0 on success,
  *          -ENOMEM if a suitable rgrp can't be found
@@ -2079,6 +2067,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	int loops = 0;
 	u32 free_blocks, skip = 0;
 
+	BUG_ON(rs->rs_reserved);
+
 	if (sdp->sd_args.ar_rgrplvb)
 		flags |= GL_SKIP;
 	if (gfs2_assert_warn(sdp, ap->target))
@@ -2152,7 +2142,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		if (free_blocks >= ap->target ||
 		    (loops == 2 && ap->min_target &&
 		     free_blocks >= ap->min_target)) {
-			ap->allowed = free_blocks;
+			struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+			rs->rs_reserved = ap->target;
+			if (rs->rs_reserved > free_blocks)
+				rs->rs_reserved = free_blocks;
+			spin_lock(&rs->rs_rgd->rd_rsspin);
+			rgd->rd_reserved += rs->rs_reserved;
+			spin_unlock(&rs->rs_rgd->rd_rsspin);
 			return 0;
 		}
 check_rgrp:
@@ -2204,6 +2201,17 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 void gfs2_inplace_release(struct gfs2_inode *ip)
 {
+	struct gfs2_blkreserv *rs = &ip->i_res;
+
+	if (rs->rs_reserved) {
+		struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+		spin_lock(&rgd->rd_rsspin);
+		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
+		rgd->rd_reserved -= rs->rs_reserved;
+		spin_unlock(&rs->rs_rgd->rd_rsspin);
+		rs->rs_reserved = 0;
+	}
 	if (gfs2_holder_initialized(&ip->i_rgd_gh))
 		gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 }
@@ -2341,6 +2349,8 @@ static int gfs2_adjust_reservation(struct gfs2_inode *ip,
 	struct gfs2_rgrpd *rgd = rbm->rgd;
 	int error = 0;
 
+	BUG_ON(rs->rs_reserved < len);
+
 	spin_lock(&rgd->rd_rsspin);
 	if (unlikely(rgd->rd_free < len || rgd->rd_free_clone < len)) {
 		fs_warn(sdp, "rgrp free block accounting error (%u %u %u)\n",
@@ -2350,6 +2360,8 @@ static int gfs2_adjust_reservation(struct gfs2_inode *ip,
 	}
 	rgd->rd_free -= len;
 	rgd->rd_free_clone -= len;
+	rgd->rd_reserved -= len;
+	rs->rs_reserved -= len;
 	if (gfs2_rs_active(rs)) {
 		u64 start = gfs2_rbm_to_block(rbm);
 
@@ -2359,7 +2371,6 @@ static int gfs2_adjust_reservation(struct gfs2_inode *ip,
 			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
-			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
 			    rs->rs_free)
@@ -2426,6 +2437,8 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	u64 block; /* block, within the file system scope */
 	int error;
 
+	BUG_ON(ip->i_res.rs_reserved < *nblocks);
+
 	gfs2_set_alloc_start(&rbm, ip, dinode);
 	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
 
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 7586c7629497f..282fcb1a242f9 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -598,6 +598,7 @@ TRACE_EVENT(gfs2_rs,
 		__field(	u64,	inum			)
 		__field(	u64,	start			)
 		__field(	u32,	free			)
+		__field(	u32,	reserved		)
 		__field(	u8,	func			)
 	),
 
@@ -610,17 +611,20 @@ TRACE_EVENT(gfs2_rs,
 						       i_res)->i_no_addr;
 		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
+		__entry->reserved	= rs->rs_reserved;
 		__entry->func		= func;
 	),
 
-	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu",
+	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu r:%lu",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->inum,
 		  (unsigned long long)__entry->start,
 		  (unsigned long long)__entry->rd_addr,
 		  (unsigned long)__entry->rd_free_clone,
 		  (unsigned long)__entry->rd_reserved,
-		  rs_func_name(__entry->func), (unsigned long)__entry->free)
+		  rs_func_name(__entry->func),
+		  (unsigned long)__entry->free,
+		  (unsigned long)__entry->reserved)
 );
 
 #endif /* _TRACE_GFS2_H */
-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 4/6] gfs2: Add local resource group locking
  2018-12-01 11:10 [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (2 preceding siblings ...)
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 3/6] gfs2: Add per-reservation reserved block accounting Andreas Gruenbacher
@ 2018-12-01 11:10 ` Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 5/6] gfs2: Allow node-wide exclusive glock sharing Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 6/6] gfs2: Introduce resource group sharing Andreas Gruenbacher
  5 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

Prepare for treating resource group glocks as exclusive among nodes but
shared among all tasks running on a node: introduce another layer of
node-specific locking that the local tasks can use to coordinate their
accesses.
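
The resulting usage pattern around bitmap and counter updates is (a
simplified sketch of what the hunks below do):

	rgrp_lock_local(rgd);
	/* ... manipulate bitmaps, rd_free, rd_free_clone, ... */
	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
	rgrp_unlock_local(rgd);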

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/incore.h |  2 ++
 fs/gfs2/lops.c   |  5 +++-
 fs/gfs2/rgrp.c   | 74 ++++++++++++++++++++++++++++++++++++++++++------
 fs/gfs2/rgrp.h   |  4 +++
 4 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index ca25043fc26df..feba57a2a6bab 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -23,6 +23,7 @@
 #include <linux/percpu.h>
 #include <linux/lockref.h>
 #include <linux/rhashtable.h>
+#include <linux/mutex.h>
 
 #define DIO_WAIT	0x00000010
 #define DIO_METADATA	0x00000020
@@ -121,6 +122,7 @@ struct gfs2_rgrpd {
 #define GFS2_RDF_PREFERRED	0x80000000 /* This rgrp is preferred */
 #define GFS2_RDF_MASK		0xf0000000 /* mask for internal flags */
 	spinlock_t rd_rsspin;           /* protects reservation related vars */
+	struct mutex rd_mutex;
 	struct rb_root rd_rstree;       /* multi-block reservation tree */
 };
 
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 4c7069b8f3c1d..a9e858e01c97f 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -76,8 +76,9 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	unsigned int index = bd->bd_bh->b_blocknr - gl->gl_name.ln_number;
 	struct gfs2_bitmap *bi = rgd->rd_bits + index;
 
+	rgrp_lock_local(rgd);
 	if (bi->bi_clone == NULL)
-		return;
+		goto out;
 	if (sdp->sd_args.ar_discard)
 		gfs2_rgrp_send_discards(sdp, rgd->rd_data0, bd->bd_bh, bi, 1, NULL);
 	memcpy(bi->bi_clone + bi->bi_offset,
@@ -85,6 +86,8 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	clear_bit(GBF_FULL, &bi->bi_flags);
 	rgd->rd_free_clone = rgd->rd_free;
 	rgd->rd_extfail_pt = rgd->rd_free;
+out:
+	rgrp_unlock_local(rgd);
 }
 
 /**
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1f427459a584d..95ecd81e61e70 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -952,6 +952,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 	rgd->rd_data = be32_to_cpu(buf.ri_data);
 	rgd->rd_bitbytes = be32_to_cpu(buf.ri_bitbytes);
 	spin_lock_init(&rgd->rd_rsspin);
+	mutex_init(&rgd->rd_mutex);
 
 	error = compute_bitstructs(rgd);
 	if (error)
@@ -1472,9 +1473,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 			/* Trim each bitmap in the rgrp */
 			for (x = 0; x < rgd->rd_length; x++) {
 				struct gfs2_bitmap *bi = rgd->rd_bits + x;
+				rgrp_lock_local(rgd);
 				ret = gfs2_rgrp_send_discards(sdp,
 						rgd->rd_data0, NULL, bi, minlen,
 						&amt);
+				rgrp_unlock_local(rgd);
 				if (ret) {
 					gfs2_glock_dq_uninit(&gh);
 					goto out;
@@ -1486,9 +1489,11 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 			ret = gfs2_trans_begin(sdp, RES_RG_HDR, 0);
 			if (ret == 0) {
 				bh = rgd->rd_bits[0].bi_bh;
+				rgrp_lock_local(rgd);
 				rgd->rd_flags |= GFS2_RGF_TRIMMED;
 				gfs2_trans_add_meta(rgd->rd_gl, bh);
 				gfs2_rgrp_out(rgd, bh->b_data);
+				rgrp_unlock_local(rgd);
 				gfs2_trans_end(sdp);
 			}
 		}
@@ -1864,7 +1869,22 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 	struct gfs2_rbm rbm = { .rgd = rgd, .bii = 0, .offset = 0 };
 
 	while (1) {
+		/*
+		 * We must be careful to avoid deadlock here:
+		 * All transactions expect: sd_log_flush_lock followed by rgrp
+		 * ex (if needed), but try_rgrp_unlink takes sd_log_flush_lock
+		 * outside a transaction and therefore must not have the rgrp
+		 * ex already held. To avoid deadlock, we drop the rgrp ex lock
+		 * before taking the log_flush_lock, then reacquire it to
+		 * protect our call to gfs2_rbm_find.
+		 *
+		 * Also note that rgrp_unlock_local must come AFTER the caller does
+		 * gfs2_rs_deltree because rgrp ex needs to be held before
+		 * making reservations.
+		 */
+		rgrp_unlock_local(rgd);
 		down_write(&sdp->sd_log_flush_lock);
+		rgrp_lock_local(rgd);
 		error = gfs2_rbm_find(&rbm, GFS2_BLKST_UNLINKED, NULL, NULL,
 				      true);
 		up_write(&sdp->sd_log_flush_lock);
@@ -2062,7 +2082,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrpd *begin = NULL;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	int error = 0, rg_locked, flags = 0;
+	int error = 0, flags = 0;
+	bool rg_locked;
 	u64 last_unlinked = NO_BLOCK;
 	int loops = 0;
 	u32 free_blocks, skip = 0;
@@ -2088,10 +2109,10 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		return -EBADSLT;
 
 	while (loops < 3) {
-		rg_locked = 1;
-
-		if (!gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl)) {
-			rg_locked = 0;
+		rg_locked = gfs2_glock_is_locked_by_me(rs->rs_rgd->rd_gl);
+		if (rg_locked) {
+			rgrp_lock_local(rs->rs_rgd);
+		} else {
 			if (skip && skip--)
 				goto next_rgrp;
 			if (!gfs2_rs_active(rs)) {
@@ -2108,12 +2129,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 						   &ip->i_rgd_gh);
 			if (unlikely(error))
 				return error;
+			rgrp_lock_local(rs->rs_rgd);
 			if (!gfs2_rs_active(rs) && (loops < 2) &&
 			    gfs2_rgrp_congested(rs->rs_rgd, loops))
 				goto skip_rgrp;
 			if (sdp->sd_args.ar_rgrplvb) {
 				error = update_rgrp_lvb(rs->rs_rgd);
 				if (unlikely(error)) {
+					rgrp_unlock_local(rs->rs_rgd);
 					gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 					return error;
 				}
@@ -2147,9 +2170,10 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			rs->rs_reserved = ap->target;
 			if (rs->rs_reserved > free_blocks)
 				rs->rs_reserved = free_blocks;
-			spin_lock(&rs->rs_rgd->rd_rsspin);
+			spin_lock(&rgd->rd_rsspin);
 			rgd->rd_reserved += rs->rs_reserved;
-			spin_unlock(&rs->rs_rgd->rd_rsspin);
+			spin_unlock(&rgd->rd_rsspin);
+			rgrp_unlock_local(rs->rs_rgd);
 			return 0;
 		}
 check_rgrp:
@@ -2158,6 +2182,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 			try_rgrp_unlink(rs->rs_rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
+		rgrp_unlock_local(rs->rs_rgd);
+
 		/* Drop reservation, if we couldn't use reserved rgrp */
 		if (gfs2_rs_active(rs))
 			gfs2_rs_deltree(rs);
@@ -2209,7 +2235,7 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
 		spin_lock(&rgd->rd_rsspin);
 		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
 		rgd->rd_reserved -= rs->rs_reserved;
-		spin_unlock(&rs->rs_rgd->rd_rsspin);
+		spin_unlock(&rgd->rd_rsspin);
 		rs->rs_reserved = 0;
 	}
 	if (gfs2_holder_initialized(&ip->i_rgd_gh))
@@ -2300,6 +2326,7 @@ void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 
 	if (rgd == NULL)
 		return;
+	spin_lock(&rgd->rd_rsspin);
 	gfs2_print_dbg(seq, " R: n:%llu f:%02x b:%u/%u i:%u r:%u e:%u\n",
 		       (unsigned long long)rgd->rd_addr, rgd->rd_flags,
 		       rgd->rd_free, rgd->rd_free_clone, rgd->rd_dinodes,
@@ -2312,7 +2339,6 @@ void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 			       be32_to_cpu(rgl->rl_free),
 			       be32_to_cpu(rgl->rl_dinodes));
 	}
-	spin_lock(&rgd->rd_rsspin);
 	for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
 		trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
 		dump_rs(seq, trs);
@@ -2439,6 +2465,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	BUG_ON(ip->i_res.rs_reserved < *nblocks);
 
+	rgrp_lock_local(rbm.rgd);
 	gfs2_set_alloc_start(&rbm, ip, dinode);
 	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
 
@@ -2488,6 +2515,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	gfs2_trans_add_meta(rbm.rgd->rd_gl, rbm.rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rbm.rgd, rbm.rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rbm.rgd);
 
 	gfs2_statfs_change(sdp, 0, -(s64)*nblocks, dinode ? 1 : 0);
 	if (dinode)
@@ -2501,6 +2529,7 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	return 0;
 
 rgrp_error:
+	rgrp_unlock_local(rbm.rgd);
 	gfs2_rgrp_error(rbm.rgd);
 	return -EIO;
 }
@@ -2520,12 +2549,14 @@ void __gfs2_free_blocks(struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, bstart, blen, GFS2_BLKST_FREE);
 	trace_gfs2_block_alloc(ip, rgd, bstart, blen, GFS2_BLKST_FREE);
 	rgd->rd_free += blen;
 	rgd->rd_flags &= ~GFS2_RGF_TRIMMED;
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rgd);
 
 	/* Directories keep their data in the metadata address space */
 	if (meta || ip->i_depth)
@@ -2561,17 +2592,20 @@ void gfs2_unlink_di(struct inode *inode)
 	rgd = gfs2_blk2rgrpd(sdp, blkno, true);
 	if (!rgd)
 		return;
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	trace_gfs2_block_alloc(ip, rgd, blkno, 1, GFS2_BLKST_UNLINKED);
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
 	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, 1);
+	rgrp_unlock_local(rgd);
 }
 
 void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
 
+	rgrp_lock_local(rgd);
 	rgblk_free(sdp, rgd, ip->i_no_addr, 1, GFS2_BLKST_FREE);
 	if (!rgd->rd_dinodes)
 		gfs2_consist_rgrpd(rgd);
@@ -2580,6 +2614,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 
 	gfs2_trans_add_meta(rgd->rd_gl, rgd->rd_bits[0].bi_bh);
 	gfs2_rgrp_out(rgd, rgd->rd_bits[0].bi_bh->b_data);
+	rgrp_unlock_local(rgd);
 	be32_add_cpu(&rgd->rd_rgl->rl_unlinked, -1);
 
 	gfs2_statfs_change(sdp, 0, +1, -1);
@@ -2594,6 +2629,10 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
  * @no_addr: The block number to check
  * @type: The block type we are looking for
  *
+ * The inode glock of @no_addr must be held.  The @type to check for is either
+ * GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; checking for type GFS2_BLKST_FREE
+ * or GFS2_BLKST_USED would make no sense.
+ *
  * Returns: 0 if the block type matches the expected type
  *          -ESTALE if it doesn't match
  *          or -ve errno if something went wrong while checking
@@ -2619,6 +2658,12 @@ int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
 	if (WARN_ON_ONCE(error))
 		goto fail;
 
+	/*
+	 * No need to take the local resource group lock here; the inode glock
+	 * of @no_addr provides the necessary synchronization in case the block
+	 * is an inode.  (In case the block is not an inode, the block type
+	 * will not match the @type we are looking for.)
+	 */
 	if (gfs2_testbit(&rbm, false) != type)
 		error = -ESTALE;
 
@@ -2741,3 +2786,14 @@ void gfs2_rlist_free(struct gfs2_rgrp_list *rlist)
 	}
 }
 
+void rgrp_lock_local(struct gfs2_rgrpd *rgd)
+{
+	BUG_ON(!gfs2_glock_is_held_excl(rgd->rd_gl) &&
+	       !test_bit(SDF_NORECOVERY, &rgd->rd_sbd->sd_flags));
+	mutex_lock(&rgd->rd_mutex);
+}
+
+void rgrp_unlock_local(struct gfs2_rgrpd *rgd)
+{
+	mutex_unlock(&rgd->rd_mutex);
+}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index b596c3d179888..33e52dab76efa 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -92,4 +92,8 @@ static inline int rgrp_contains_block(struct gfs2_rgrpd *rgd, u64 block)
 }
 
 extern void check_and_update_goal(struct gfs2_inode *ip);
+
+extern void rgrp_lock_local(struct gfs2_rgrpd *rgd);
+extern void rgrp_unlock_local(struct gfs2_rgrpd *rgd);
+
 #endif /* __RGRP_DOT_H__ */
-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 5/6] gfs2: Allow node-wide exclusive glock sharing
  2018-12-01 11:10 [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (3 preceding siblings ...)
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 4/6] gfs2: Add local resource group locking Andreas Gruenbacher
@ 2018-12-01 11:10 ` Andreas Gruenbacher
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 6/6] gfs2: Introduce resource group sharing Andreas Gruenbacher
  5 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes that are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set as
well.  (From the point of view of other nodes, the lock is still held
exclusively.)
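
For example (a hypothetical sketch, not part of this patch), two tasks
on the same node can then hold the same rgrp glock at the same time:

	/* task A */
	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
				   LM_FLAG_NODE_SCOPE, &gh_a);
	/* task B: granted as well, despite the EX mode */
	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
				   LM_FLAG_NODE_SCOPE, &gh_b);

Without LM_FLAG_NODE_SCOPE on both holders, task B would have to wait
until task A dropped the glock.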

A future patch will start using this flag to improve performance with
rgrp sharing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 26 ++++++++++++++++++++++----
 fs/gfs2/glock.h |  6 ++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 05431324b262b..1c07a1819c1b0 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -279,10 +279,26 @@ void gfs2_glock_put(struct gfs2_glock *gl)
 
 static inline int may_grant(const struct gfs2_glock *gl, const struct gfs2_holder *gh)
 {
-	const struct gfs2_holder *gh_head = list_entry(gl->gl_holders.next, const struct gfs2_holder, gh_list);
-	if ((gh->gh_state == LM_ST_EXCLUSIVE ||
-	     gh_head->gh_state == LM_ST_EXCLUSIVE) && gh != gh_head)
-		return 0;
+	const struct gfs2_holder *gh_head = list_entry(gl->gl_holders.next,
+						       const struct gfs2_holder,
+						       gh_list);
+
+	if (gh != gh_head) {
+		/**
+		 * Here we make a special exception to grant holders who agree
+		 * to share the EX lock with other holders who also have the
+		 * bit set. If the original holder has the LM_FLAG_NODE_SCOPE
+		 * bit set, we grant more holders with the bit set.
+		 */
+		if (gh_head->gh_state == LM_ST_EXCLUSIVE &&
+		    (gh_head->gh_flags & LM_FLAG_NODE_SCOPE) &&
+		    gh->gh_state == LM_ST_EXCLUSIVE &&
+		    (gh->gh_flags & LM_FLAG_NODE_SCOPE))
+			return 1;
+		if ((gh->gh_state == LM_ST_EXCLUSIVE ||
+		     gh_head->gh_state == LM_ST_EXCLUSIVE))
+			return 0;
+	}
 	if (gl->gl_state == gh->gh_state)
 		return 1;
 	if (gh->gh_flags & GL_EXACT)
@@ -1682,6 +1698,8 @@ static const char *hflags2str(char *buf, u16 flags, unsigned long iflags)
 		*p++ = 'A';
 	if (flags & LM_FLAG_PRIORITY)
 		*p++ = 'p';
+	if (flags & LM_FLAG_NODE_SCOPE)
+		*p++ = 'n';
 	if (flags & GL_ASYNC)
 		*p++ = 'a';
 	if (flags & GL_EXACT)
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 5e12220cc0c24..3a3c1bd5b2d89 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -78,6 +78,11 @@ enum {
  * request and directly join the other shared lock.  A shared lock request
  * without the priority flag might be forced to wait until the deferred
  * requested had acquired and released the lock.
+ *
+ * LM_FLAG_NODE_SCOPE
+ * This holder agrees to share the lock within this node. In other words,
+ * the glock is held in EX mode according to DLM, but local holders on the
+ * same node can share it.
  */
 
 #define LM_FLAG_TRY		0x0001
@@ -85,6 +90,7 @@ enum {
 #define LM_FLAG_NOEXP		0x0004
 #define LM_FLAG_ANY		0x0008
 #define LM_FLAG_PRIORITY	0x0010
+#define LM_FLAG_NODE_SCOPE	0x0020
 #define GL_ASYNC		0x0040
 #define GL_EXACT		0x0080
 #define GL_SKIP			0x0100
-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 6/6] gfs2: Introduce resource group sharing
  2018-12-01 11:10 [Cluster-devel] [PATCH v2 0/6] gfs2: Prepare for resource group glock sharing Andreas Gruenbacher
                   ` (4 preceding siblings ...)
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 5/6] gfs2: Allow node-wide exclusive glock sharing Andreas Gruenbacher
@ 2018-12-01 11:10 ` Andreas Gruenbacher
  5 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-01 11:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Bob Peterson <rpeterso@redhat.com>

This patch takes advantage of the new glock holder sharing feature for
resource groups.  We have already introduced local resource group
locking in a previous patch, so competing accesses by local processes
are already under control.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/inode.c |  5 +++--
 fs/gfs2/rgrp.c  | 10 +++++-----
 fs/gfs2/super.c |  3 ++-
 fs/gfs2/xattr.c |  6 ++++--
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 7851540bd8916..f15d58975b62e 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1491,7 +1491,7 @@ static int sweep_bh_for_rgrps(struct gfs2_inode *ip, struct gfs2_holder *rd_gh,
 				goto out;
 			}
 			ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
-						 0, rd_gh);
+						 LM_FLAG_NODE_SCOPE, rd_gh);
 			if (ret)
 				goto out;
 
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 648f0ca1ad57e..f757c88b1e4c0 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1120,7 +1120,7 @@ static int gfs2_unlink(struct inode *dir, struct dentry *dentry)
 	if (!rgd)
 		goto out_inodes;
 
-	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2);
+	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, LM_FLAG_NODE_SCOPE, ghs + 2);
 
 
 	error = gfs2_glock_nq(ghs); /* parent */
@@ -1407,7 +1407,8 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 		 */
 		nrgd = gfs2_blk2rgrpd(sdp, nip->i_no_addr, 1);
 		if (nrgd)
-			gfs2_holder_init(nrgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + num_gh++);
+			gfs2_holder_init(nrgd->rd_gl, LM_ST_EXCLUSIVE,
+					 LM_FLAG_NODE_SCOPE, ghs + num_gh++);
 	}
 
 	for (x = 0; x < num_gh; x++) {
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 95ecd81e61e70..86fa25f1756b3 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1465,7 +1465,8 @@ int gfs2_fitrim(struct file *filp, void __user *argp)
 
 	while (1) {
 
-		ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &gh);
+		ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+					 LM_FLAG_NODE_SCOPE, &gh);
 		if (ret)
 			goto out;
 
@@ -2082,7 +2083,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_rgrpd *begin = NULL;
 	struct gfs2_blkreserv *rs = &ip->i_res;
-	int error = 0, flags = 0;
+	int error = 0, flags = LM_FLAG_NODE_SCOPE;
 	bool rg_locked;
 	u64 last_unlinked = NO_BLOCK;
 	int loops = 0;
@@ -2761,9 +2762,8 @@ void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist)
 				      sizeof(struct gfs2_holder),
 				      GFP_NOFS | __GFP_NOFAIL);
 	for (x = 0; x < rlist->rl_rgrps; x++)
-		gfs2_holder_init(rlist->rl_rgd[x]->rd_gl,
-				LM_ST_EXCLUSIVE, 0,
-				&rlist->rl_ghs[x]);
+		gfs2_holder_init(rlist->rl_rgd[x]->rd_gl, LM_ST_EXCLUSIVE,
+				 LM_FLAG_NODE_SCOPE, &rlist->rl_ghs[x]);
 }
 
 /**
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163ff7cfd..3f43ac36b1f49 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1507,7 +1507,8 @@ static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
 		goto out_qs;
 	}
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &gh);
+	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+				   LM_FLAG_NODE_SCOPE, &gh);
 	if (error)
 		goto out_qs;
 
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 996c915a9c97e..dc45b0e13e08a 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -262,7 +262,8 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
 		return -EIO;
 	}
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
+	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+				   LM_FLAG_NODE_SCOPE, &rg_gh);
 	if (error)
 		return error;
 
@@ -1384,7 +1385,8 @@ static int ea_dealloc_block(struct gfs2_inode *ip)
 		return -EIO;
 	}
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &gh);
+	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE,
+				   LM_FLAG_NODE_SCOPE, &gh);
 	if (error)
 		return error;
 
-- 
2.19.1.546.g028f9c799.dirty




* [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation Andreas Gruenbacher
@ 2018-12-03 14:32   ` Bob Peterson
  2018-12-03 14:58     ` Andreas Gruenbacher
  2018-12-04 22:02   ` Andreas Gruenbacher
  1 sibling, 1 reply; 10+ messages in thread
From: Bob Peterson @ 2018-12-03 14:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> Unconditionally call gfs2_adjust_reservation in gfs2_alloc_blocks.  Move
> the code for updating rd_free and rd_free_clone from gfs2_alloc_blocks
> into gfs2_adjust_reservation.
> 
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>  fs/gfs2/rgrp.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index bbcf8b8b2597f..733e21cd4cf25 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
(snip)
> -	if (gfs2_rs_active(&ip->i_res))
> -		gfs2_adjust_reservation(ip, &rbm, *nblocks);
> +	error = gfs2_adjust_reservation(ip, &rbm, *nblocks);

Wait. At what point do we require a reservation for all block allocations?
In today's code, a reservation is not required, so the check for
"if (gfs2_rs_active(&ip->i_res))" is necessary because you can't adjust a
reservation that's not active (in the tree)...being within the tree
determines whether the reservation is active or not.

Yes, I'm looking forward to the day when we require a reservation for all
block allocations as per my original version (this needs to happen to allow
rgrp sharing between processes). Perhaps a future patch will add the
requirement, but at this point, there's no such requirement, which means
this patch will probably break during a git bisect, no?

We only get into this situation when all bitmaps are nearly full or fragmented
and therefore no reservation will fit the minimum requirements, but we still
need to allow small one-shot block allocations. So we need to do lots of
full-bitmap testing. Or else we need to add this patch only after the
patch that makes reservations mandatory.

Regards,

Bob Peterson




* [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation
  2018-12-03 14:32   ` Bob Peterson
@ 2018-12-03 14:58     ` Andreas Gruenbacher
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-03 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 3 Dec 2018 at 15:32, Bob Peterson <rpeterso@redhat.com> wrote:
>
> ----- Original Message -----
> > Unconditionally call gfs2_adjust_reservation in gfs2_alloc_blocks.  Move
> > the code for updating rd_free and rd_free_clone from gfs2_alloc_blocks
> > into gfs2_adjust_reservation.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > ---
> >  fs/gfs2/rgrp.c | 26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> > index bbcf8b8b2597f..733e21cd4cf25 100644
> > --- a/fs/gfs2/rgrp.c
> > +++ b/fs/gfs2/rgrp.c
> (snip)
> > -     if (gfs2_rs_active(&ip->i_res))
> > -             gfs2_adjust_reservation(ip, &rbm, *nblocks);
> > +     error = gfs2_adjust_reservation(ip, &rbm, *nblocks);
>
> Wait. At what point do we require a reservation for all block allocations?
> In today's code, a reservation is not required, so the check for
> "if (gfs2_rs_active(&ip->i_res))" is necessary because you can't adjust a
> reservation that's not active (in the tree)...being within the tree
> determines whether the reservation is active or not.
>
> Yes, I'm looking forward to the day when we require a reservation for all
> block allocations as per my original version (this needs to happen to allow
> rgrp sharing between processes.) Perhaps a future patch will add the
> requirement, but at this point, there's no such requirement, which means
> this patch will probably break during a git bisect, no?

gfs2_adjust_reservation already checks if there is an active
reservation, so the duplicate check here is unnecessary.
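
To spell that out (paraphrasing the hunk from the patch, not a new
change), the function body has the shape

	spin_lock(&rgd->rd_rsspin);
	/* rgrp-wide rd_free / rd_free_clone accounting */
	if (gfs2_rs_active(rs)) {
		/* reservation-specific adjustments happen only here */
	}
	spin_unlock(&rgd->rd_rsspin);

so the unconditional call is safe for inodes without an active
reservation.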

> We only get into this situation when all bitmaps are nearly full or fragmented
> and therefore no reservation will fit the minimum requirements, but we still
> need to allow small one-shot block allocations. So we need to do lots of
> full-bitmap testing. Or else we need to add this patch only after the
> patch that makes reservations mandatory.
>
> Regards,
> Bob Peterson

Andreas




* [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation
  2018-12-01 11:10 ` [Cluster-devel] [PATCH v2 2/6] gfs2: Clean up gfs2_adjust_reservation Andreas Gruenbacher
  2018-12-03 14:32   ` Bob Peterson
@ 2018-12-04 22:02   ` Andreas Gruenbacher
  1 sibling, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2018-12-04 22:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Sat, 1 Dec 2018 at 12:10, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> Unconditionally call gfs2_adjust_reservation in gfs2_alloc_blocks.  Move
> the code for updating rd_free and rd_free_clone from gfs2_alloc_blocks
> into gfs2_adjust_reservation.

Let me change that description to:

Prepare gfs2_adjust_reservation for resource group glock sharing: Call
it from gfs2_alloc_blocks unconditionally.  Update rd_free and
rd_free_clone in gfs2_adjust_reservation under rd_rsspin: holding the
resource group glock alone will no longer be enough to keep the balance
of the resource group counters straight.

Thanks,
Andreas




