linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* GFS2: Pre-pull patch posting (merge window)
@ 2014-01-20 12:23 Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 01/24] GFS2: If requested is too large, use the largest extent in the rgrp Steven Whitehouse
                   ` (23 more replies)
  0 siblings, 24 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Here are the pending patches for the merge window which are currently
in the GFS2 tree.

The main topics this time are allocation, in the form of Bob's
improvements when searching resource groups and several updates
to quotas which should increase scalability. The quota changes
follow on from those in the last merge window, and there will
likely be further work to come in this area in due course.

There are also a few patches which help to improve efficiency
of adding entries into directories, and clean up some of that
code.

One on-disk change is included this time, which is to write some
additional information which should be useful to fsck and
also potentially for debugging.

Other than that, its just a few small random bug fixes and
clean ups,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 01/24] GFS2: If requested is too large, use the largest extent in the rgrp
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 02/24] GFS2: Drop inadequate rgrps from the reservation tree Steven Whitehouse
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Bob Peterson, Steven Whitehouse

From: Bob Peterson <rpeterso@redhat.com>

Here is a second try at a patch I posted earlier, which also implements
suggestions Steve made:

Before this patch, GFS2 would keep searching through all the rgrps
until it found one that had a chunk of free blocks big enough to
satisfy the size hint, which is based on the file write size,
regardless of whether the chunk was big enough to perform the write.
However, when doing big writes there may not be a large enough
chunk of free blocks in any rgrp, due to file system fragmentation.
The largest chunk may be big enough to satisfy the write request,
but it may not meet the ideal reservation size from the "size hint".
The writes would slow to a crawl because every write would search
every rgrp, then finally give up and default to a single-block write.
In my case, performance would drop from 425MB/s to 18KB/s, or 24000
times slower.

This patch basically makes it so that if we can't find a contiguous
chunk of blocks big enough to satisfy the sizehint, we'll use the
largest chunk of blocks we found that will still contain the write.
It does so by keeping track of the largest run of blocks within the
rgrp.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index c8d6161..809fecd 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -57,6 +57,11 @@
  * 3 = Used (metadata)
  */
 
+struct gfs2_extent {
+	struct gfs2_rbm rbm;
+	u32 len;
+};
+
 static const char valid_change[16] = {
 	        /* current */
 	/* n */ 0, 1, 1, 1,
@@ -65,8 +70,9 @@ static const char valid_change[16] = {
 	        1, 0, 0, 0
 };
 
-static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 minext,
-                         const struct gfs2_inode *ip, bool nowrap);
+static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
+			 const struct gfs2_inode *ip, bool nowrap,
+			 const struct gfs2_alloc_parms *ap);
 
 
 /**
@@ -1455,7 +1461,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 	if (WARN_ON(gfs2_rbm_from_block(&rbm, goal)))
 		return;
 
-	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, extlen, ip, true);
+	ret = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, &extlen, ip, true, ap);
 	if (ret == 0) {
 		rs->rs_rbm = rbm;
 		rs->rs_free = extlen;
@@ -1520,6 +1526,7 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
  * @rbm: The current position in the resource group
  * @ip: The inode for which we are searching for blocks
  * @minext: The minimum extent length
+ * @maxext: A pointer to the maximum extent structure
  *
  * This checks the current position in the rgrp to see whether there is
  * a reservation covering this block. If not then this function is a
@@ -1532,7 +1539,8 @@ static u64 gfs2_next_unreserved_block(struct gfs2_rgrpd *rgd, u64 block,
 
 static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 					     const struct gfs2_inode *ip,
-					     u32 minext)
+					     u32 minext,
+					     struct gfs2_extent *maxext)
 {
 	u64 block = gfs2_rbm_to_block(rbm);
 	u32 extlen = 1;
@@ -1545,8 +1553,7 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 	 */
 	if (minext) {
 		extlen = gfs2_free_extlen(rbm, minext);
-		nblock = block + extlen;
-		if (extlen < minext)
+		if (extlen <= maxext->len)
 			goto fail;
 	}
 
@@ -1555,9 +1562,17 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm,
 	 * and skip if parts of it are already reserved
 	 */
 	nblock = gfs2_next_unreserved_block(rbm->rgd, block, extlen, ip);
-	if (nblock == block)
-		return 0;
+	if (nblock == block) {
+		if (!minext || extlen >= minext)
+			return 0;
+
+		if (extlen > maxext->len) {
+			maxext->len = extlen;
+			maxext->rbm = *rbm;
+		}
 fail:
+		nblock = block + extlen;
+	}
 	ret = gfs2_rbm_from_block(rbm, nblock);
 	if (ret < 0)
 		return ret;
@@ -1568,10 +1583,12 @@ fail:
  * gfs2_rbm_find - Look for blocks of a particular state
  * @rbm: Value/result starting position and final position
  * @state: The state which we want to find
- * @minext: The requested extent length (0 for a single block)
+ * @minext: Pointer to the requested extent length (NULL for a single block)
+ *          This is updated to be the actual reservation size.
  * @ip: If set, check for reservations
  * @nowrap: Stop looking at the end of the rgrp, rather than wrapping
  *          around until we've reached the starting point.
+ * @ap: the allocation parameters
  *
  * Side effects:
  * - If looking for free blocks, we set GBF_FULL on each bitmap which
@@ -1580,8 +1597,9 @@ fail:
  * Returns: 0 on success, -ENOSPC if there is no block of the requested state
  */
 
-static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 minext,
-			 const struct gfs2_inode *ip, bool nowrap)
+static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
+			 const struct gfs2_inode *ip, bool nowrap,
+			 const struct gfs2_alloc_parms *ap)
 {
 	struct buffer_head *bh;
 	int initial_bii;
@@ -1592,6 +1610,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 minext,
 	int iters = rbm->rgd->rd_length;
 	int ret;
 	struct gfs2_bitmap *bi;
+	struct gfs2_extent maxext = { .rbm.rgd = rbm->rgd, };
 
 	/* If we are not starting at the beginning of a bitmap, then we
 	 * need to add one to the bitmap count to ensure that we search
@@ -1620,7 +1639,9 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 minext,
 			return 0;
 
 		initial_bii = rbm->bii;
-		ret = gfs2_reservation_check_and_update(rbm, ip, minext);
+		ret = gfs2_reservation_check_and_update(rbm, ip,
+							minext ? *minext : 0,
+							&maxext);
 		if (ret == 0)
 			return 0;
 		if (ret > 0) {
@@ -1655,6 +1676,17 @@ next_iter:
 			break;
 	}
 
+	if (minext == NULL || state != GFS2_BLKST_FREE)
+		return -ENOSPC;
+
+	/* If the maximum extent we found is big enough to fulfill the
+	   minimum requirements, use it anyway. */
+	if (maxext.len) {
+		*rbm = maxext.rbm;
+		*minext = maxext.len;
+		return 0;
+	}
+
 	return -ENOSPC;
 }
 
@@ -1680,7 +1712,8 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 
 	while (1) {
 		down_write(&sdp->sd_log_flush_lock);
-		error = gfs2_rbm_find(&rbm, GFS2_BLKST_UNLINKED, 0, NULL, true);
+		error = gfs2_rbm_find(&rbm, GFS2_BLKST_UNLINKED, NULL, NULL,
+				      true, NULL);
 		up_write(&sdp->sd_log_flush_lock);
 		if (error == -ENOSPC)
 			break;
@@ -2184,11 +2217,12 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	int error;
 
 	gfs2_set_alloc_start(&rbm, ip, dinode);
-	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, 0, ip, false);
+	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false, NULL);
 
 	if (error == -ENOSPC) {
 		gfs2_set_alloc_start(&rbm, ip, dinode);
-		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, 0, NULL, false);
+		error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, NULL, false,
+				      NULL);
 	}
 
 	/* Since all blocks are reserved in advance, this shouldn't happen */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 02/24] GFS2: Drop inadequate rgrps from the reservation tree
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 01/24] GFS2: If requested is too large, use the largest extent in the rgrp Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 03/24] GFS2: Implement a "rgrp has no extents longer than X" scheme Steven Whitehouse
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Bob Peterson, Steven Whitehouse

From: Bob Peterson <rpeterso@redhat.com>

This is just basically a resend of a patch I posted earlier.
It didn't change from its original, except in diff offsets, etc:

This patch fixes a bug in the GFS2 block allocation code. The problem
starts if a process already has a multi-block reservation, but for
some reason, another process disqualifies it from further allocations.
For example, the other process might set on the GFS2_RDF_ERROR bit.
The process holding the reservation jumps to label skip_rgrp, but
that label comes after the code that removes the reservation from the
tree. Therefore, the no longer usable reservation is not removed from
the rgrp's reservations tree; it's lost. Eventually, the lost reservation
causes the count of reserved blocks to get off, and eventually that
causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger.
This patch moves the call to after label skip_rgrp so that the
disqualified reservation is properly removed from the tree, thus keeping
the rgrp rd_reserved count sane.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 809fecd..1ccf89a 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1944,15 +1944,16 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, const struct gfs2_alloc_parms *a
 			return 0;
 		}
 
-		/* Drop reservation, if we couldn't use reserved rgrp */
-		if (gfs2_rs_active(rs))
-			gfs2_rs_deltree(rs);
 check_rgrp:
 		/* Check for unlinked inodes which can be reclaimed */
 		if (rs->rs_rbm.rgd->rd_flags & GFS2_RDF_CHECK)
 			try_rgrp_unlink(rs->rs_rbm.rgd, &last_unlinked,
 					ip->i_no_addr);
 skip_rgrp:
+		/* Drop reservation, if we couldn't use reserved rgrp */
+		if (gfs2_rs_active(rs))
+			gfs2_rs_deltree(rs);
+
 		/* Unlock rgrp if required */
 		if (!rg_locked)
 			gfs2_glock_dq_uninit(&rs->rs_rgd_gh);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 03/24] GFS2: Implement a "rgrp has no extents longer than X" scheme
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 01/24] GFS2: If requested is too large, use the largest extent in the rgrp Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 02/24] GFS2: Drop inadequate rgrps from the reservation tree Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 04/24] GFS2: Clean up releasepage Steven Whitehouse
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Bob Peterson, Steven Whitehouse

From: Bob Peterson <rpeterso@redhat.com>

With the preceding patch, we started accepting block reservations
smaller than the ideal size, which requires a lot more parsing of the
bitmaps. To reduce the amount of bitmap searching, this patch
implements a scheme whereby each rgrp keeps track of the point
at this multi-block reservations will fail.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index ba1ea67..0132816 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -93,6 +93,7 @@ struct gfs2_rgrpd {
 	struct gfs2_rgrp_lvb *rd_rgl;
 	u32 rd_last_alloc;
 	u32 rd_flags;
+	u32 rd_extfail_pt;		/* extent failure point */
 #define GFS2_RDF_CHECK		0x10000000 /* check for unlinked inodes */
 #define GFS2_RDF_UPTODATE	0x20000000 /* rg is up to date */
 #define GFS2_RDF_ERROR		0x40000000 /* error in rg */
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 010b9fb..cdadb92 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -83,6 +83,7 @@ static void maybe_release_space(struct gfs2_bufdata *bd)
 	       bd->bd_bh->b_data + bi->bi_offset, bi->bi_len);
 	clear_bit(GBF_FULL, &bi->bi_flags);
 	rgd->rd_free_clone = rgd->rd_free;
+	rgd->rd_extfail_pt = rgd->rd_free;
 }
 
 /**
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1ccf89a..797f1d3 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -641,9 +641,13 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 		/* return reserved blocks to the rgrp */
 		BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free);
 		rs->rs_rbm.rgd->rd_reserved -= rs->rs_free;
+		/* The rgrp extent failure point is likely not to increase;
+		   it will only do so if the freed blocks are somehow
+		   contiguous with a span of free blocks that follows. Still,
+		   it will force the number to be recalculated later. */
+		rgd->rd_extfail_pt += rs->rs_free;
 		rs->rs_free = 0;
 		clear_bit(GBF_FULL, &bi->bi_flags);
-		smp_mb__after_clear_bit();
 	}
 }
 
@@ -1132,6 +1136,8 @@ int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd)
 		gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
 		rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
 		rgd->rd_free_clone = rgd->rd_free;
+		/* max out the rgrp allocation failure point */
+		rgd->rd_extfail_pt = rgd->rd_free;
 	}
 	if (cpu_to_be32(GFS2_MAGIC) != rgd->rd_rgl->rl_magic) {
 		rgd->rd_rgl->rl_unlinked = cpu_to_be32(count_unlinked(rgd));
@@ -1593,6 +1599,8 @@ fail:
  * Side effects:
  * - If looking for free blocks, we set GBF_FULL on each bitmap which
  *   has no free blocks in it.
+ * - If looking for free blocks, we set rd_extfail_pt on each rgrp which
+ *   has come up short on a free block search.
  *
  * Returns: 0 on success, -ENOSPC if there is no block of the requested state
  */
@@ -1604,6 +1612,8 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext,
 	struct buffer_head *bh;
 	int initial_bii;
 	u32 initial_offset;
+	int first_bii = rbm->bii;
+	u32 first_offset = rbm->offset;
 	u32 offset;
 	u8 *buffer;
 	int n = 0;
@@ -1679,6 +1689,13 @@ next_iter:
 	if (minext == NULL || state != GFS2_BLKST_FREE)
 		return -ENOSPC;
 
+	/* If the extent was too small, and it's smaller than the smallest
+	   to have failed before, remember for future reference that it's
+	   useless to search this rgrp again for this amount or more. */
+	if ((first_offset == 0) && (first_bii == 0) &&
+	    (*minext < rbm->rgd->rd_extfail_pt))
+		rbm->rgd->rd_extfail_pt = *minext;
+
 	/* If the maximum extent we found is big enough to fulfill the
 	   minimum requirements, use it anyway. */
 	if (maxext.len) {
@@ -1924,7 +1941,9 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, const struct gfs2_alloc_parms *a
 		}
 
 		/* Skip unuseable resource groups */
-		if (rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC | GFS2_RDF_ERROR))
+		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
+						 GFS2_RDF_ERROR)) ||
+		    (ap && (ap->target > rs->rs_rbm.rgd->rd_extfail_pt)))
 			goto skip_rgrp;
 
 		if (sdp->sd_args.ar_rgrplvb)
@@ -2106,10 +2125,10 @@ int gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 
 	if (rgd == NULL)
 		return 0;
-	gfs2_print_dbg(seq, " R: n:%llu f:%02x b:%u/%u i:%u r:%u\n",
+	gfs2_print_dbg(seq, " R: n:%llu f:%02x b:%u/%u i:%u r:%u e:%u\n",
 		       (unsigned long long)rgd->rd_addr, rgd->rd_flags,
 		       rgd->rd_free, rgd->rd_free_clone, rgd->rd_dinodes,
-		       rgd->rd_reserved);
+		       rgd->rd_reserved, rgd->rd_extfail_pt);
 	spin_lock(&rgd->rd_rsspin);
 	for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
 		trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
@@ -2228,9 +2247,10 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 
 	/* Since all blocks are reserved in advance, this shouldn't happen */
 	if (error) {
-		fs_warn(sdp, "inum=%llu error=%d, nblocks=%u, full=%d\n",
+		fs_warn(sdp, "inum=%llu error=%d, nblocks=%u, full=%d fail_pt=%d\n",
 			(unsigned long long)ip->i_no_addr, error, *nblocks,
-			test_bit(GBF_FULL, &rbm.rgd->rd_bits->bi_flags));
+			test_bit(GBF_FULL, &rbm.rgd->rd_bits->bi_flags),
+			rbm.rgd->rd_extfail_pt);
 		goto rgrp_error;
 	}
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 04/24] GFS2: Clean up releasepage
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (2 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 03/24] GFS2: Implement a "rgrp has no extents longer than X" scheme Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 05/24] GFS2: Remove gfs2_quota_change_host structure Steven Whitehouse
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

For historical reasons, we drop and retake the log lock in ->releasepage()
however, since there is no reason why we cannot hold the log lock over
the whole function, this allows some simplification. In particular,
pinning a buffer is only ever done under the log lock, so it is possible
here to remove the test for pinned buffers in the second loop, since it
is impossible for that to happen (it is also tested in the first loop).

As a result, two tests made later in the second loop become constants
and can also be reduced to the only possible branch. So the net result
is to remove various bits of unreachable code and make this more
readable.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 73f3e4e..cf858da 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1080,30 +1080,22 @@ int gfs2_releasepage(struct page *page, gfp_t gfp_mask)
 		bh = bh->b_this_page;
 	} while(bh != head);
 	spin_unlock(&sdp->sd_ail_lock);
-	gfs2_log_unlock(sdp);
 
 	head = bh = page_buffers(page);
 	do {
-		gfs2_log_lock(sdp);
 		bd = bh->b_private;
 		if (bd) {
 			gfs2_assert_warn(sdp, bd->bd_bh == bh);
-			if (!list_empty(&bd->bd_list)) {
-				if (!buffer_pinned(bh))
-					list_del_init(&bd->bd_list);
-				else
-					bd = NULL;
-			}
-			if (bd)
-				bd->bd_bh = NULL;
+			if (!list_empty(&bd->bd_list))
+				list_del_init(&bd->bd_list);
+			bd->bd_bh = NULL;
 			bh->b_private = NULL;
-		}
-		gfs2_log_unlock(sdp);
-		if (bd)
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
+		}
 
 		bh = bh->b_this_page;
 	} while (bh != head);
+	gfs2_log_unlock(sdp);
 
 	return try_to_free_buffers(page);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 05/24] GFS2: Remove gfs2_quota_change_host structure
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (3 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 04/24] GFS2: Clean up releasepage Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 06/24] GFS2: Remove test which is always true Steven Whitehouse
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

There is only one place this is used, when reading in the quota
changes at mount time. It is not really required and much
simpler to just convert the fields from the on-disk structure
as required.

There should be no functional change as a result of this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 98236d0..1b6b367 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -67,12 +67,6 @@
 #include "inode.h"
 #include "util.h"
 
-struct gfs2_quota_change_host {
-	u64 qc_change;
-	u32 qc_flags; /* GFS2_QCF_... */
-	struct kqid qc_id;
-};
-
 /* Lock order: qd_lock -> qd->lockref.lock -> lru lock */
 static DEFINE_SPINLOCK(qd_lock);
 struct list_lru gfs2_qd_lru;
@@ -1214,17 +1208,6 @@ int gfs2_quota_refresh(struct gfs2_sbd *sdp, struct kqid qid)
 	return error;
 }
 
-static void gfs2_quota_change_in(struct gfs2_quota_change_host *qc, const void *buf)
-{
-	const struct gfs2_quota_change *str = buf;
-
-	qc->qc_change = be64_to_cpu(str->qc_change);
-	qc->qc_flags = be32_to_cpu(str->qc_flags);
-	qc->qc_id = make_kqid(&init_user_ns,
-			      (qc->qc_flags & GFS2_QCF_USER)?USRQUOTA:GRPQUOTA,
-			      be32_to_cpu(str->qc_id));
-}
-
 int gfs2_quota_init(struct gfs2_sbd *sdp)
 {
 	struct gfs2_inode *ip = GFS2_I(sdp->sd_qc_inode);
@@ -1257,6 +1240,7 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 
 	for (x = 0; x < blocks; x++) {
 		struct buffer_head *bh;
+		const struct gfs2_quota_change *qc;
 		unsigned int y;
 
 		if (!extlen) {
@@ -1274,25 +1258,28 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 			goto fail;
 		}
 
+		qc = (const struct gfs2_quota_change *)(bh->b_data + sizeof(struct gfs2_meta_header));
 		for (y = 0; y < sdp->sd_qc_per_block && slot < sdp->sd_quota_slots;
 		     y++, slot++) {
-			struct gfs2_quota_change_host qc;
 			struct gfs2_quota_data *qd;
-
-			gfs2_quota_change_in(&qc, bh->b_data +
-					  sizeof(struct gfs2_meta_header) +
-					  y * sizeof(struct gfs2_quota_change));
-			if (!qc.qc_change)
+			s64 qc_change = be64_to_cpu(qc->qc_change);
+			u32 qc_flags = be32_to_cpu(qc->qc_flags);
+			enum quota_type qtype = (qc_flags & GFS2_QCF_USER) ?
+						USRQUOTA : GRPQUOTA;
+			struct kqid qc_id = make_kqid(&init_user_ns, qtype,
+						      be32_to_cpu(qc->qc_id));
+			qc++;
+			if (!qc_change)
 				continue;
 
-			error = qd_alloc(sdp, qc.qc_id, &qd);
+			error = qd_alloc(sdp, qc_id, &qd);
 			if (error) {
 				brelse(bh);
 				goto fail;
 			}
 
 			set_bit(QDF_CHANGE, &qd->qd_flags);
-			qd->qd_change = qc.qc_change;
+			qd->qd_change = qc_change;
 			qd->qd_slot = slot;
 			qd->qd_slot_count = 1;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 06/24] GFS2: Remove test which is always true
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (4 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 05/24] GFS2: Remove gfs2_quota_change_host structure Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 07/24] GFS2: Use range based functions for rgrp sync/invalidation Steven Whitehouse
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

Since gfs2_inplace_reserve() is always called with a valid
alloc parms structure, there is no need to test for this
within the function itself - and in any case, after we've
all ready dereferenced it anyway.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 797f1d3..2584710 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1943,7 +1943,7 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, const struct gfs2_alloc_parms *a
 		/* Skip unuseable resource groups */
 		if ((rs->rs_rbm.rgd->rd_flags & (GFS2_RGF_NOALLOC |
 						 GFS2_RDF_ERROR)) ||
-		    (ap && (ap->target > rs->rs_rbm.rgd->rd_extfail_pt)))
+		    (ap->target > rs->rs_rbm.rgd->rd_extfail_pt))
 			goto skip_rgrp;
 
 		if (sdp->sd_args.ar_rgrplvb)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 07/24] GFS2: Use range based functions for rgrp sync/invalidation
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (5 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 06/24] GFS2: Remove test which is always true Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 08/24] GFS2: Use only a single address space for rgrps Steven Whitehouse
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

Each rgrp header is represented as a single extent on disk, so we
can calculate the position within the address space, since we are
using address spaces mapped 1:1 to the disk. This means that it
is possible to use the range based versions of filemap_fdatawrite/wait
and for invalidating the page cache.

Our eventual intent is to then be able to merge the address spaces
used for rgrps into a single address space, rather than to have
one for each glock, saving memory and reducing complexity.

Since during umount, the rgrp structures are disposed of before
the glocks, we need to store the extent information in the glock
so that is is available for a final invalidation. This patch uses
a field which is otherwise unused in rgrp glocks to do that, so
that we do not have to expand the size of a glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index f88dcd9..1b192c8d 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -142,8 +142,8 @@ static void rgrp_go_sync(struct gfs2_glock *gl)
 	GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_EXCLUSIVE);
 
 	gfs2_log_flush(gl->gl_sbd, gl);
-	filemap_fdatawrite(metamapping);
-	error = filemap_fdatawait(metamapping);
+	filemap_fdatawrite_range(metamapping, gl->gl_vm.start, gl->gl_vm.end);
+	error = filemap_fdatawait_range(metamapping, gl->gl_vm.start, gl->gl_vm.end);
         mapping_set_error(metamapping, error);
 	gfs2_ail_empty_gl(gl);
 
@@ -170,7 +170,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 
 	WARN_ON_ONCE(!(flags & DIO_METADATA));
 	gfs2_assert_withdraw(gl->gl_sbd, !atomic_read(&gl->gl_ail_count));
-	truncate_inode_pages(mapping, 0);
+	truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
 
 	if (gl->gl_object) {
 		struct gfs2_rgrpd *rgd = (struct gfs2_rgrpd *)gl->gl_object;
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0132816..e6544ee 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -351,7 +351,15 @@ struct gfs2_glock {
 	atomic_t gl_ail_count;
 	atomic_t gl_revokes;
 	struct delayed_work gl_work;
-	struct work_struct gl_delete;
+	union {
+		/* For inode and iopen glocks only */
+		struct work_struct gl_delete;
+		/* For rgrp glocks only */
+		struct {
+			loff_t start;
+			loff_t end;
+		} gl_vm;
+	};
 	struct rcu_head gl_rcu;
 };
 
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 2584710..183cf0f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -886,6 +886,7 @@ static int rgd_insert(struct gfs2_rgrpd *rgd)
 static int read_rindex_entry(struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+	const unsigned bsize = sdp->sd_sb.sb_bsize;
 	loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex);
 	struct gfs2_rindex buf;
 	int error;
@@ -923,6 +924,8 @@ static int read_rindex_entry(struct gfs2_inode *ip)
 		goto fail;
 
 	rgd->rd_gl->gl_object = rgd;
+	rgd->rd_gl->gl_vm.start = rgd->rd_addr * bsize;
+	rgd->rd_gl->gl_vm.end = rgd->rd_gl->gl_vm.start + (rgd->rd_length * bsize) - 1;
 	rgd->rd_rgl = (struct gfs2_rgrp_lvb *)rgd->rd_gl->gl_lksb.sb_lvbptr;
 	rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
 	if (rgd->rd_data > sdp->sd_max_rg_data)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 08/24] GFS2: Use only a single address space for rgrps
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (6 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 07/24] GFS2: Use range based functions for rgrp sync/invalidation Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 09/24] GFS2: Add directory addition info structure Steven Whitehouse
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

Prior to this patch, GFS2 had one address space for each rgrp,
stored in the glock. This patch changes them to use a single
address space in the super block. This therefore saves
(sizeof(struct address_space) * nr_of_rgrps) bytes of memory
and for large filesystems, that can be significant.

It would be nice to be able to do something similar and merge
the inode metadata address space into the same global
address space. However, that is rather more complicated as the
on-disk location doesn't have a 1:1 mapping with the inodes in
general. So while it could be done, it will be a more complicated
operation as it requires changing a lot more code paths.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1b192c8d..b792dbc 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -133,7 +133,8 @@ void gfs2_ail_flush(struct gfs2_glock *gl, bool fsync)
 
 static void rgrp_go_sync(struct gfs2_glock *gl)
 {
-	struct address_space *metamapping = gfs2_glock2aspace(gl);
+	struct gfs2_sbd *sdp = gl->gl_sbd;
+	struct address_space *mapping = &sdp->sd_aspace;
 	struct gfs2_rgrpd *rgd;
 	int error;
 
@@ -141,10 +142,10 @@ static void rgrp_go_sync(struct gfs2_glock *gl)
 		return;
 	GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_EXCLUSIVE);
 
-	gfs2_log_flush(gl->gl_sbd, gl);
-	filemap_fdatawrite_range(metamapping, gl->gl_vm.start, gl->gl_vm.end);
-	error = filemap_fdatawait_range(metamapping, gl->gl_vm.start, gl->gl_vm.end);
-        mapping_set_error(metamapping, error);
+	gfs2_log_flush(sdp, gl);
+	filemap_fdatawrite_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
+	error = filemap_fdatawait_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
+	mapping_set_error(mapping, error);
 	gfs2_ail_empty_gl(gl);
 
 	spin_lock(&gl->gl_spin);
@@ -166,10 +167,11 @@ static void rgrp_go_sync(struct gfs2_glock *gl)
 
 static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 {
-	struct address_space *mapping = gfs2_glock2aspace(gl);
+	struct gfs2_sbd *sdp = gl->gl_sbd;
+	struct address_space *mapping = &sdp->sd_aspace;
 
 	WARN_ON_ONCE(!(flags & DIO_METADATA));
-	gfs2_assert_withdraw(gl->gl_sbd, !atomic_read(&gl->gl_ail_count));
+	gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
 	truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
 
 	if (gl->gl_object) {
@@ -558,7 +560,7 @@ const struct gfs2_glock_operations gfs2_rgrp_glops = {
 	.go_unlock = gfs2_rgrp_go_unlock,
 	.go_dump = gfs2_rgrp_dump,
 	.go_type = LM_TYPE_RGRP,
-	.go_flags = GLOF_ASPACE | GLOF_LVB,
+	.go_flags = GLOF_LVB,
 };
 
 const struct gfs2_glock_operations gfs2_trans_glops = {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index e6544ee..a99f60c 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -736,6 +736,8 @@ struct gfs2_sbd {
 
 	/* Log stuff */
 
+	struct address_space sd_aspace;
+
 	spinlock_t sd_log_lock;
 
 	struct gfs2_trans *sd_log_tr;
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index cdadb92..58f0640 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -589,8 +589,12 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, unsigned int start,
 static void gfs2_meta_sync(struct gfs2_glock *gl)
 {
 	struct address_space *mapping = gfs2_glock2aspace(gl);
+	struct gfs2_sbd *sdp = gl->gl_sbd;
 	int error;
 
+	if (mapping == NULL)
+		mapping = &sdp->sd_aspace;
+
 	filemap_fdatawrite(mapping);
 	error = filemap_fdatawait(mapping);
 
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 52f177b..c7f2469 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -116,6 +116,9 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
 	unsigned long index;
 	unsigned int bufnum;
 
+	if (mapping == NULL)
+		mapping = &sdp->sd_aspace;
+
 	shift = PAGE_CACHE_SHIFT - sdp->sd_sb.sb_bsize_shift;
 	index = blkno >> shift;             /* convert block to page */
 	bufnum = blkno - (index << shift);  /* block buf index within page */
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 52fa883..94aab31 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -36,6 +36,7 @@
 #include "log.h"
 #include "quota.h"
 #include "dir.h"
+#include "meta_io.h"
 #include "trace_gfs2.h"
 
 #define DO 0
@@ -62,6 +63,7 @@ static void gfs2_tune_init(struct gfs2_tune *gt)
 static struct gfs2_sbd *init_sbd(struct super_block *sb)
 {
 	struct gfs2_sbd *sdp;
+	struct address_space *mapping;
 
 	sdp = kzalloc(sizeof(struct gfs2_sbd), GFP_KERNEL);
 	if (!sdp)
@@ -98,6 +100,16 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 	INIT_LIST_HEAD(&sdp->sd_trunc_list);
 	spin_lock_init(&sdp->sd_trunc_lock);
 
+	mapping = &sdp->sd_aspace;
+
+	mapping->a_ops = &gfs2_meta_aops;
+	mapping->host = sb->s_bdev->bd_inode;
+	mapping->flags = 0;
+	mapping_set_gfp_mask(mapping, GFP_NOFS);
+	mapping->private_data = NULL;
+	mapping->backing_dev_info = sb->s_bdi;
+	mapping->writeback_index = 0;
+
 	spin_lock_init(&sdp->sd_log_lock);
 	atomic_set(&sdp->sd_log_pinned, 0);
 	INIT_LIST_HEAD(&sdp->sd_log_le_buf);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 09/24] GFS2: Add directory addition info structure
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (7 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 08/24] GFS2: Use only a single address space for rgrps Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 10/24] GFS2: Consolidate transaction blocks calculation for dir add Steven Whitehouse
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

The intent is that this structure will hold the information
required when adding entries to a directory (linking). To
start with, it will contain only the number of blocks which
are required to link the new entry into the directory. The
current calculation returns either 0 or the maximim number of
blocks that can ever be requested by such a transaction.

The intent is that in a later patch, we can update the dir
code to calculate this value more accurately. In addition
further patches will also add further fields to the new
structure to increase its utility.

In addition this patch fixes a bug where the link used during
inode creation was adding requesting too many blocks in
some cases. This is harmless unless the fs is close to being
full.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 2e5fc26..0b6be20 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -2017,18 +2017,24 @@ out:
  * gfs2_diradd_alloc_required - find if adding entry will require an allocation
  * @ip: the file being written to
  * @filname: the filename that's going to be added
+ * @da: The structure to return dir alloc info
  *
- * Returns: 1 if alloc required, 0 if not, -ve on error
+ * Returns: 0 if ok, -ve on error
  */
 
-int gfs2_diradd_alloc_required(struct inode *inode, const struct qstr *name)
+int gfs2_diradd_alloc_required(struct inode *inode, const struct qstr *name,
+			       struct gfs2_diradd *da)
 {
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_dirent *dent;
 	struct buffer_head *bh;
 
+	da->nr_blocks = 0;
+
 	dent = gfs2_dirent_search(inode, name, gfs2_dirent_find_space, &bh);
 	if (!dent) {
-		return 1;
+		da->nr_blocks = sdp->sd_max_dirres;
+		return 0;
 	}
 	if (IS_ERR(dent))
 		return PTR_ERR(dent);
diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h
index 4f03bbd..c5573e7 100644
--- a/fs/gfs2/dir.h
+++ b/fs/gfs2/dir.h
@@ -17,6 +17,10 @@ struct inode;
 struct gfs2_inode;
 struct gfs2_inum;
 
+struct gfs2_diradd {
+	unsigned nr_blocks;
+};
+
 extern struct inode *gfs2_dir_search(struct inode *dir,
 				     const struct qstr *filename,
 				     bool fail_on_exist);
@@ -33,7 +37,8 @@ extern int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
 extern int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip);
 
 extern int gfs2_diradd_alloc_required(struct inode *dir,
-				      const struct qstr *filename);
+				      const struct qstr *filename,
+				      struct gfs2_diradd *da);
 extern int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
 				   struct buffer_head **bhp);
 extern void gfs2_dir_hash_inval(struct gfs2_inode *ip);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 7119504..9ac8f13 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -470,13 +470,13 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_inode *ip,
 }
 
 static int link_dinode(struct gfs2_inode *dip, const struct qstr *name,
-		       struct gfs2_inode *ip, int arq)
+		       struct gfs2_inode *ip, struct gfs2_diradd *da)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
-	struct gfs2_alloc_parms ap = { .target = sdp->sd_max_dirres, };
+	struct gfs2_alloc_parms ap = { .target = da->nr_blocks, };
 	int error;
 
-	if (arq) {
+	if (da->nr_blocks) {
 		error = gfs2_quota_lock_check(dip);
 		if (error)
 			goto fail_quota_locks;
@@ -485,8 +485,8 @@ static int link_dinode(struct gfs2_inode *dip, const struct qstr *name,
 		if (error)
 			goto fail_quota_locks;
 
-		error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
-					 dip->i_rgd->rd_length +
+		error = gfs2_trans_begin(sdp, da->nr_blocks +
+					 gfs2_rg_blocks(dip, da->nr_blocks) +
 					 2 * RES_DINODE +
 					 RES_STATFS + RES_QUOTA, 0);
 		if (error)
@@ -560,7 +560,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	struct dentry *d;
 	int error;
 	u32 aflags = 0;
-	int arq;
+	struct gfs2_diradd da;
 
 	if (!name->len || name->len > GFS2_FNAMESIZE)
 		return -ENAMETOOLONG;
@@ -602,7 +602,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 		goto fail_gunlock;
 	}
 
-	arq = error = gfs2_diradd_alloc_required(dir, name);
+	error = gfs2_diradd_alloc_required(dir, name, &da);
 	if (error < 0)
 		goto fail_gunlock;
 
@@ -690,7 +690,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	if (error)
 		goto fail_gunlock3;
 
-	error = link_dinode(dip, name, ip, arq);
+	error = link_dinode(dip, name, ip, &da);
 	if (error)
 		goto fail_gunlock3;
 
@@ -817,7 +817,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder ghs[2];
 	struct buffer_head *dibh;
-	int alloc_required;
+	struct gfs2_diradd da;
 	int error;
 
 	if (S_ISDIR(inode->i_mode))
@@ -872,13 +872,12 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 	if (ip->i_inode.i_nlink == (u32)-1)
 		goto out_gunlock;
 
-	alloc_required = error = gfs2_diradd_alloc_required(dir, &dentry->d_name);
+	error = gfs2_diradd_alloc_required(dir, &dentry->d_name, &da);
 	if (error < 0)
 		goto out_gunlock;
-	error = 0;
 
-	if (alloc_required) {
-		struct gfs2_alloc_parms ap = { .target = sdp->sd_max_dirres, };
+	if (da.nr_blocks) {
+		struct gfs2_alloc_parms ap = { .target = da.nr_blocks, };
 		error = gfs2_quota_lock_check(dip);
 		if (error)
 			goto out_gunlock;
@@ -887,8 +886,8 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 		if (error)
 			goto out_gunlock_q;
 
-		error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
-					 gfs2_rg_blocks(dip, sdp->sd_max_dirres) +
+		error = gfs2_trans_begin(sdp, da.nr_blocks +
+					 gfs2_rg_blocks(dip, da.nr_blocks) +
 					 2 * RES_DINODE + RES_STATFS +
 					 RES_QUOTA, 0);
 		if (error)
@@ -919,10 +918,10 @@ out_brelse:
 out_end_trans:
 	gfs2_trans_end(sdp);
 out_ipres:
-	if (alloc_required)
+	if (da.nr_blocks)
 		gfs2_inplace_release(dip);
 out_gunlock_q:
-	if (alloc_required)
+	if (da.nr_blocks)
 		gfs2_quota_unlock(dip);
 out_gunlock:
 	gfs2_glock_dq(ghs + 1);
@@ -1254,7 +1253,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	struct gfs2_rgrpd *nrgd;
 	unsigned int num_gh;
 	int dir_rename = 0;
-	int alloc_required = 0;
+	struct gfs2_diradd da = { .nr_blocks = 0, };
 	unsigned int x;
 	int error;
 
@@ -1388,14 +1387,14 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			goto out_gunlock;
 	}
 
-	if (nip == NULL)
-		alloc_required = gfs2_diradd_alloc_required(ndir, &ndentry->d_name);
-	error = alloc_required;
-	if (error < 0)
-		goto out_gunlock;
+	if (nip == NULL) {
+		error = gfs2_diradd_alloc_required(ndir, &ndentry->d_name, &da);
+		if (error)
+			goto out_gunlock;
+	}
 
-	if (alloc_required) {
-		struct gfs2_alloc_parms ap = { .target = sdp->sd_max_dirres, };
+	if (da.nr_blocks) {
+		struct gfs2_alloc_parms ap = { .target = da.nr_blocks, };
 		error = gfs2_quota_lock_check(ndip);
 		if (error)
 			goto out_gunlock;
@@ -1404,8 +1403,8 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 		if (error)
 			goto out_gunlock_q;
 
-		error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
-					 gfs2_rg_blocks(ndip, sdp->sd_max_dirres) +
+		error = gfs2_trans_begin(sdp, da.nr_blocks +
+					 gfs2_rg_blocks(ndip, da.nr_blocks) +
 					 4 * RES_DINODE + 4 * RES_LEAF +
 					 RES_STATFS + RES_QUOTA + 4, 0);
 		if (error)
@@ -1448,10 +1447,10 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 out_end_trans:
 	gfs2_trans_end(sdp);
 out_ipreserv:
-	if (alloc_required)
+	if (da.nr_blocks)
 		gfs2_inplace_release(ndip);
 out_gunlock_q:
-	if (alloc_required)
+	if (da.nr_blocks)
 		gfs2_quota_unlock(ndip);
 out_gunlock:
 	while (x--) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 10/24] GFS2: Consolidate transaction blocks calculation for dir add
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (8 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 09/24] GFS2: Add directory addition info structure Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 11/24] GFS2: Remember directory insert point Steven Whitehouse
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

There are three cases where we need to calculate the number of
blocks to reserve in a transaction involving linking an inode
into a directory. The one in rename is a bit more complicated,
but the basis of it is the same as for link and create. So it
makes sense to move this calculation into a single function
rather than repeating it three times.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 9ac8f13..fa4624f 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -469,6 +469,28 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_inode *ip,
 	brelse(dibh);
 }
 
+/**
+ * gfs2_trans_da_blocks - Calculate number of blocks to link inode
+ * @dip: The directory we are linking into
+ * @da: The dir add information
+ * @nr_inodes: The number of inodes involved
+ *
+ * This calculate the number of blocks we need to reserve in a
+ * transaction to link @nr_inodes into a directory. In most cases
+ * @nr_inodes will be 2 (the directory plus the inode being linked in)
+ * but in case of rename, 4 may be required.
+ *
+ * Returns: Number of blocks
+ */
+
+static unsigned gfs2_trans_da_blks(const struct gfs2_inode *dip,
+				   const struct gfs2_diradd *da,
+				   unsigned nr_inodes)
+{
+	return da->nr_blocks + gfs2_rg_blocks(dip, da->nr_blocks) +
+	       (nr_inodes * RES_DINODE) + RES_QUOTA + RES_STATFS;
+}
+
 static int link_dinode(struct gfs2_inode *dip, const struct qstr *name,
 		       struct gfs2_inode *ip, struct gfs2_diradd *da)
 {
@@ -485,10 +507,7 @@ static int link_dinode(struct gfs2_inode *dip, const struct qstr *name,
 		if (error)
 			goto fail_quota_locks;
 
-		error = gfs2_trans_begin(sdp, da->nr_blocks +
-					 gfs2_rg_blocks(dip, da->nr_blocks) +
-					 2 * RES_DINODE +
-					 RES_STATFS + RES_QUOTA, 0);
+		error = gfs2_trans_begin(sdp, gfs2_trans_da_blks(dip, da, 2), 0);
 		if (error)
 			goto fail_ipreserv;
 	} else {
@@ -886,10 +905,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 		if (error)
 			goto out_gunlock_q;
 
-		error = gfs2_trans_begin(sdp, da.nr_blocks +
-					 gfs2_rg_blocks(dip, da.nr_blocks) +
-					 2 * RES_DINODE + RES_STATFS +
-					 RES_QUOTA, 0);
+		error = gfs2_trans_begin(sdp, gfs2_trans_da_blks(dip, &da, 2), 0);
 		if (error)
 			goto out_ipres;
 	} else {
@@ -1403,10 +1419,8 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 		if (error)
 			goto out_gunlock_q;
 
-		error = gfs2_trans_begin(sdp, da.nr_blocks +
-					 gfs2_rg_blocks(ndip, da.nr_blocks) +
-					 4 * RES_DINODE + 4 * RES_LEAF +
-					 RES_STATFS + RES_QUOTA + 4, 0);
+		error = gfs2_trans_begin(sdp, gfs2_trans_da_blks(ndip, &da, 4) +
+					 4 * RES_LEAF + 4, 0);
 		if (error)
 			goto out_ipreserv;
 	} else {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 11/24] GFS2: Remember directory insert point
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (9 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 10/24] GFS2: Consolidate transaction blocks calculation for dir add Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 12/24] GFS2: Increase i_writecount during gfs2_setattr_chown Steven Whitehouse
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

When we look to see if there is enough space to add a dir
entry without allocation, we have then been repeating the
same search later when we do the actual insertion. This
patch caches the details of the location in the gfs2_diradd
structure, so that we do not have to repeat the search.

This will provide a performance improvement which will be
greater as the size of the directory increases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 0b6be20..d5988aa 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1659,26 +1659,34 @@ static int dir_new_leaf(struct inode *inode, const struct qstr *name)
 
 /**
  * gfs2_dir_add - Add new filename into directory
- * @dip: The GFS2 inode
- * @filename: The new name
- * @inode: The inode number of the entry
- * @type: The type of the entry
+ * @inode: The directory inode
+ * @name: The new name
+ * @nip: The GFS2 inode to be linked in to the directory
+ * @da: The directory addition info
+ *
+ * If the call to gfs2_diradd_alloc_required resulted in there being
+ * no need to allocate any new directory blocks, then it will contain
+ * a pointer to the directory entry and the bh in which it resides. We
+ * can use that without having to repeat the search. If there was no
+ * free space, then we must now create more space.
  *
  * Returns: 0 on success, error code on failure
  */
 
 int gfs2_dir_add(struct inode *inode, const struct qstr *name,
-		 const struct gfs2_inode *nip)
+		 const struct gfs2_inode *nip, struct gfs2_diradd *da)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
-	struct buffer_head *bh;
-	struct gfs2_dirent *dent;
+	struct buffer_head *bh = da->bh;
+	struct gfs2_dirent *dent = da->dent;
 	struct gfs2_leaf *leaf;
 	int error;
 
 	while(1) {
-		dent = gfs2_dirent_search(inode, name, gfs2_dirent_find_space,
-					  &bh);
+		if (da->bh == NULL) {
+			dent = gfs2_dirent_search(inode, name,
+						  gfs2_dirent_find_space, &bh);
+		}
 		if (dent) {
 			if (IS_ERR(dent))
 				return PTR_ERR(dent);
@@ -1689,6 +1697,8 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name,
 				leaf = (struct gfs2_leaf *)bh->b_data;
 				be16_add_cpu(&leaf->lf_entries, 1);
 			}
+			da->dent = NULL;
+			da->bh = NULL;
 			brelse(bh);
 			ip->i_entries++;
 			ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
@@ -2030,6 +2040,8 @@ int gfs2_diradd_alloc_required(struct inode *inode, const struct qstr *name,
 	struct buffer_head *bh;
 
 	da->nr_blocks = 0;
+	da->bh = NULL;
+	da->dent = NULL;
 
 	dent = gfs2_dirent_search(inode, name, gfs2_dirent_find_space, &bh);
 	if (!dent) {
@@ -2038,7 +2050,8 @@ int gfs2_diradd_alloc_required(struct inode *inode, const struct qstr *name,
 	}
 	if (IS_ERR(dent))
 		return PTR_ERR(dent);
-	brelse(bh);
+	da->bh = bh;
+	da->dent = dent;
 	return 0;
 }
 
diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h
index c5573e7..126c65d 100644
--- a/fs/gfs2/dir.h
+++ b/fs/gfs2/dir.h
@@ -16,9 +16,13 @@
 struct inode;
 struct gfs2_inode;
 struct gfs2_inum;
+struct buffer_head;
+struct gfs2_dirent;
 
 struct gfs2_diradd {
 	unsigned nr_blocks;
+	struct gfs2_dirent *dent;
+	struct buffer_head *bh;
 };
 
 extern struct inode *gfs2_dir_search(struct inode *dir,
@@ -27,7 +31,13 @@ extern struct inode *gfs2_dir_search(struct inode *dir,
 extern int gfs2_dir_check(struct inode *dir, const struct qstr *filename,
 			  const struct gfs2_inode *ip);
 extern int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
-			const struct gfs2_inode *ip);
+			const struct gfs2_inode *ip, struct gfs2_diradd *da);
+static inline void gfs2_dir_no_add(struct gfs2_diradd *da)
+{
+	if (da->bh)
+		brelse(da->bh);
+	da->bh = NULL;
+}
 extern int gfs2_dir_del(struct gfs2_inode *dip, const struct dentry *dentry);
 extern int gfs2_dir_read(struct inode *inode, struct dir_context *ctx,
 			 struct file_ra_state *f_ra);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index fa4624f..4acc584 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -516,7 +516,7 @@ static int link_dinode(struct gfs2_inode *dip, const struct qstr *name,
 			goto fail_quota_locks;
 	}
 
-	error = gfs2_dir_add(&dip->i_inode, name, ip);
+	error = gfs2_dir_add(&dip->i_inode, name, ip, da);
 	if (error)
 		goto fail_end_trans;
 
@@ -579,7 +579,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	struct dentry *d;
 	int error;
 	u32 aflags = 0;
-	struct gfs2_diradd da;
+	struct gfs2_diradd da = { .bh = NULL, };
 
 	if (!name->len || name->len > GFS2_FNAMESIZE)
 		return -ENAMETOOLONG;
@@ -738,6 +738,7 @@ fail_free_inode:
 	free_inode_nonrcu(inode);
 	inode = NULL;
 fail_gunlock:
+	gfs2_dir_no_add(&da);
 	gfs2_glock_dq_uninit(ghs);
 	if (inode && !IS_ERR(inode)) {
 		clear_nlink(inode);
@@ -836,7 +837,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder ghs[2];
 	struct buffer_head *dibh;
-	struct gfs2_diradd da;
+	struct gfs2_diradd da = { .bh = NULL, };
 	int error;
 
 	if (S_ISDIR(inode->i_mode))
@@ -918,7 +919,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 	if (error)
 		goto out_end_trans;
 
-	error = gfs2_dir_add(dir, &dentry->d_name, ip);
+	error = gfs2_dir_add(dir, &dentry->d_name, ip, &da);
 	if (error)
 		goto out_brelse;
 
@@ -940,6 +941,7 @@ out_gunlock_q:
 	if (da.nr_blocks)
 		gfs2_quota_unlock(dip);
 out_gunlock:
+	gfs2_dir_no_add(&da);
 	gfs2_glock_dq(ghs + 1);
 out_child:
 	gfs2_glock_dq(ghs);
@@ -1454,7 +1456,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	if (error)
 		goto out_end_trans;
 
-	error = gfs2_dir_add(ndir, &ndentry->d_name, ip);
+	error = gfs2_dir_add(ndir, &ndentry->d_name, ip, &da);
 	if (error)
 		goto out_end_trans;
 
@@ -1467,6 +1469,7 @@ out_gunlock_q:
 	if (da.nr_blocks)
 		gfs2_quota_unlock(ndip);
 out_gunlock:
+	gfs2_dir_no_add(&da);
 	while (x--) {
 		gfs2_glock_dq(ghs + x);
 		gfs2_holder_uninit(ghs + x);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 12/24] GFS2: Increase i_writecount during gfs2_setattr_chown
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (10 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 11/24] GFS2: Remember directory insert point Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 13/24] GFS2: For exhash conversion, only one block is needed Steven Whitehouse
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Bob Peterson, Steven Whitehouse

From: Bob Peterson <rpeterso@redhat.com>

This patch calls get_write_access in function gfs2_setattr_chown,
which merely increases inode->i_writecount for the duration of the
function. That will ensure that any file closes won't delete the
inode's multi-block reservation while the function is running.
It also ensures that a multi-block reservation exists when needed
for quota change operations during the chown.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 4acc584..51cf10d 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1623,10 +1623,22 @@ static int setattr_chown(struct inode *inode, struct iattr *attr)
 	if (!(attr->ia_valid & ATTR_GID) || gid_eq(ogid, ngid))
 		ogid = ngid = NO_GID_QUOTA_CHANGE;
 
-	error = gfs2_quota_lock(ip, nuid, ngid);
+	error = get_write_access(inode);
 	if (error)
 		return error;
 
+	error = gfs2_rs_alloc(ip);
+	if (error)
+		goto out;
+
+	error = gfs2_rindex_update(sdp);
+	if (error)
+		goto out;
+
+	error = gfs2_quota_lock(ip, nuid, ngid);
+	if (error)
+		goto out;
+
 	if (!uid_eq(ouid, NO_UID_QUOTA_CHANGE) ||
 	    !gid_eq(ogid, NO_GID_QUOTA_CHANGE)) {
 		error = gfs2_quota_check(ip, nuid, ngid);
@@ -1653,6 +1665,8 @@ out_end_trans:
 	gfs2_trans_end(sdp);
 out_gunlock_q:
 	gfs2_quota_unlock(ip);
+out:
+	put_write_access(inode);
 	return error;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 13/24] GFS2: For exhash conversion, only one block is needed
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (11 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 12/24] GFS2: Increase i_writecount during gfs2_setattr_chown Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 14/24] GFS2: Add hints to directory leaf blocks Steven Whitehouse
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

For most cases, only a single new block is needed when we reach
the point of converting from stuffed to exhash directory. The
exception being when the file name is so long that it will not
fit within the new leaf block.

So this patch adds a simple test for that situation so that we
do not need to request the full reservation size in this case.

Potentially we could calculate more accurately the value to use
in other cases too, but that is much more complicated to do and
it is doubtful that the benefit would outweigh the extra cost
in code complexity.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index d5988aa..b2e5ebf 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -2035,7 +2035,9 @@ out:
 int gfs2_diradd_alloc_required(struct inode *inode, const struct qstr *name,
 			       struct gfs2_diradd *da)
 {
+	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
+	const unsigned int extra = sizeof(struct gfs2_dinode) - sizeof(struct gfs2_leaf);
 	struct gfs2_dirent *dent;
 	struct buffer_head *bh;
 
@@ -2046,6 +2048,9 @@ int gfs2_diradd_alloc_required(struct inode *inode, const struct qstr *name,
 	dent = gfs2_dirent_search(inode, name, gfs2_dirent_find_space, &bh);
 	if (!dent) {
 		da->nr_blocks = sdp->sd_max_dirres;
+		if (!(ip->i_diskflags & GFS2_DIF_EXHASH) &&
+		    (GFS2_DIRENT_SIZE(name->len) < extra))
+			da->nr_blocks = 1;
 		return 0;
 	}
 	if (IS_ERR(dent))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 14/24] GFS2: Add hints to directory leaf blocks
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (12 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 13/24] GFS2: For exhash conversion, only one block is needed Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 15/24] GFS2: Add initialization for address space in super block Steven Whitehouse
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

This patch adds four new fields to directory leaf blocks.
The intent is not to use them in the kernel itself, although
perhaps we may be able to use them as hints at some later date,
but instead to provide more information for debug/fsck use.

One new field adds a pointer to the inode to which the leaf
belongs. This can be useful if the pointer to the leaf block
has become corrupt, as it will allow us to know which inode
this block should be associated with. This field is set when
the leaf is created and never changed over its lifetime.

The second field is a "distance from the hash table" field.
The meaning is as follows:
 0  = An old leaf in which this value has not been set
 1  = This leaf is pointed to directly from the hash table
 2+ = This leaf is part of a chain, pointed to by another leaf
      block, the value gives the position in the chain.

The third and fourth fields combine to give a time stamp of
the most recent directory insertion or deletion from this
leaf block. The time stamp is not updated when a new leaf
block is chained from the current one. The code is currently
written such that the timestamp on the dir inode will match
that of the leaf block for the most recent insertion/deletion.

For backwards compatibility, any of these new fields which is
zero should be considered to be "unknown".

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index b2e5ebf..fa32655 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -834,6 +834,7 @@ static struct gfs2_leaf *new_leaf(struct inode *inode, struct buffer_head **pbh,
 	struct gfs2_leaf *leaf;
 	struct gfs2_dirent *dent;
 	struct qstr name = { .name = "" };
+	struct timespec tv = CURRENT_TIME;
 
 	error = gfs2_alloc_blocks(ip, &bn, &n, 0, NULL);
 	if (error)
@@ -850,7 +851,11 @@ static struct gfs2_leaf *new_leaf(struct inode *inode, struct buffer_head **pbh,
 	leaf->lf_entries = 0;
 	leaf->lf_dirent_format = cpu_to_be32(GFS2_FORMAT_DE);
 	leaf->lf_next = 0;
-	memset(leaf->lf_reserved, 0, sizeof(leaf->lf_reserved));
+	leaf->lf_inode = cpu_to_be64(ip->i_no_addr);
+	leaf->lf_dist = cpu_to_be32(1);
+	leaf->lf_nsec = cpu_to_be32(tv.tv_nsec);
+	leaf->lf_sec = cpu_to_be64(tv.tv_sec);
+	memset(leaf->lf_reserved2, 0, sizeof(leaf->lf_reserved2));
 	dent = (struct gfs2_dirent *)(leaf+1);
 	gfs2_qstr2dirent(&name, bh->b_size - sizeof(struct gfs2_leaf), dent);
 	*pbh = bh;
@@ -1612,11 +1617,31 @@ out:
 	return ret;
 }
 
+/**
+ * dir_new_leaf - Add a new leaf onto hash chain
+ * @inode: The directory
+ * @name: The name we are adding
+ *
+ * This adds a new dir leaf onto an existing leaf when there is not
+ * enough space to add a new dir entry. This is a last resort after
+ * we've expanded the hash table to max size and also split existing
+ * leaf blocks, so it will only occur for very large directories.
+ *
+ * The dist parameter is set to 1 for leaf blocks directly attached
+ * to the hash table, 2 for one layer of indirection, 3 for two layers
+ * etc. We are thus able to tell the difference between an old leaf
+ * with dist set to zero (i.e. "don't know") and a new one where we
+ * set this information for debug/fsck purposes.
+ *
+ * Returns: 0 on success, or -ve on error
+ */
+
 static int dir_new_leaf(struct inode *inode, const struct qstr *name)
 {
 	struct buffer_head *bh, *obh;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_leaf *leaf, *oleaf;
+	u32 dist = 1;
 	int error;
 	u32 index;
 	u64 bn;
@@ -1626,6 +1651,7 @@ static int dir_new_leaf(struct inode *inode, const struct qstr *name)
 	if (error)
 		return error;
 	do {
+		dist++;
 		oleaf = (struct gfs2_leaf *)obh->b_data;
 		bn = be64_to_cpu(oleaf->lf_next);
 		if (!bn)
@@ -1643,6 +1669,7 @@ static int dir_new_leaf(struct inode *inode, const struct qstr *name)
 		brelse(obh);
 		return -ENOSPC;
 	}
+	leaf->lf_dist = cpu_to_be32(dist);
 	oleaf->lf_next = cpu_to_be64(bh->b_blocknr);
 	brelse(bh);
 	brelse(obh);
@@ -1679,6 +1706,7 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct buffer_head *bh = da->bh;
 	struct gfs2_dirent *dent = da->dent;
+	struct timespec tv;
 	struct gfs2_leaf *leaf;
 	int error;
 
@@ -1693,15 +1721,18 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name,
 			dent = gfs2_init_dirent(inode, dent, name, bh);
 			gfs2_inum_out(nip, dent);
 			dent->de_type = cpu_to_be16(IF2DT(nip->i_inode.i_mode));
+			tv = CURRENT_TIME;
 			if (ip->i_diskflags & GFS2_DIF_EXHASH) {
 				leaf = (struct gfs2_leaf *)bh->b_data;
 				be16_add_cpu(&leaf->lf_entries, 1);
+				leaf->lf_nsec = cpu_to_be32(tv.tv_nsec);
+				leaf->lf_sec = cpu_to_be64(tv.tv_sec);
 			}
 			da->dent = NULL;
 			da->bh = NULL;
 			brelse(bh);
 			ip->i_entries++;
-			ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
+			ip->i_inode.i_mtime = ip->i_inode.i_ctime = tv;
 			if (S_ISDIR(nip->i_inode.i_mode))
 				inc_nlink(&ip->i_inode);
 			mark_inode_dirty(inode);
@@ -1752,6 +1783,7 @@ int gfs2_dir_del(struct gfs2_inode *dip, const struct dentry *dentry)
 	const struct qstr *name = &dentry->d_name;
 	struct gfs2_dirent *dent, *prev = NULL;
 	struct buffer_head *bh;
+	struct timespec tv = CURRENT_TIME;
 
 	/* Returns _either_ the entry (if its first in block) or the
 	   previous entry otherwise */
@@ -1777,13 +1809,15 @@ int gfs2_dir_del(struct gfs2_inode *dip, const struct dentry *dentry)
 		if (!entries)
 			gfs2_consist_inode(dip);
 		leaf->lf_entries = cpu_to_be16(--entries);
+		leaf->lf_nsec = cpu_to_be32(tv.tv_nsec);
+		leaf->lf_sec = cpu_to_be64(tv.tv_sec);
 	}
 	brelse(bh);
 
 	if (!dip->i_entries)
 		gfs2_consist_inode(dip);
 	dip->i_entries--;
-	dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME;
+	dip->i_inode.i_mtime = dip->i_inode.i_ctime = tv;
 	if (S_ISDIR(dentry->d_inode->i_mode))
 		drop_nlink(&dip->i_inode);
 	mark_inode_dirty(&dip->i_inode);
diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
index b2de1f9..0f24c07 100644
--- a/include/uapi/linux/gfs2_ondisk.h
+++ b/include/uapi/linux/gfs2_ondisk.h
@@ -319,7 +319,16 @@ struct gfs2_leaf {
 	__be32 lf_dirent_format;	/* Format of the dirents */
 	__be64 lf_next;			/* Next leaf, if overflow */
 
-	__u8 lf_reserved[64];
+	union {
+		__u8 lf_reserved[64];
+		struct {
+			__be64 lf_inode;	/* Dir inode number */
+			__be32 lf_dist;		/* Dist from inode on chain */
+			__be32 lf_nsec;		/* Last ins/del usecs */
+			__be64 lf_sec;		/* Last ins/del in secs */
+			__u8 lf_reserved2[40];
+		};
+	};
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 15/24] GFS2: Add initialization for address space in super block
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (13 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 14/24] GFS2: Add hints to directory leaf blocks Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 16/24] GFS2: No need to invalidate pages for a dio read Steven Whitehouse
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse, Andrew Price

Spotted by Andy Price. This should fix the odd messages from
lockdep caused by 70d4ee94b370c5ef54d0870600f16bd92d18013c

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Price <anprice@redhat.com>

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 94aab31..3b820b6 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -102,6 +102,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
 	mapping = &sdp->sd_aspace;
 
+	address_space_init_once(mapping);
 	mapping->a_ops = &gfs2_meta_aops;
 	mapping->host = sb->s_bdev->bd_inode;
 	mapping->flags = 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 16/24] GFS2: No need to invalidate pages for a dio read
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (14 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 15/24] GFS2: Add initialization for address space in super block Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas Steven Whitehouse
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

We recently fixed the writeback of pages prior to performing
direct i/o, however the initial fix was perhaps a bit heavy
handed. There is no need to invalidate pages if the direct i/o
is only a read, since they will be identical to what has been
flushed to disk anyway.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index cf858da..49436fa 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1032,8 +1032,9 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
 			unmap_shared_mapping_range(ip->i_inode.i_mapping, offset, len);
 		rv = filemap_write_and_wait_range(mapping, lstart, end);
 		if (rv)
-			return rv;
-		truncate_inode_pages_range(mapping, lstart, end);
+			goto out;
+		if (rw == WRITE)
+			truncate_inode_pages_range(mapping, lstart, end);
 	}
 
 	rv = __blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (15 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 16/24] GFS2: No need to invalidate pages for a dio read Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-22  5:32   ` Paul E. McKenney
  2014-01-20 12:23 ` [PATCH 18/24] GFS2: Only run logd and quota when mounted read/write Steven Whitehouse
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel
  Cc: Steven Whitehouse, Abhijith Das, Paul E. McKenney

Prior to this patch, GFS2 kept all the quotas for each
super block in a single linked list. This is rather slow
when there are large numbers of quotas.

This patch introduces a hlist_bl based hash table, similar
to the one used for glocks. The initial look up of the quota
is now lockless in the case where it is already cached,
although we still have to take the per quota spinlock in
order to bump the ref count. Either way though, this is a
big improvement on what was there before.

The qd_lock and the per super block list is preserved, for
the time being. However it is intended that since this is no
longer used for its original role, it should be possible to
shrink the number of items on that list in due course and
remove the requirement to take qd_lock in qd_get.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index a99f60c..59d99ec 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -428,10 +428,13 @@ enum {
 };
 
 struct gfs2_quota_data {
+	struct hlist_bl_node qd_hlist;
 	struct list_head qd_list;
 	struct kqid qd_id;
+	struct gfs2_sbd *qd_sbd;
 	struct lockref qd_lockref;
 	struct list_head qd_lru;
+	unsigned qd_hash;
 
 	unsigned long qd_flags;		/* QDF_... */
 
@@ -450,6 +453,7 @@ struct gfs2_quota_data {
 
 	u64 qd_sync_gen;
 	unsigned long qd_last_warn;
+	struct rcu_head qd_rcu;
 };
 
 struct gfs2_trans {
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 0650db2..c272e73 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -76,6 +76,7 @@ static int __init init_gfs2_fs(void)
 
 	gfs2_str2qstr(&gfs2_qdot, ".");
 	gfs2_str2qstr(&gfs2_qdotdot, "..");
+	gfs2_quota_hash_init();
 
 	error = gfs2_sys_init();
 	if (error)
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 1b6b367..a1df01d 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -52,6 +52,10 @@
 #include <linux/dqblk_xfs.h>
 #include <linux/lockref.h>
 #include <linux/list_lru.h>
+#include <linux/rcupdate.h>
+#include <linux/rculist_bl.h>
+#include <linux/bit_spinlock.h>
+#include <linux/jhash.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -67,10 +71,43 @@
 #include "inode.h"
 #include "util.h"
 
-/* Lock order: qd_lock -> qd->lockref.lock -> lru lock */
+#define GFS2_QD_HASH_SHIFT      12
+#define GFS2_QD_HASH_SIZE       (1 << GFS2_QD_HASH_SHIFT)
+#define GFS2_QD_HASH_MASK       (GFS2_QD_HASH_SIZE - 1)
+
+/* Lock order: qd_lock -> bucket lock -> qd->lockref.lock -> lru lock */
 static DEFINE_SPINLOCK(qd_lock);
 struct list_lru gfs2_qd_lru;
 
+static struct hlist_bl_head qd_hash_table[GFS2_QD_HASH_SIZE];
+
+static unsigned int gfs2_qd_hash(const struct gfs2_sbd *sdp,
+				 const struct kqid qid)
+{
+	unsigned int h;
+
+	h = jhash(&sdp, sizeof(struct gfs2_sbd *), 0);
+	h = jhash(&qid, sizeof(struct kqid), h);
+
+	return h & GFS2_QD_HASH_MASK;
+}
+
+static inline void spin_lock_bucket(unsigned int hash)
+{
+        hlist_bl_lock(&qd_hash_table[hash]);
+}
+
+static inline void spin_unlock_bucket(unsigned int hash)
+{
+        hlist_bl_unlock(&qd_hash_table[hash]);
+}
+
+static void gfs2_qd_dealloc(struct rcu_head *rcu)
+{
+	struct gfs2_quota_data *qd = container_of(rcu, struct gfs2_quota_data, qd_rcu);
+	kmem_cache_free(gfs2_quotad_cachep, qd);
+}
+
 static void gfs2_qd_dispose(struct list_head *list)
 {
 	struct gfs2_quota_data *qd;
@@ -87,6 +124,10 @@ static void gfs2_qd_dispose(struct list_head *list)
 		list_del(&qd->qd_list);
 		spin_unlock(&qd_lock);
 
+		spin_lock_bucket(qd->qd_hash);
+		hlist_bl_del_rcu(&qd->qd_hlist);
+		spin_unlock_bucket(qd->qd_hash);
+
 		gfs2_assert_warn(sdp, !qd->qd_change);
 		gfs2_assert_warn(sdp, !qd->qd_slot_count);
 		gfs2_assert_warn(sdp, !qd->qd_bh_count);
@@ -95,7 +136,7 @@ static void gfs2_qd_dispose(struct list_head *list)
 		atomic_dec(&sdp->sd_quota_count);
 
 		/* Delete it from the common reclaim list */
-		kmem_cache_free(gfs2_quotad_cachep, qd);
+		call_rcu(&qd->qd_rcu, gfs2_qd_dealloc);
 	}
 }
 
@@ -165,83 +206,95 @@ static u64 qd2offset(struct gfs2_quota_data *qd)
 	return offset;
 }
 
-static int qd_alloc(struct gfs2_sbd *sdp, struct kqid qid,
-		    struct gfs2_quota_data **qdp)
+static struct gfs2_quota_data *qd_alloc(unsigned hash, struct gfs2_sbd *sdp, struct kqid qid)
 {
 	struct gfs2_quota_data *qd;
 	int error;
 
 	qd = kmem_cache_zalloc(gfs2_quotad_cachep, GFP_NOFS);
 	if (!qd)
-		return -ENOMEM;
+		return NULL;
 
+	qd->qd_sbd = sdp;
 	qd->qd_lockref.count = 1;
 	spin_lock_init(&qd->qd_lockref.lock);
 	qd->qd_id = qid;
 	qd->qd_slot = -1;
 	INIT_LIST_HEAD(&qd->qd_lru);
+	qd->qd_hash = hash;
 
 	error = gfs2_glock_get(sdp, qd2index(qd),
 			      &gfs2_quota_glops, CREATE, &qd->qd_gl);
 	if (error)
 		goto fail;
 
-	*qdp = qd;
-
-	return 0;
+	return qd;
 
 fail:
 	kmem_cache_free(gfs2_quotad_cachep, qd);
-	return error;
+	return NULL;
 }
 
-static int qd_get(struct gfs2_sbd *sdp, struct kqid qid,
-		  struct gfs2_quota_data **qdp)
+static struct gfs2_quota_data *gfs2_qd_search_bucket(unsigned int hash,
+						     const struct gfs2_sbd *sdp,
+						     struct kqid qid)
 {
-	struct gfs2_quota_data *qd = NULL, *new_qd = NULL;
-	int error, found;
-
-	*qdp = NULL;
+	struct gfs2_quota_data *qd;
+	struct hlist_bl_node *h;
 
-	for (;;) {
-		found = 0;
-		spin_lock(&qd_lock);
-		list_for_each_entry(qd, &sdp->sd_quota_list, qd_list) {
-			if (qid_eq(qd->qd_id, qid) &&
-			    lockref_get_not_dead(&qd->qd_lockref)) {
-				list_lru_del(&gfs2_qd_lru, &qd->qd_lru);
-				found = 1;
-				break;
-			}
+	hlist_bl_for_each_entry_rcu(qd, h, &qd_hash_table[hash], qd_hlist) {
+		if (!qid_eq(qd->qd_id, qid))
+			continue;
+		if (qd->qd_sbd != sdp)
+			continue;
+		if (lockref_get_not_dead(&qd->qd_lockref)) {
+			list_lru_del(&gfs2_qd_lru, &qd->qd_lru);
+			return qd;
 		}
+	}
 
-		if (!found)
-			qd = NULL;
+	return NULL;
+}
 
-		if (!qd && new_qd) {
-			qd = new_qd;
-			list_add(&qd->qd_list, &sdp->sd_quota_list);
-			atomic_inc(&sdp->sd_quota_count);
-			new_qd = NULL;
-		}
 
-		spin_unlock(&qd_lock);
+static int qd_get(struct gfs2_sbd *sdp, struct kqid qid,
+		  struct gfs2_quota_data **qdp)
+{
+	struct gfs2_quota_data *qd, *new_qd;
+	unsigned int hash = gfs2_qd_hash(sdp, qid);
 
-		if (qd) {
-			if (new_qd) {
-				gfs2_glock_put(new_qd->qd_gl);
-				kmem_cache_free(gfs2_quotad_cachep, new_qd);
-			}
-			*qdp = qd;
-			return 0;
-		}
+	rcu_read_lock();
+	*qdp = qd = gfs2_qd_search_bucket(hash, sdp, qid);
+	rcu_read_unlock();
 
-		error = qd_alloc(sdp, qid, &new_qd);
-		if (error)
-			return error;
+	if (qd)
+		return 0;
+
+	new_qd = qd_alloc(hash, sdp, qid);
+	if (!new_qd)
+		return -ENOMEM;
+
+	spin_lock(&qd_lock);
+	spin_lock_bucket(hash);
+	*qdp = qd = gfs2_qd_search_bucket(hash, sdp, qid);
+	if (qd == NULL) {
+		*qdp = new_qd;
+		list_add(&new_qd->qd_list, &sdp->sd_quota_list);
+		hlist_bl_add_head_rcu(&new_qd->qd_hlist, &qd_hash_table[hash]);
+		atomic_inc(&sdp->sd_quota_count);
+	}
+	spin_unlock_bucket(hash);
+	spin_unlock(&qd_lock);
+
+	if (qd) {
+		gfs2_glock_put(new_qd->qd_gl);
+		kmem_cache_free(gfs2_quotad_cachep, new_qd);
 	}
+
+	return 0;
 }
 
+
 static void qd_hold(struct gfs2_quota_data *qd)
 {
 	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
@@ -1215,6 +1268,7 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 	unsigned int blocks = size >> sdp->sd_sb.sb_bsize_shift;
 	unsigned int x, slot = 0;
 	unsigned int found = 0;
+	unsigned int hash;
 	u64 dblock;
 	u32 extlen = 0;
 	int error;
@@ -1272,8 +1326,9 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 			if (!qc_change)
 				continue;
 
-			error = qd_alloc(sdp, qc_id, &qd);
-			if (error) {
+			hash = gfs2_qd_hash(sdp, qc_id);
+			qd = qd_alloc(hash, sdp, qc_id);
+			if (qd == NULL) {
 				brelse(bh);
 				goto fail;
 			}
@@ -1289,6 +1344,10 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 			atomic_inc(&sdp->sd_quota_count);
 			spin_unlock(&qd_lock);
 
+			spin_lock_bucket(hash);
+			hlist_bl_add_head_rcu(&qd->qd_hlist, &qd_hash_table[hash]);
+			spin_unlock_bucket(hash);
+
 			found++;
 		}
 
@@ -1335,11 +1394,16 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 		spin_unlock(&qd->qd_lockref.lock);
 
 		list_del(&qd->qd_list);
+
 		/* Also remove if this qd exists in the reclaim list */
 		list_lru_del(&gfs2_qd_lru, &qd->qd_lru);
 		atomic_dec(&sdp->sd_quota_count);
 		spin_unlock(&qd_lock);
 
+		spin_lock_bucket(qd->qd_hash);
+		hlist_bl_del_rcu(&qd->qd_hlist);
+		spin_unlock_bucket(qd->qd_hash);
+
 		if (!qd->qd_lockref.count) {
 			gfs2_assert_warn(sdp, !qd->qd_change);
 			gfs2_assert_warn(sdp, !qd->qd_slot_count);
@@ -1348,7 +1412,7 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 		gfs2_assert_warn(sdp, !qd->qd_bh_count);
 
 		gfs2_glock_put(qd->qd_gl);
-		kmem_cache_free(gfs2_quotad_cachep, qd);
+		call_rcu(&qd->qd_rcu, gfs2_qd_dealloc);
 
 		spin_lock(&qd_lock);
 	}
@@ -1643,3 +1707,11 @@ const struct quotactl_ops gfs2_quotactl_ops = {
 	.get_dqblk	= gfs2_get_dqblk,
 	.set_dqblk	= gfs2_set_dqblk,
 };
+
+void __init gfs2_quota_hash_init(void)
+{
+	unsigned i;
+
+	for(i = 0; i < GFS2_QD_HASH_SIZE; i++)
+		INIT_HLIST_BL_HEAD(&qd_hash_table[i]);
+}
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 96e4f34..55d506e 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -57,5 +57,6 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip)
 extern const struct quotactl_ops gfs2_quotactl_ops;
 extern struct shrinker gfs2_qd_shrinker;
 extern struct list_lru gfs2_qd_lru;
+extern void __init gfs2_quota_hash_init(void);
 
 #endif /* __QUOTA_DOT_H__ */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 18/24] GFS2: Only run logd and quota when mounted read/write
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (16 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 19/24] GFS2: Clean up quota slot allocation Steven Whitehouse
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse, Abhijith Das

While investigating a rather strange bit of code in the quota
clean up function, I spotted that the reason for its existence
was that when remounting read only, we were not stopping the
quotad thread, and thus it was possible for it to still have
a reference to some of the quotas in that case.

This patch moves the logd and quota thread start and stop into
the make_fs_rw/ro functions, so that we now stop those threads
when mounted read only.

This means that quotad will always be stopped before we call
the quota clean up function, and we can thus dispose of the
(rather hackish) code that waits for it to give up its
reference on the quotas.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 3b820b6..27648e8 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -969,40 +969,6 @@ fail:
 	return error;
 }
 
-static int init_threads(struct gfs2_sbd *sdp, int undo)
-{
-	struct task_struct *p;
-	int error = 0;
-
-	if (undo)
-		goto fail_quotad;
-
-	p = kthread_run(gfs2_logd, sdp, "gfs2_logd");
-	if (IS_ERR(p)) {
-		error = PTR_ERR(p);
-		fs_err(sdp, "can't start logd thread: %d\n", error);
-		return error;
-	}
-	sdp->sd_logd_process = p;
-
-	p = kthread_run(gfs2_quotad, sdp, "gfs2_quotad");
-	if (IS_ERR(p)) {
-		error = PTR_ERR(p);
-		fs_err(sdp, "can't start quotad thread: %d\n", error);
-		goto fail;
-	}
-	sdp->sd_quotad_process = p;
-
-	return 0;
-
-
-fail_quotad:
-	kthread_stop(sdp->sd_quotad_process);
-fail:
-	kthread_stop(sdp->sd_logd_process);
-	return error;
-}
-
 static const match_table_t nolock_tokens = {
 	{ Opt_jid, "jid=%d\n", },
 	{ Opt_err, NULL },
@@ -1267,15 +1233,11 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 		goto fail_per_node;
 	}
 
-	error = init_threads(sdp, DO);
-	if (error)
-		goto fail_per_node;
-
 	if (!(sb->s_flags & MS_RDONLY)) {
 		error = gfs2_make_fs_rw(sdp);
 		if (error) {
 			fs_err(sdp, "can't make FS RW: %d\n", error);
-			goto fail_threads;
+			goto fail_per_node;
 		}
 	}
 
@@ -1283,8 +1245,6 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
 	gfs2_online_uevent(sdp);
 	return 0;
 
-fail_threads:
-	init_threads(sdp, UNDO);
 fail_per_node:
 	init_per_node(sdp, UNDO);
 fail_inodes:
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index a1df01d..3287d98 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1376,23 +1376,6 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 	while (!list_empty(head)) {
 		qd = list_entry(head->prev, struct gfs2_quota_data, qd_list);
 
-		/*
-		 * To be removed in due course... we should be able to
-		 * ensure that all refs to the qd have done by this point
-		 * so that this rather odd test is not required
-		 */
-		spin_lock(&qd->qd_lockref.lock);
-		if (qd->qd_lockref.count > 1 ||
-		    (qd->qd_lockref.count && !test_bit(QDF_CHANGE, &qd->qd_flags))) {
-			spin_unlock(&qd->qd_lockref.lock);
-			list_move(&qd->qd_list, head);
-			spin_unlock(&qd_lock);
-			schedule();
-			spin_lock(&qd_lock);
-			continue;
-		}
-		spin_unlock(&qd->qd_lockref.lock);
-
 		list_del(&qd->qd_list);
 
 		/* Also remove if this qd exists in the reclaim list */
@@ -1404,11 +1387,8 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 		hlist_bl_del_rcu(&qd->qd_hlist);
 		spin_unlock_bucket(qd->qd_hash);
 
-		if (!qd->qd_lockref.count) {
-			gfs2_assert_warn(sdp, !qd->qd_change);
-			gfs2_assert_warn(sdp, !qd->qd_slot_count);
-		} else
-			gfs2_assert_warn(sdp, qd->qd_slot_count == 1);
+		gfs2_assert_warn(sdp, !qd->qd_change);
+		gfs2_assert_warn(sdp, !qd->qd_slot_count);
 		gfs2_assert_warn(sdp, !qd->qd_bh_count);
 
 		gfs2_glock_put(qd->qd_gl);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 35da5b1..60f60f6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -369,6 +369,33 @@ int gfs2_jdesc_check(struct gfs2_jdesc *jd)
 	return 0;
 }
 
+static int init_threads(struct gfs2_sbd *sdp)
+{
+	struct task_struct *p;
+	int error = 0;
+
+	p = kthread_run(gfs2_logd, sdp, "gfs2_logd");
+	if (IS_ERR(p)) {
+		error = PTR_ERR(p);
+		fs_err(sdp, "can't start logd thread: %d\n", error);
+		return error;
+	}
+	sdp->sd_logd_process = p;
+
+	p = kthread_run(gfs2_quotad, sdp, "gfs2_quotad");
+	if (IS_ERR(p)) {
+		error = PTR_ERR(p);
+		fs_err(sdp, "can't start quotad thread: %d\n", error);
+		goto fail;
+	}
+	sdp->sd_quotad_process = p;
+	return 0;
+
+fail:
+	kthread_stop(sdp->sd_logd_process);
+	return error;
+}
+
 /**
  * gfs2_make_fs_rw - Turn a Read-Only FS into a Read-Write one
  * @sdp: the filesystem
@@ -384,10 +411,14 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
 	struct gfs2_log_header_host head;
 	int error;
 
-	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, 0, &t_gh);
+	error = init_threads(sdp);
 	if (error)
 		return error;
 
+	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, 0, &t_gh);
+	if (error)
+		goto fail_threads;
+
 	j_gl->gl_ops->go_inval(j_gl, DIO_METADATA);
 
 	error = gfs2_find_jhead(sdp->sd_jdesc, &head);
@@ -417,7 +448,9 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
 fail:
 	t_gh.gh_flags |= GL_NOCACHE;
 	gfs2_glock_dq_uninit(&t_gh);
-
+fail_threads:
+	kthread_stop(sdp->sd_quotad_process);
+	kthread_stop(sdp->sd_logd_process);
 	return error;
 }
 
@@ -800,6 +833,9 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	struct gfs2_holder t_gh;
 	int error;
 
+	kthread_stop(sdp->sd_quotad_process);
+	kthread_stop(sdp->sd_logd_process);
+
 	flush_workqueue(gfs2_delete_workqueue);
 	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
@@ -857,9 +893,6 @@ restart:
 	}
 	spin_unlock(&sdp->sd_jindex_spin);
 
-	kthread_stop(sdp->sd_quotad_process);
-	kthread_stop(sdp->sd_logd_process);
-
 	if (!(sb->s_flags & MS_RDONLY)) {
 		error = gfs2_make_fs_ro(sdp);
 		if (error)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 19/24] GFS2: Clean up quota slot allocation
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (17 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 18/24] GFS2: Only run logd and quota when mounted read/write Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 20/24] GFS2: Move quota bitmap operations under their own lock Steven Whitehouse
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse, Abhijith Das

Quota slot allocation has historically used a vector of pages
and a set of homegrown find/test/set/clear bit functions. Since
the size of the bitmap is likely to be based on the default
qc file size, thats a couple of pages at most. So we ought
to be able to allocate that as a single chunk, with a vmalloc
fallback, just in case of memory fragmentation.

We are then able to use the kernel's own find/test/set/clear
bit functions, rather than rolling our own.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 59d99ec..4b9aa5b 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -733,8 +733,7 @@ struct gfs2_sbd {
 	spinlock_t sd_trunc_lock;
 
 	unsigned int sd_quota_slots;
-	unsigned int sd_quota_chunks;
-	unsigned char **sd_quota_bitmap;
+	unsigned long *sd_quota_bitmap;
 
 	u64 sd_quota_sync_gen;
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 3287d98..79be67a 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -315,50 +315,30 @@ static void qd_put(struct gfs2_quota_data *qd)
 
 static int slot_get(struct gfs2_quota_data *qd)
 {
-	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
-	unsigned int c, o = 0, b;
-	unsigned char byte = 0;
+	struct gfs2_sbd *sdp = qd->qd_sbd;
+	unsigned int bit;
+	int error = 0;
 
 	spin_lock(&qd_lock);
+	if (qd->qd_slot_count != 0)
+		goto out;
 
-	if (qd->qd_slot_count++) {
-		spin_unlock(&qd_lock);
-		return 0;
+	error = -ENOSPC;
+	bit = find_first_zero_bit(sdp->sd_quota_bitmap, sdp->sd_quota_slots);
+	if (bit < sdp->sd_quota_slots) {
+		set_bit(bit, sdp->sd_quota_bitmap);
+		qd->qd_slot = bit;
+out:
+		qd->qd_slot_count++;
 	}
-
-	for (c = 0; c < sdp->sd_quota_chunks; c++)
-		for (o = 0; o < PAGE_SIZE; o++) {
-			byte = sdp->sd_quota_bitmap[c][o];
-			if (byte != 0xFF)
-				goto found;
-		}
-
-	goto fail;
-
-found:
-	for (b = 0; b < 8; b++)
-		if (!(byte & (1 << b)))
-			break;
-	qd->qd_slot = c * (8 * PAGE_SIZE) + o * 8 + b;
-
-	if (qd->qd_slot >= sdp->sd_quota_slots)
-		goto fail;
-
-	sdp->sd_quota_bitmap[c][o] |= 1 << b;
-
 	spin_unlock(&qd_lock);
 
-	return 0;
-
-fail:
-	qd->qd_slot_count--;
-	spin_unlock(&qd_lock);
-	return -ENOSPC;
+	return error;
 }
 
 static void slot_hold(struct gfs2_quota_data *qd)
 {
-	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
+	struct gfs2_sbd *sdp = qd->qd_sbd;
 
 	spin_lock(&qd_lock);
 	gfs2_assert(sdp, qd->qd_slot_count);
@@ -366,34 +346,14 @@ static void slot_hold(struct gfs2_quota_data *qd)
 	spin_unlock(&qd_lock);
 }
 
-static void gfs2_icbit_munge(struct gfs2_sbd *sdp, unsigned char **bitmap,
-			     unsigned int bit, int new_value)
-{
-	unsigned int c, o, b = bit;
-	int old_value;
-
-	c = b / (8 * PAGE_SIZE);
-	b %= 8 * PAGE_SIZE;
-	o = b / 8;
-	b %= 8;
-
-	old_value = (bitmap[c][o] & (1 << b));
-	gfs2_assert_withdraw(sdp, !old_value != !new_value);
-
-	if (new_value)
-		bitmap[c][o] |= 1 << b;
-	else
-		bitmap[c][o] &= ~(1 << b);
-}
-
 static void slot_put(struct gfs2_quota_data *qd)
 {
-	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
+	struct gfs2_sbd *sdp = qd->qd_sbd;
 
 	spin_lock(&qd_lock);
 	gfs2_assert(sdp, qd->qd_slot_count);
 	if (!--qd->qd_slot_count) {
-		gfs2_icbit_munge(sdp, sdp->sd_quota_bitmap, qd->qd_slot, 0);
+		BUG_ON(!test_and_clear_bit(qd->qd_slot, sdp->sd_quota_bitmap));
 		qd->qd_slot = -1;
 	}
 	spin_unlock(&qd_lock);
@@ -1269,6 +1229,7 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 	unsigned int x, slot = 0;
 	unsigned int found = 0;
 	unsigned int hash;
+	unsigned int bm_size;
 	u64 dblock;
 	u32 extlen = 0;
 	int error;
@@ -1277,20 +1238,16 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 		return -EIO;
 
 	sdp->sd_quota_slots = blocks * sdp->sd_qc_per_block;
-	sdp->sd_quota_chunks = DIV_ROUND_UP(sdp->sd_quota_slots, 8 * PAGE_SIZE);
-
+	bm_size = DIV_ROUND_UP(sdp->sd_quota_slots, 8 * sizeof(unsigned long));
+	bm_size *= sizeof(unsigned long);
 	error = -ENOMEM;
-
-	sdp->sd_quota_bitmap = kcalloc(sdp->sd_quota_chunks,
-				       sizeof(unsigned char *), GFP_NOFS);
+	sdp->sd_quota_bitmap = kmalloc(bm_size, GFP_NOFS|__GFP_NOWARN);
+	if (sdp->sd_quota_bitmap == NULL)
+		sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
 	if (!sdp->sd_quota_bitmap)
 		return error;
 
-	for (x = 0; x < sdp->sd_quota_chunks; x++) {
-		sdp->sd_quota_bitmap[x] = kzalloc(PAGE_SIZE, GFP_NOFS);
-		if (!sdp->sd_quota_bitmap[x])
-			goto fail;
-	}
+	memset(sdp->sd_quota_bitmap, 0, bm_size);
 
 	for (x = 0; x < blocks; x++) {
 		struct buffer_head *bh;
@@ -1339,7 +1296,7 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
 			qd->qd_slot_count = 1;
 
 			spin_lock(&qd_lock);
-			gfs2_icbit_munge(sdp, sdp->sd_quota_bitmap, slot, 1);
+			BUG_ON(test_and_set_bit(slot, sdp->sd_quota_bitmap));
 			list_add(&qd->qd_list, &sdp->sd_quota_list);
 			atomic_inc(&sdp->sd_quota_count);
 			spin_unlock(&qd_lock);
@@ -1370,7 +1327,6 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 {
 	struct list_head *head = &sdp->sd_quota_list;
 	struct gfs2_quota_data *qd;
-	unsigned int x;
 
 	spin_lock(&qd_lock);
 	while (!list_empty(head)) {
@@ -1401,9 +1357,11 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
 	gfs2_assert_warn(sdp, !atomic_read(&sdp->sd_quota_count));
 
 	if (sdp->sd_quota_bitmap) {
-		for (x = 0; x < sdp->sd_quota_chunks; x++)
-			kfree(sdp->sd_quota_bitmap[x]);
-		kfree(sdp->sd_quota_bitmap);
+		if (is_vmalloc_addr(sdp->sd_quota_bitmap))
+			vfree(sdp->sd_quota_bitmap);
+		else
+			kfree(sdp->sd_quota_bitmap);
+		sdp->sd_quota_bitmap = NULL;
 	}
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 20/24] GFS2: Move quota bitmap operations under their own lock
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (18 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 19/24] GFS2: Clean up quota slot allocation Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 21/24] GFS2: Fix kbuild test robot reported warning Steven Whitehouse
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse, Abhijith Das

Gradually, the global qd_lock is being used for less and less.
After this patch it will only be used for the per super block
list whose purpose is to allow syncing of changes back to the
master quota file from the local quota changes file. Fixing
up that process to make it more efficient will be the subject
of a later patch, however this patch removes another barrier
to doing that.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4b9aa5b..8c64e26 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -734,6 +734,7 @@ struct gfs2_sbd {
 
 	unsigned int sd_quota_slots;
 	unsigned long *sd_quota_bitmap;
+	spinlock_t sd_bitmap_lock;
 
 	u64 sd_quota_sync_gen;
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 27648e8..06a66c9 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -99,6 +99,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 	init_waitqueue_head(&sdp->sd_quota_wait);
 	INIT_LIST_HEAD(&sdp->sd_trunc_list);
 	spin_lock_init(&sdp->sd_trunc_lock);
+	spin_lock_init(&sdp->sd_bitmap_lock);
 
 	mapping = &sdp->sd_aspace;
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 79be67a..02a2740 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -76,6 +76,7 @@
 #define GFS2_QD_HASH_MASK       (GFS2_QD_HASH_SIZE - 1)
 
 /* Lock order: qd_lock -> bucket lock -> qd->lockref.lock -> lru lock */
+/*                     -> sd_bitmap_lock                              */
 static DEFINE_SPINLOCK(qd_lock);
 struct list_lru gfs2_qd_lru;
 
@@ -319,7 +320,7 @@ static int slot_get(struct gfs2_quota_data *qd)
 	unsigned int bit;
 	int error = 0;
 
-	spin_lock(&qd_lock);
+	spin_lock(&sdp->sd_bitmap_lock);
 	if (qd->qd_slot_count != 0)
 		goto out;
 
@@ -331,7 +332,7 @@ static int slot_get(struct gfs2_quota_data *qd)
 out:
 		qd->qd_slot_count++;
 	}
-	spin_unlock(&qd_lock);
+	spin_unlock(&sdp->sd_bitmap_lock);
 
 	return error;
 }
@@ -340,23 +341,23 @@ static void slot_hold(struct gfs2_quota_data *qd)
 {
 	struct gfs2_sbd *sdp = qd->qd_sbd;
 
-	spin_lock(&qd_lock);
+	spin_lock(&sdp->sd_bitmap_lock);
 	gfs2_assert(sdp, qd->qd_slot_count);
 	qd->qd_slot_count++;
-	spin_unlock(&qd_lock);
+	spin_unlock(&sdp->sd_bitmap_lock);
 }
 
 static void slot_put(struct gfs2_quota_data *qd)
 {
 	struct gfs2_sbd *sdp = qd->qd_sbd;
 
-	spin_lock(&qd_lock);
+	spin_lock(&sdp->sd_bitmap_lock);
 	gfs2_assert(sdp, qd->qd_slot_count);
 	if (!--qd->qd_slot_count) {
 		BUG_ON(!test_and_clear_bit(qd->qd_slot, sdp->sd_quota_bitmap));
 		qd->qd_slot = -1;
 	}
-	spin_unlock(&qd_lock);
+	spin_unlock(&sdp->sd_bitmap_lock);
 }
 
 static int bh_get(struct gfs2_quota_data *qd)
@@ -434,8 +435,7 @@ static int qd_check_sync(struct gfs2_sbd *sdp, struct gfs2_quota_data *qd,
 	list_move_tail(&qd->qd_list, &sdp->sd_quota_list);
 	set_bit(QDF_LOCKED, &qd->qd_flags);
 	qd->qd_change_sync = qd->qd_change;
-	gfs2_assert_warn(sdp, qd->qd_slot_count);
-	qd->qd_slot_count++;
+	slot_hold(qd);
 	return 1;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 21/24] GFS2: Fix kbuild test robot reported warning
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (19 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 20/24] GFS2: Move quota bitmap operations under their own lock Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 22/24] GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Steven Whitehouse
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse

Well I don't get the same warning locally as the kbuild
robot, but I guess this should fix the problem, anyway.
Here is the warning:

head:   2d9e72303d538024627fb1fe2cbde48aec12acc0
commit: ee2411a8db49a21bc55dc124e1b434ba194c8903 [19/20] GFS2: Clean up quota slot allocation
config: make ARCH=powerpc allmodconfig

All error/warnings:

   fs/gfs2/quota.c: In function 'gfs2_quota_init':
>> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration]
      sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
      ^
>> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default]
      sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
                           ^
   fs/gfs2/quota.c: In function 'gfs2_quota_cleanup':
>> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
       vfree(sdp->sd_quota_bitmap);

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 02a2740..8bec0e31 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -56,6 +56,7 @@
 #include <linux/rculist_bl.h>
 #include <linux/bit_spinlock.h>
 #include <linux/jhash.h>
+#include <linux/vmalloc.h>
 
 #include "gfs2.h"
 #include "incore.h"
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 22/24] GFS2: Don't use ENOBUFS when ENOMEM is the correct error code
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (20 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 21/24] GFS2: Fix kbuild test robot reported warning Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 23/24] GFS2: Small cleanup Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 24/24] GFS2: revert "GFS2: d_splice_alias() can't return error" Steven Whitehouse
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Steven Whitehouse, Al Viro

Al Viro has tactfully pointed out that we are using the incorrect
error code in some cases. This patch fixes that, and also removes
the (unused) return value for glock dumping.

>        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6f7a47c..ca0be6c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1552,13 +1552,11 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)
 	glock_hash_walk(thaw_glock, sdp);
 }
 
-static int dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
+static void dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
 {
-	int ret;
 	spin_lock(&gl->gl_spin);
-	ret = gfs2_dump_glock(seq, gl);
+	gfs2_dump_glock(seq, gl);
 	spin_unlock(&gl->gl_spin);
-	return ret;
 }
 
 static void dump_glock_func(struct gfs2_glock *gl)
@@ -1647,10 +1645,9 @@ static const char *hflags2str(char *buf, unsigned flags, unsigned long iflags)
  * @seq: the seq_file struct
  * @gh: the glock holder
  *
- * Returns: 0 on success, -ENOBUFS when we run out of space
  */
 
-static int dump_holder(struct seq_file *seq, const struct gfs2_holder *gh)
+static void dump_holder(struct seq_file *seq, const struct gfs2_holder *gh)
 {
 	struct task_struct *gh_owner = NULL;
 	char flags_buf[32];
@@ -1666,7 +1663,6 @@ static int dump_holder(struct seq_file *seq, const struct gfs2_holder *gh)
 		       gh_owner ? gh_owner->comm : "(ended)",
 		       (void *)gh->gh_ip);
 	rcu_read_unlock();
-	return 0;
 }
 
 static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
@@ -1721,16 +1717,14 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
  * example. The field's are n = number (id of the object), f = flags,
  * t = type, s = state, r = refcount, e = error, p = pid.
  *
- * Returns: 0 on success, -ENOBUFS when we run out of space
  */
 
-int gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
+void gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
 {
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	unsigned long long dtime;
 	const struct gfs2_holder *gh;
 	char gflags_buf[32];
-	int error = 0;
 
 	dtime = jiffies - gl->gl_demote_time;
 	dtime *= 1000000/HZ; /* demote time in uSec */
@@ -1747,15 +1741,11 @@ int gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
 		  atomic_read(&gl->gl_revokes),
 		  (int)gl->gl_lockref.count, gl->gl_hold_time);
 
-	list_for_each_entry(gh, &gl->gl_holders, gh_list) {
-		error = dump_holder(seq, gh);
-		if (error)
-			goto out;
-	}
+	list_for_each_entry(gh, &gl->gl_holders, gh_list)
+		dump_holder(seq, gh);
+
 	if (gl->gl_state != LM_ST_UNLOCKED && glops->go_dump)
-		error = glops->go_dump(seq, gl);
-out:
-	return error;
+		glops->go_dump(seq, gl);
 }
 
 static int gfs2_glstats_seq_show(struct seq_file *seq, void *iter_ptr)
@@ -1953,7 +1943,8 @@ static void gfs2_glock_seq_stop(struct seq_file *seq, void *iter_ptr)
 
 static int gfs2_glock_seq_show(struct seq_file *seq, void *iter_ptr)
 {
-	return dump_glock(seq, iter_ptr);
+	dump_glock(seq, iter_ptr);
+	return 0;
 }
 
 static void *gfs2_sbstats_seq_start(struct seq_file *seq, loff_t *pos)
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 6647d77..32572f7 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -199,7 +199,7 @@ extern int gfs2_glock_nq_num(struct gfs2_sbd *sdp, u64 number,
 			     struct gfs2_holder *gh);
 extern int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs);
 extern void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs);
-extern int gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
+extern void gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { gfs2_dump_glock(NULL, gl); BUG(); } } while(0)
 extern __printf(2, 3)
 void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index b792dbc..3bf0631 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -437,21 +437,19 @@ static int inode_go_lock(struct gfs2_holder *gh)
  * @seq: The iterator
  * @ip: the inode
  *
- * Returns: 0 on success, -ENOBUFS when we run out of space
  */
 
-static int inode_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
+static void inode_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 {
 	const struct gfs2_inode *ip = gl->gl_object;
 	if (ip == NULL)
-		return 0;
+		return;
 	gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu\n",
 		  (unsigned long long)ip->i_no_formal_ino,
 		  (unsigned long long)ip->i_no_addr,
 		  IF2DT(ip->i_inode.i_mode), ip->i_flags,
 		  (unsigned int)ip->i_diskflags,
 		  (unsigned long long)i_size_read(&ip->i_inode));
-	return 0;
 }
 
 /**
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 8c64e26..cf0e344 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -218,7 +218,7 @@ struct gfs2_glock_operations {
 	int (*go_demote_ok) (const struct gfs2_glock *gl);
 	int (*go_lock) (struct gfs2_holder *gh);
 	void (*go_unlock) (struct gfs2_holder *gh);
-	int (*go_dump)(struct seq_file *seq, const struct gfs2_glock *gl);
+	void (*go_dump)(struct seq_file *seq, const struct gfs2_glock *gl);
 	void (*go_callback)(struct gfs2_glock *gl, bool remote);
 	const int go_type;
 	const unsigned long go_flags;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 51cf10d..b8956f2 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -149,7 +149,7 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 	ip = GFS2_I(inode);
 
 	if (!inode)
-		return ERR_PTR(-ENOBUFS);
+		return ERR_PTR(-ENOMEM);
 
 	if (inode->i_state & I_NEW) {
 		struct gfs2_sbd *sdp = GFS2_SB(inode);
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 06a66c9..1e712b5 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -231,7 +231,7 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent)
 
 	page = alloc_page(GFP_NOFS);
 	if (unlikely(!page))
-		return -ENOBUFS;
+		return -ENOMEM;
 
 	ClearPageUptodate(page);
 	ClearPageDirty(page);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 183cf0f..e14e887 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2120,14 +2120,14 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart,
  *
  */
 
-int gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
+void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 {
 	struct gfs2_rgrpd *rgd = gl->gl_object;
 	struct gfs2_blkreserv *trs;
 	const struct rb_node *n;
 
 	if (rgd == NULL)
-		return 0;
+		return;
 	gfs2_print_dbg(seq, " R: n:%llu f:%02x b:%u/%u i:%u r:%u e:%u\n",
 		       (unsigned long long)rgd->rd_addr, rgd->rd_flags,
 		       rgd->rd_free, rgd->rd_free_clone, rgd->rd_dinodes,
@@ -2138,7 +2138,6 @@ int gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl)
 		dump_rs(seq, trs);
 	}
 	spin_unlock(&rgd->rd_rsspin);
-	return 0;
 }
 
 static void gfs2_rgrp_error(struct gfs2_rgrpd *rgd)
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 3a10d2f..463ab2e 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -68,7 +68,7 @@ extern void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
 extern void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state);
 extern void gfs2_rlist_free(struct gfs2_rgrp_list *rlist);
 extern u64 gfs2_ri_total(struct gfs2_sbd *sdp);
-extern int gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl);
+extern void gfs2_rgrp_dump(struct seq_file *seq, const struct gfs2_glock *gl);
 extern int gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset,
 				   struct buffer_head *bh,
 				   const struct gfs2_bitmap *bi, unsigned minlen, u64 *ptrimmed);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 23/24] GFS2: Small cleanup
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (21 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 22/24] GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  2014-01-20 12:23 ` [PATCH 24/24] GFS2: revert "GFS2: d_splice_alias() can't return error" Steven Whitehouse
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel; +Cc: Bob Peterson, Steven Whitehouse

From: Bob Peterson <rpeterso@redhat.com>

This is a small cleanup to function gfs2_rgrp_go_lock so that it
uses rgd instead of its more complicated twin.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index e14e887..a1da213 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1199,7 +1199,7 @@ int gfs2_rgrp_go_lock(struct gfs2_holder *gh)
 
 	if (gh->gh_flags & GL_SKIP && sdp->sd_args.ar_rgrplvb)
 		return 0;
-	return gfs2_rgrp_bh_get((struct gfs2_rgrpd *)gh->gh_gl->gl_object);
+	return gfs2_rgrp_bh_get(rgd);
 }
 
 /**
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 24/24] GFS2: revert "GFS2: d_splice_alias() can't return error"
  2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
                   ` (22 preceding siblings ...)
  2014-01-20 12:23 ` [PATCH 23/24] GFS2: Small cleanup Steven Whitehouse
@ 2014-01-20 12:23 ` Steven Whitehouse
  23 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-20 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel
  Cc: J. Bruce Fields, J. Bruce Fields, Steven Whitehouse

From: "J. Bruce Fields" <bfields@fieldses.org>

0d0d110720d7960b77c03c9f2597faaff4b484ae asserts that "d_splice_alias()
can't return error unless it was given an IS_ERR(inode)".

That was true of the implementation of d_splice_alias, but this is
really a problem with d_splice_alias: at a minimum it should be able to
return -ELOOP in the case where inserting the given dentry would cause a
directory loop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index b8956f2..890588c 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -604,6 +604,9 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	error = PTR_ERR(inode);
 	if (!IS_ERR(inode)) {
 		d = d_splice_alias(inode, dentry);
+		error = PTR_ERR(d);
+		if (IS_ERR(d))
+			goto fail_gunlock;
 		error = 0;
 		if (file) {
 			if (S_ISREG(inode->i_mode)) {
@@ -799,6 +802,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry,
 	}
 
 	d = d_splice_alias(inode, dentry);
+	if (IS_ERR(d)) {
+		iput(inode);
+		gfs2_glock_dq_uninit(&gh);
+		return d;
+	}
 	if (file && S_ISREG(inode->i_mode))
 		error = finish_open(file, dentry, gfs2_open_common, opened);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas
  2014-01-20 12:23 ` [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas Steven Whitehouse
@ 2014-01-22  5:32   ` Paul E. McKenney
  2014-01-22  6:06     ` Sasha Levin
  2014-01-22  9:58     ` Steven Whitehouse
  0 siblings, 2 replies; 63+ messages in thread
From: Paul E. McKenney @ 2014-01-22  5:32 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-kernel, cluster-devel, Abhijith Das, sasha.levin

On Mon, Jan 20, 2014 at 12:23:40PM +0000, Steven Whitehouse wrote:
> Prior to this patch, GFS2 kept all the quotas for each
> super block in a single linked list. This is rather slow
> when there are large numbers of quotas.
> 
> This patch introduces a hlist_bl based hash table, similar
> to the one used for glocks. The initial look up of the quota
> is now lockless in the case where it is already cached,
> although we still have to take the per quota spinlock in
> order to bump the ref count. Either way though, this is a
> big improvement on what was there before.
> 
> The qd_lock and the per super block list is preserved, for
> the time being. However it is intended that since this is no
> longer used for its original role, it should be possible to
> shrink the number of items on that list in due course and
> remove the requirement to take qd_lock in qd_get.
> 
> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
> Cc: Abhijith Das <adas@redhat.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Interesting!  I thought that Sasha Levin had a hash table in the works,
but I don't see it, so CCing him.

A few questions and comments below.

							Thanx, Paul

> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index a99f60c..59d99ec 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -428,10 +428,13 @@ enum {
>  };
> 
>  struct gfs2_quota_data {
> +	struct hlist_bl_node qd_hlist;
>  	struct list_head qd_list;
>  	struct kqid qd_id;
> +	struct gfs2_sbd *qd_sbd;
>  	struct lockref qd_lockref;
>  	struct list_head qd_lru;
> +	unsigned qd_hash;
> 
>  	unsigned long qd_flags;		/* QDF_... */
> 
> @@ -450,6 +453,7 @@ struct gfs2_quota_data {
> 
>  	u64 qd_sync_gen;
>  	unsigned long qd_last_warn;
> +	struct rcu_head qd_rcu;
>  };
> 
>  struct gfs2_trans {
> diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
> index 0650db2..c272e73 100644
> --- a/fs/gfs2/main.c
> +++ b/fs/gfs2/main.c
> @@ -76,6 +76,7 @@ static int __init init_gfs2_fs(void)
> 
>  	gfs2_str2qstr(&gfs2_qdot, ".");
>  	gfs2_str2qstr(&gfs2_qdotdot, "..");
> +	gfs2_quota_hash_init();
> 
>  	error = gfs2_sys_init();
>  	if (error)
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index 1b6b367..a1df01d 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -52,6 +52,10 @@
>  #include <linux/dqblk_xfs.h>
>  #include <linux/lockref.h>
>  #include <linux/list_lru.h>
> +#include <linux/rcupdate.h>
> +#include <linux/rculist_bl.h>
> +#include <linux/bit_spinlock.h>
> +#include <linux/jhash.h>
> 
>  #include "gfs2.h"
>  #include "incore.h"
> @@ -67,10 +71,43 @@
>  #include "inode.h"
>  #include "util.h"
> 
> -/* Lock order: qd_lock -> qd->lockref.lock -> lru lock */
> +#define GFS2_QD_HASH_SHIFT      12

Should this be a function of the number of CPUs?  (Might not be an issue
if the really big systems don't use GFS.)

> +#define GFS2_QD_HASH_SIZE       (1 << GFS2_QD_HASH_SHIFT)
> +#define GFS2_QD_HASH_MASK       (GFS2_QD_HASH_SIZE - 1)
> +
> +/* Lock order: qd_lock -> bucket lock -> qd->lockref.lock -> lru lock */
>  static DEFINE_SPINLOCK(qd_lock);
>  struct list_lru gfs2_qd_lru;
> 
> +static struct hlist_bl_head qd_hash_table[GFS2_QD_HASH_SIZE];
> +
> +static unsigned int gfs2_qd_hash(const struct gfs2_sbd *sdp,
> +				 const struct kqid qid)
> +{
> +	unsigned int h;
> +
> +	h = jhash(&sdp, sizeof(struct gfs2_sbd *), 0);
> +	h = jhash(&qid, sizeof(struct kqid), h);
> +
> +	return h & GFS2_QD_HASH_MASK;
> +}
> +
> +static inline void spin_lock_bucket(unsigned int hash)
> +{
> +        hlist_bl_lock(&qd_hash_table[hash]);
> +}
> +
> +static inline void spin_unlock_bucket(unsigned int hash)
> +{
> +        hlist_bl_unlock(&qd_hash_table[hash]);
> +}
> +
> +static void gfs2_qd_dealloc(struct rcu_head *rcu)
> +{
> +	struct gfs2_quota_data *qd = container_of(rcu, struct gfs2_quota_data, qd_rcu);
> +	kmem_cache_free(gfs2_quotad_cachep, qd);
> +}
> +
>  static void gfs2_qd_dispose(struct list_head *list)
>  {
>  	struct gfs2_quota_data *qd;
> @@ -87,6 +124,10 @@ static void gfs2_qd_dispose(struct list_head *list)
>  		list_del(&qd->qd_list);
>  		spin_unlock(&qd_lock);
> 
> +		spin_lock_bucket(qd->qd_hash);
> +		hlist_bl_del_rcu(&qd->qd_hlist);
> +		spin_unlock_bucket(qd->qd_hash);
> +

Good, removed from the RCU-traversed list before invoking call_rcu().

>  		gfs2_assert_warn(sdp, !qd->qd_change);
>  		gfs2_assert_warn(sdp, !qd->qd_slot_count);
>  		gfs2_assert_warn(sdp, !qd->qd_bh_count);
> @@ -95,7 +136,7 @@ static void gfs2_qd_dispose(struct list_head *list)
>  		atomic_dec(&sdp->sd_quota_count);
> 
>  		/* Delete it from the common reclaim list */
> -		kmem_cache_free(gfs2_quotad_cachep, qd);
> +		call_rcu(&qd->qd_rcu, gfs2_qd_dealloc);
>  	}
>  }
> 
> @@ -165,83 +206,95 @@ static u64 qd2offset(struct gfs2_quota_data *qd)
>  	return offset;
>  }
> 
> -static int qd_alloc(struct gfs2_sbd *sdp, struct kqid qid,
> -		    struct gfs2_quota_data **qdp)
> +static struct gfs2_quota_data *qd_alloc(unsigned hash, struct gfs2_sbd *sdp, struct kqid qid)
>  {
>  	struct gfs2_quota_data *qd;
>  	int error;
> 
>  	qd = kmem_cache_zalloc(gfs2_quotad_cachep, GFP_NOFS);
>  	if (!qd)
> -		return -ENOMEM;
> +		return NULL;
> 
> +	qd->qd_sbd = sdp;
>  	qd->qd_lockref.count = 1;
>  	spin_lock_init(&qd->qd_lockref.lock);
>  	qd->qd_id = qid;
>  	qd->qd_slot = -1;
>  	INIT_LIST_HEAD(&qd->qd_lru);
> +	qd->qd_hash = hash;
> 
>  	error = gfs2_glock_get(sdp, qd2index(qd),
>  			      &gfs2_quota_glops, CREATE, &qd->qd_gl);
>  	if (error)
>  		goto fail;
> 
> -	*qdp = qd;
> -
> -	return 0;
> +	return qd;
> 
>  fail:
>  	kmem_cache_free(gfs2_quotad_cachep, qd);
> -	return error;
> +	return NULL;
>  }
> 
> -static int qd_get(struct gfs2_sbd *sdp, struct kqid qid,
> -		  struct gfs2_quota_data **qdp)
> +static struct gfs2_quota_data *gfs2_qd_search_bucket(unsigned int hash,
> +						     const struct gfs2_sbd *sdp,
> +						     struct kqid qid)
>  {
> -	struct gfs2_quota_data *qd = NULL, *new_qd = NULL;
> -	int error, found;
> -
> -	*qdp = NULL;
> +	struct gfs2_quota_data *qd;
> +	struct hlist_bl_node *h;
> 
> -	for (;;) {
> -		found = 0;
> -		spin_lock(&qd_lock);
> -		list_for_each_entry(qd, &sdp->sd_quota_list, qd_list) {
> -			if (qid_eq(qd->qd_id, qid) &&
> -			    lockref_get_not_dead(&qd->qd_lockref)) {
> -				list_lru_del(&gfs2_qd_lru, &qd->qd_lru);
> -				found = 1;
> -				break;
> -			}
> +	hlist_bl_for_each_entry_rcu(qd, h, &qd_hash_table[hash], qd_hlist) {

All callers of gfs2_qd_search_bucket() either hold the update-side lock
(acquired via spin_lock_bucket()) or have invoked rcu_read_lock(), so
this is good.

> +		if (!qid_eq(qd->qd_id, qid))
> +			continue;
> +		if (qd->qd_sbd != sdp)
> +			continue;
> +		if (lockref_get_not_dead(&qd->qd_lockref)) {
> +			list_lru_del(&gfs2_qd_lru, &qd->qd_lru);

list_lru_del() acquires a lock, but it is from an array whose size
depends on the NODES_SHIFT Kconfig variable.  The array size seems to
vary from 8 to 16 to 64 to 1024, depending on other configuration info,
so should be OK.

> +			return qd;
>  		}
> +	}
> 
> -		if (!found)
> -			qd = NULL;
> +	return NULL;
> +}
> 
> -		if (!qd && new_qd) {
> -			qd = new_qd;
> -			list_add(&qd->qd_list, &sdp->sd_quota_list);
> -			atomic_inc(&sdp->sd_quota_count);
> -			new_qd = NULL;
> -		}
> 
> -		spin_unlock(&qd_lock);
> +static int qd_get(struct gfs2_sbd *sdp, struct kqid qid,
> +		  struct gfs2_quota_data **qdp)
> +{
> +	struct gfs2_quota_data *qd, *new_qd;
> +	unsigned int hash = gfs2_qd_hash(sdp, qid);
> 
> -		if (qd) {
> -			if (new_qd) {
> -				gfs2_glock_put(new_qd->qd_gl);
> -				kmem_cache_free(gfs2_quotad_cachep, new_qd);
> -			}
> -			*qdp = qd;
> -			return 0;
> -		}
> +	rcu_read_lock();
> +	*qdp = qd = gfs2_qd_search_bucket(hash, sdp, qid);
> +	rcu_read_unlock();
> 
> -		error = qd_alloc(sdp, qid, &new_qd);
> -		if (error)
> -			return error;
> +	if (qd)
> +		return 0;
> +
> +	new_qd = qd_alloc(hash, sdp, qid);
> +	if (!new_qd)
> +		return -ENOMEM;
> +
> +	spin_lock(&qd_lock);
> +	spin_lock_bucket(hash);
> +	*qdp = qd = gfs2_qd_search_bucket(hash, sdp, qid);
> +	if (qd == NULL) {
> +		*qdp = new_qd;
> +		list_add(&new_qd->qd_list, &sdp->sd_quota_list);
> +		hlist_bl_add_head_rcu(&new_qd->qd_hlist, &qd_hash_table[hash]);
> +		atomic_inc(&sdp->sd_quota_count);
> +	}
> +	spin_unlock_bucket(hash);
> +	spin_unlock(&qd_lock);
> +
> +	if (qd) {
> +		gfs2_glock_put(new_qd->qd_gl);
> +		kmem_cache_free(gfs2_quotad_cachep, new_qd);
>  	}
> +
> +	return 0;
>  }
> 
> +
>  static void qd_hold(struct gfs2_quota_data *qd)
>  {
>  	struct gfs2_sbd *sdp = qd->qd_gl->gl_sbd;
> @@ -1215,6 +1268,7 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
>  	unsigned int blocks = size >> sdp->sd_sb.sb_bsize_shift;
>  	unsigned int x, slot = 0;
>  	unsigned int found = 0;
> +	unsigned int hash;
>  	u64 dblock;
>  	u32 extlen = 0;
>  	int error;
> @@ -1272,8 +1326,9 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
>  			if (!qc_change)
>  				continue;
> 
> -			error = qd_alloc(sdp, qc_id, &qd);
> -			if (error) {
> +			hash = gfs2_qd_hash(sdp, qc_id);
> +			qd = qd_alloc(hash, sdp, qc_id);
> +			if (qd == NULL) {
>  				brelse(bh);
>  				goto fail;
>  			}
> @@ -1289,6 +1344,10 @@ int gfs2_quota_init(struct gfs2_sbd *sdp)
>  			atomic_inc(&sdp->sd_quota_count);
>  			spin_unlock(&qd_lock);
> 
> +			spin_lock_bucket(hash);
> +			hlist_bl_add_head_rcu(&qd->qd_hlist, &qd_hash_table[hash]);
> +			spin_unlock_bucket(hash);
> +
>  			found++;
>  		}
> 
> @@ -1335,11 +1394,16 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
>  		spin_unlock(&qd->qd_lockref.lock);
> 
>  		list_del(&qd->qd_list);
> +
>  		/* Also remove if this qd exists in the reclaim list */
>  		list_lru_del(&gfs2_qd_lru, &qd->qd_lru);
>  		atomic_dec(&sdp->sd_quota_count);
>  		spin_unlock(&qd_lock);
> 
> +		spin_lock_bucket(qd->qd_hash);
> +		hlist_bl_del_rcu(&qd->qd_hlist);

Might just be my unfamiliarity with this code, but it took me a bit
to see the difference between ->qd_hlist and ->qd_list.  Of course, until
I spotted the difference, I was wondering why you were removing the
item twice.  ;-)

> +		spin_unlock_bucket(qd->qd_hash);
> +
>  		if (!qd->qd_lockref.count) {
>  			gfs2_assert_warn(sdp, !qd->qd_change);
>  			gfs2_assert_warn(sdp, !qd->qd_slot_count);
> @@ -1348,7 +1412,7 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
>  		gfs2_assert_warn(sdp, !qd->qd_bh_count);
> 
>  		gfs2_glock_put(qd->qd_gl);
> -		kmem_cache_free(gfs2_quotad_cachep, qd);
> +		call_rcu(&qd->qd_rcu, gfs2_qd_dealloc);
> 
>  		spin_lock(&qd_lock);
>  	}
> @@ -1643,3 +1707,11 @@ const struct quotactl_ops gfs2_quotactl_ops = {
>  	.get_dqblk	= gfs2_get_dqblk,
>  	.set_dqblk	= gfs2_set_dqblk,
>  };
> +
> +void __init gfs2_quota_hash_init(void)
> +{
> +	unsigned i;
> +
> +	for(i = 0; i < GFS2_QD_HASH_SIZE; i++)
> +		INIT_HLIST_BL_HEAD(&qd_hash_table[i]);
> +}
> diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
> index 96e4f34..55d506e 100644
> --- a/fs/gfs2/quota.h
> +++ b/fs/gfs2/quota.h
> @@ -57,5 +57,6 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip)
>  extern const struct quotactl_ops gfs2_quotactl_ops;
>  extern struct shrinker gfs2_qd_shrinker;
>  extern struct list_lru gfs2_qd_lru;
> +extern void __init gfs2_quota_hash_init(void);
> 
>  #endif /* __QUOTA_DOT_H__ */
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas
  2014-01-22  5:32   ` Paul E. McKenney
@ 2014-01-22  6:06     ` Sasha Levin
  2014-01-22  9:43       ` Steven Whitehouse
  2014-01-22  9:58     ` Steven Whitehouse
  1 sibling, 1 reply; 63+ messages in thread
From: Sasha Levin @ 2014-01-22  6:06 UTC (permalink / raw)
  To: paulmck, Steven Whitehouse; +Cc: linux-kernel, cluster-devel, Abhijith Das

On 01/22/2014 12:32 AM, Paul E. McKenney wrote:
> On Mon, Jan 20, 2014 at 12:23:40PM +0000, Steven Whitehouse wrote:
>> >Prior to this patch, GFS2 kept all the quotas for each
>> >super block in a single linked list. This is rather slow
>> >when there are large numbers of quotas.
>> >
>> >This patch introduces a hlist_bl based hash table, similar
>> >to the one used for glocks. The initial look up of the quota
>> >is now lockless in the case where it is already cached,
>> >although we still have to take the per quota spinlock in
>> >order to bump the ref count. Either way though, this is a
>> >big improvement on what was there before.
>> >
>> >The qd_lock and the per super block list is preserved, for
>> >the time being. However it is intended that since this is no
>> >longer used for its original role, it should be possible to
>> >shrink the number of items on that list in due course and
>> >remove the requirement to take qd_lock in qd_get.
>> >
>> >Signed-off-by: Steven Whitehouse<swhiteho@redhat.com>
>> >Cc: Abhijith Das<adas@redhat.com>
>> >Cc: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
> Interesting!  I thought that Sasha Levin had a hash table in the works,
> but I don't see it, so CCing him.

Indeed, there is a hlist based hashtable at include/linux/hashtable.h for couple kernel
versions now. However, there's no hlist_bl one.

If there is a plan on adding a hlist_bl hashtable for whatever reason, it should probably
be done by expanding hashtable.h so that more places that use hlist_bl would benefit from it (yes,
there are couple more places that do hlist_bl hashtable).

Also, do we really want to use hlist_bl here? It doesn't seem like it's being done to conserve on
memory, and that's the only reason it should be used for. Doing a single spinlock per bucket is
much more efficient than using the bit locking scheme that hlist_bl does.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas
  2014-01-22  6:06     ` Sasha Levin
@ 2014-01-22  9:43       ` Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-22  9:43 UTC (permalink / raw)
  To: Sasha Levin; +Cc: paulmck, linux-kernel, cluster-devel, Abhijith Das

Hi,

On Wed, 2014-01-22 at 01:06 -0500, Sasha Levin wrote:
> On 01/22/2014 12:32 AM, Paul E. McKenney wrote:
> > On Mon, Jan 20, 2014 at 12:23:40PM +0000, Steven Whitehouse wrote:
> >> >Prior to this patch, GFS2 kept all the quotas for each
> >> >super block in a single linked list. This is rather slow
> >> >when there are large numbers of quotas.
> >> >
> >> >This patch introduces a hlist_bl based hash table, similar
> >> >to the one used for glocks. The initial look up of the quota
> >> >is now lockless in the case where it is already cached,
> >> >although we still have to take the per quota spinlock in
> >> >order to bump the ref count. Either way though, this is a
> >> >big improvement on what was there before.
> >> >
> >> >The qd_lock and the per super block list is preserved, for
> >> >the time being. However it is intended that since this is no
> >> >longer used for its original role, it should be possible to
> >> >shrink the number of items on that list in due course and
> >> >remove the requirement to take qd_lock in qd_get.
> >> >
> >> >Signed-off-by: Steven Whitehouse<swhiteho@redhat.com>
> >> >Cc: Abhijith Das<adas@redhat.com>
> >> >Cc: Paul E. McKenney<paulmck@linux.vnet.ibm.com>
> > Interesting!  I thought that Sasha Levin had a hash table in the works,
> > but I don't see it, so CCing him.
> 
> Indeed, there is a hlist based hashtable at include/linux/hashtable.h for couple kernel
> versions now. However, there's no hlist_bl one.
> 
> If there is a plan on adding a hlist_bl hashtable for whatever reason, it should probably
> be done by expanding hashtable.h so that more places that use hlist_bl would benefit from it (yes,
> there are couple more places that do hlist_bl hashtable).
> 
> Also, do we really want to use hlist_bl here? It doesn't seem like it's being done to conserve on
> memory, and that's the only reason it should be used for. Doing a single spinlock per bucket is
> much more efficient than using the bit locking scheme that hlist_bl does.
> 
> 
> Thanks,
> Sasha

So this will probably make a bit more sense with a bit of history to
explain how we got here... The recent addition of the quota hash table
is modeled upon the one we are using for glocks at the moment. The glock
hash table at one stage, had one lock per bucket, and was then expanded
so that we had more buckets, but the same number of locks - to conserve
memory. It was then expanded again when we moved to hlist_bl a little
while ago. At each stage we gained performance so the changes seemed to
be a good thing.

Really the max number of glocks depends on the size of the memory, so
that we should really try to scale the hash table with increasing memory
size, however a simpler static table has worked reasonably well for now.
If we could use a tree (or forest) instead of a hash table that would be
even better, but still a bit of a pain to do with RCU I think, which is
why I've not gone that extra step yet.

Now the quota hash table has been modeled as a smaller version of the
glock hash table for the time being. The question is how large it should
be?... any more than a single hash list head is an improvement on the
previous code, and maybe the table should be larger than I've made it
currently. We'll see whether anybody runs into issues if they have large
number of quotas in due course.

So perhaps the memory saving is not the most important thing with the
quota hash table, but at least it matches in form the glock hash table
which has been proven over many releases as being effective. However, if
we can make use of some generic code that solves many of the problems
for us, then I can certainly look into that.

The changes in the quota code are not complete yet, and with a bit of
luck we'll have a further set of changes ready for the next merge window
too,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas
  2014-01-22  5:32   ` Paul E. McKenney
  2014-01-22  6:06     ` Sasha Levin
@ 2014-01-22  9:58     ` Steven Whitehouse
  1 sibling, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-01-22  9:58 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, cluster-devel, Abhijith Das, sasha.levin

Hi,

On Tue, 2014-01-21 at 21:32 -0800, Paul E. McKenney wrote:
> On Mon, Jan 20, 2014 at 12:23:40PM +0000, Steven Whitehouse wrote:
> > Prior to this patch, GFS2 kept all the quotas for each
> > super block in a single linked list. This is rather slow
> > when there are large numbers of quotas.
> > 
> > This patch introduces a hlist_bl based hash table, similar
> > to the one used for glocks. The initial look up of the quota
> > is now lockless in the case where it is already cached,
> > although we still have to take the per quota spinlock in
> > order to bump the ref count. Either way though, this is a
> > big improvement on what was there before.
> > 
> > The qd_lock and the per super block list is preserved, for
> > the time being. However it is intended that since this is no
> > longer used for its original role, it should be possible to
> > shrink the number of items on that list in due course and
> > remove the requirement to take qd_lock in qd_get.
> > 
> > Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
> > Cc: Abhijith Das <adas@redhat.com>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Interesting!  I thought that Sasha Levin had a hash table in the works,
> but I don't see it, so CCing him.
> 
> A few questions and comments below.
> 
> 							Thanx, Paul
> 
Thanks for the review.

[snip]
> > +#define GFS2_QD_HASH_SHIFT      12

> Should this be a function of the number of CPUs?  (Might not be an issue
> if the really big systems don't use GFS.)

I'm not sure... really it depends on how many quotas are in use, so
number of users, and even on relatively small systems, there might be a
lot of them. So I'm guessing a bit, and we'll bump it up a bit if its a
problem. There is a lot of extra complexity in changing hash table sizes
on the fly, which would be another possible solution. Either way it is a
vast improvement on what was there before :-)

[snip] 
> > +             if (!qid_eq(qd->qd_id, qid))
> > +                     continue;
> > +             if (qd->qd_sbd != sdp)
> > +                     continue;
> > +             if (lockref_get_not_dead(&qd->qd_lockref)) {
> > +                     list_lru_del(&gfs2_qd_lru, &qd->qd_lru);

> list_lru_del() acquires a lock, but it is from an array whose size
> depends on the NODES_SHIFT Kconfig variable.  The array size seems to
> vary from 8 to 16 to 64 to 1024, depending on other configuration info,
> so should be OK.

Yes, we've tried to make use of the generic lru code here, and I'd like
to do that same for the glocks, however thats more complicated, so we've
not got that far just yet, even though we have made some steps in that
direction. Dave Chinner has pointed out to us that the lru code was
designed such that the lru lock should always be the inner most lock, so
thats the ordering that we've used here.

[snip]
> > @@ -1335,11 +1394,16 @@ void gfs2_quota_cleanup(struct gfs2_sbd *sdp)
> >  		spin_unlock(&qd->qd_lockref.lock);
> > 
> >  		list_del(&qd->qd_list);
> > +
> >  		/* Also remove if this qd exists in the reclaim list */
> >  		list_lru_del(&gfs2_qd_lru, &qd->qd_lru);
> >  		atomic_dec(&sdp->sd_quota_count);
> >  		spin_unlock(&qd_lock);
> > 
> > +		spin_lock_bucket(qd->qd_hash);
> > +		hlist_bl_del_rcu(&qd->qd_hlist);
> 
> Might just be my unfamiliarity with this code, but it took me a bit
> to see the difference between ->qd_hlist and ->qd_list.  Of course, until
> I spotted the difference, I was wondering why you were removing the
> item twice.  ;-)
> 
Well I hope that eventually qd_list might be able to go away. I'm still
working on a plan to deal with improving the quota data writeback which
should help to make that happen,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2014-10-08  9:53 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-10-08  9:53 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Not a huge amount this time... just four patches. This time we have a couple
of bug fixes, one relating to bad i_goal values which are now ignored (i_goal
is basically a hint so it is safe to so this) and another relating to the
saving of the dirent location during rename. There is one performance
improvement, which is an optimisation in rgblk_free so that multiple block
deallocations will now be more efficient, and one clean up patch to use
_RET_IP_ rather than writing it out longhand,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2014-06-03 11:02 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-06-03 11:02 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

This must be about the smallest merge window patch set ever for GFS2.
It is probably also the first one without a single patch from me. That
is down to a combination of factors, and I have some things in the works
that are not quite ready yet, that I hope to put in next time around.

Returning to what is here this time... we have 3 patches which fix
various warnings. Two are bug fixes (for quotas and also a
rare recovery race condition). The final patch, from Ben Marzinski,
is an important change in the freeze code which has been in
progress for some time. This removes the need to take and drop the
transaction lock for every single transaction, when the only time it
was used, was at file system freeze time. Ben's patch integrates the
freeze operation into the journal flush code as an alternative with
lower overheads and also lands up resolving some difficult to fix races
at the same time,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2014-04-01  9:15 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2014-04-01  9:15 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Here is the current content of the GFS2 -nmw tree for the
current merge window.

One of the main highlights this time, is not the patches themselves
but instead the widening contributor base. It is good to see that
interest is increasing in GFS2, and I'd like to thank all the
contributors to this patch set.

In addition to the usual set of bug fixes and clean ups, there are
patches to improve inode creation performance when xattrs are required
and some improvements to the transaction code which is intended to help
improve scalability after further changes in due course. Journal extent
mapping is also updated to make it more efficient and again, this is a
foundation for future work in this area.

The maximum number of ACLs has been increased to 300 (for a 4k block size)
which means that even with a few additional xattrs from selinux,
everything should fit within a single fs block. There is also a patch
to bring GFS2's own copy of the writepages code up to the same level as
the core VFS. Eventually we may be able to merge some of this code, since
it is fairly similar.

The other major change this time, is bringing consistency to the printing
of messages via fs_<level>, pr_<level> macros. 

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2013-11-04 11:09 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2013-11-04 11:09 UTC (permalink / raw)
  To: cluster-devel, linux-kernel

Hi,

I'm just back from firstly Edinburgh, and secondly holiday, and the
merge window is again upon us. I've added in the three pending patches
which were under test while I was away and then that should be it for
this time.

The main feature of interest this time is quota updates. There are
some clean ups and some patches to use the new generic lru list
code. There is still plenty of scope for some further changes in
due course - faster lookups of quota structures is very much
on the todo list. Also, a start has been made towards the more tricky
issue of using the generic lru code with glocks, but that will
have to be completed in a subsequent merge window.

The other, more minor feature, is that there have been a number of
performance patches which relate to block allocation. In particular
they will improve performance when the disk is nearly full,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2 Pre-pull patch posting (merge window)
@ 2013-09-05  9:02 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2013-09-05  9:02 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

This is the smallest merge window patch set for GFS2 for quite
some time. Only one of the patches (moving gfs2_sync_meta) is
a non-bug fix patch, although the merge ordered and writeback
writepage patch is also a nice clean up.

A couple of the patches are quite recently added, due to my only
having recently returned from holiday, so I'll give them a couple
of extra days in -next before sending the pull request.

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2013-07-01  9:33 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2013-07-01  9:33 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

There are a few bug fixes for various, mostly very minor corner
cases, plus some interesting new features. The new features
include atomic_open whose main benefit will be the reduction in
locking overhead in case of combined lookup/create and open operations,
sorting the log buffer lists by block number to improve the efficiency
of AIL writeback, and agressively issuing revokes in gfs2_log_flush
to reduce overhead when dropping glocks,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2013-04-26  9:18 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2013-04-26  9:18 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Since the merge window is coming up soon, I'm posting the content of
the GFS2 -nmw tree as usual. There is not a whole lot of change this
time - there are some further changes which are in the works, but those
will be held over until next time.

Here there are some clean ups to inode creation, the addition of an
origin (local or remote) indicator to glock demote requests, removal
of one of the remaining GFP_NOFAIL allocations during log flushes,
one minor clean up, and a one liner bug fix,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2013-02-19 10:07 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2013-02-19 10:07 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

This is one of the smallest collections of patches for the merge
window for some time. There are some clean ups relating to the
transaction code and the shrinker, which are mostly in preparation
for further development, but also make the code much easier to
follow in these areas.

There is a patch which allows the use of ->writepages even in the
default ordered write mode for all writebacks. This results in
sending larger i/os to the block layer, and a subsequent increase
in performance. It also reduces the number of different i/o paths
by one.

There is also a bug fix reinstating the withdraw ack system which
somehow got lost when the lock modules were merged into GFS2.

And thats all this time around,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2012-11-30  9:52 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-11-30  9:52 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

So yes, this is a bit early, but the tree seems to have settled down
now, and I'd like to hold off any further feature patches until the
subsequent merge window at this stage.

The main feature this time is the new Orlov allocator and the patches
leading up to it which allow us to allocate new inodes from their own
allocation context, rather than borrowing that of their parent directory.
It is this change which then allows us to choose a different location
for subdirectories when required. This works exactly as per the ext3
implementation from the users point of view.

In addition to that, we've got a speed up in gfs2_rbm_from_block()
from Bob Peterson, three locking related improvements from Dave
Teigland plus a selection of smaller bug fixes and clean ups.

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2012-09-26  8:25 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-09-26  8:25 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

We've collected up a goodly number of patches in the -nmw tree now
and we can hold off any further changes until the following merge
window, so here is the current tree content.

The major feature this time is the "rbm" conversion in the resource
group code. The new struct gfs2_rbm specifies the location of an
allocatable block in (resource group, bitmap, offset) form. There
are a number of added helper functions, and later patches then
rewrite some of the resource group code in terms of this new
structure. Not only does this give us a nice code clean up, but
it also removes some of the previous restructions where extents
could not cross bitmap boundaries, for example.

In addition to that, there are a few bug fixes and clean ups, but
the rbm work is by far the majority of this patch set in terms of
number of changed lines.

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2012-07-23  8:00 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-07-23  8:00 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

As usual, here is the content of the GFS2 tree prior to sending
a merge request. Not a huge number of patches this time, but some
interesting features nonetheless.

A number of the earlier patches are aimed at cleaning up the resource
group code for the later patch which implements block reservations.
In addition to that, there are a few patches aimed at improving
the time taken to dump (the potentially rather large) glock debugfs
file. Beyond that there are a couple of bug fixes and thats about it
this time,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2012-05-17 12:23 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-05-17 12:23 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Since the merge window appears to be fast approaching, here are the
current GFS2 patches. This time there are two main themes, one is
updates to the log code, mostly on the writing side. The other is
preparation for some block reservation work which will probably
land in the subsequent merge window.

There is of course the usual collection of cleanup and bug fixes
as well. See the individual patches for the detailed descriptions,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-04-02 15:35                   ` Randy Dunlap
@ 2012-04-02 15:47                     ` Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-04-02 15:47 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Benjamin Poirier, David Teigland, linux-kernel, cluster-devel

Hi,

On Mon, 2012-04-02 at 08:35 -0700, Randy Dunlap wrote:
> On 03/26/2012 03:44 AM, Steven Whitehouse wrote:
> 
> > Hi,
> > 
> > On Fri, 2012-03-23 at 18:06 -0400, Benjamin Poirier wrote:
> > [snip]
> >>
> >> Instead of trying to select everything in GFS2, how about doing it this way?
> >>
> >> [PATCH] gfs2: use depends instead of select in kconfig
> >>
> >> Avoids having to duplicate the dependencies of what is 'select'ed (and on
> >> down...)
> >>
> >> Those dependencies are currently incomplete, leading to broken builds with
> >> GFS2_FS_LOCKING_DLM=y and IP_SCTP=n.
> >>
> >> Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
> >> ---
> >>  fs/gfs2/Kconfig |    7 ++-----
> >>  1 files changed, 2 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> >> index c465ae0..eb08c9e 100644
> >> --- a/fs/gfs2/Kconfig
> >> +++ b/fs/gfs2/Kconfig
> >> @@ -1,10 +1,6 @@
> >>  config GFS2_FS
> >>  	tristate "GFS2 file system support"
> >>  	depends on (64BIT || LBDAF)
> >> -	select DLM if GFS2_FS_LOCKING_DLM
> >> -	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> >> -	select SYSFS if GFS2_FS_LOCKING_DLM
> >> -	select IP_SCTP if DLM_SCTP
> >>  	select FS_POSIX_ACL
> >>  	select CRC32
> >>  	select QUOTACTL
> >> @@ -29,7 +25,8 @@ config GFS2_FS
> >>  
> >>  config GFS2_FS_LOCKING_DLM
> >>  	bool "GFS2 DLM locking"
> >> -	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && HOTPLUG
> >> +	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && \
> >> +		HOTPLUG && DLM && CONFIGFS_FS && SYSFS
> >>  	help
> >>  	  Multiple node locking module for GFS2
> >>  
> > 
> > That looks ok to me. I've put it in the GFS2 -fixes tree, and if
> > everybody is happy with that I'll send a pull request shortly,
> 
> 
> Can we get Benjamin's patch merged, please?
> linux-next is still having build errors without it.
> 

It is in the GFS2 -nmw tree now, so it will be in linux-next shortly.
I'll merge up the -fixes tree shortly, but I'm expecting one more patch
for that very shortly,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-26 10:44                 ` Steven Whitehouse
@ 2012-04-02 15:35                   ` Randy Dunlap
  2012-04-02 15:47                     ` Steven Whitehouse
  0 siblings, 1 reply; 63+ messages in thread
From: Randy Dunlap @ 2012-04-02 15:35 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Benjamin Poirier, David Teigland, linux-kernel, cluster-devel

On 03/26/2012 03:44 AM, Steven Whitehouse wrote:

> Hi,
> 
> On Fri, 2012-03-23 at 18:06 -0400, Benjamin Poirier wrote:
> [snip]
>>
>> Instead of trying to select everything in GFS2, how about doing it this way?
>>
>> [PATCH] gfs2: use depends instead of select in kconfig
>>
>> Avoids having to duplicate the dependencies of what is 'select'ed (and on
>> down...)
>>
>> Those dependencies are currently incomplete, leading to broken builds with
>> GFS2_FS_LOCKING_DLM=y and IP_SCTP=n.
>>
>> Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
>> ---
>>  fs/gfs2/Kconfig |    7 ++-----
>>  1 files changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
>> index c465ae0..eb08c9e 100644
>> --- a/fs/gfs2/Kconfig
>> +++ b/fs/gfs2/Kconfig
>> @@ -1,10 +1,6 @@
>>  config GFS2_FS
>>  	tristate "GFS2 file system support"
>>  	depends on (64BIT || LBDAF)
>> -	select DLM if GFS2_FS_LOCKING_DLM
>> -	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
>> -	select SYSFS if GFS2_FS_LOCKING_DLM
>> -	select IP_SCTP if DLM_SCTP
>>  	select FS_POSIX_ACL
>>  	select CRC32
>>  	select QUOTACTL
>> @@ -29,7 +25,8 @@ config GFS2_FS
>>  
>>  config GFS2_FS_LOCKING_DLM
>>  	bool "GFS2 DLM locking"
>> -	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && HOTPLUG
>> +	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && \
>> +		HOTPLUG && DLM && CONFIGFS_FS && SYSFS
>>  	help
>>  	  Multiple node locking module for GFS2
>>  
> 
> That looks ok to me. I've put it in the GFS2 -fixes tree, and if
> everybody is happy with that I'll send a pull request shortly,


Can we get Benjamin's patch merged, please?
linux-next is still having build errors without it.

-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 22:06               ` Benjamin Poirier
  2012-03-23 22:48                 ` Randy Dunlap
@ 2012-03-26 10:44                 ` Steven Whitehouse
  2012-04-02 15:35                   ` Randy Dunlap
  1 sibling, 1 reply; 63+ messages in thread
From: Steven Whitehouse @ 2012-03-26 10:44 UTC (permalink / raw)
  To: Benjamin Poirier
  Cc: David Teigland, Randy Dunlap, linux-kernel, cluster-devel

Hi,

On Fri, 2012-03-23 at 18:06 -0400, Benjamin Poirier wrote:
[snip]
> 
> Instead of trying to select everything in GFS2, how about doing it this way?
> 
> [PATCH] gfs2: use depends instead of select in kconfig
> 
> Avoids having to duplicate the dependencies of what is 'select'ed (and on
> down...)
> 
> Those dependencies are currently incomplete, leading to broken builds with
> GFS2_FS_LOCKING_DLM=y and IP_SCTP=n.
> 
> Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
> ---
>  fs/gfs2/Kconfig |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> index c465ae0..eb08c9e 100644
> --- a/fs/gfs2/Kconfig
> +++ b/fs/gfs2/Kconfig
> @@ -1,10 +1,6 @@
>  config GFS2_FS
>  	tristate "GFS2 file system support"
>  	depends on (64BIT || LBDAF)
> -	select DLM if GFS2_FS_LOCKING_DLM
> -	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> -	select SYSFS if GFS2_FS_LOCKING_DLM
> -	select IP_SCTP if DLM_SCTP
>  	select FS_POSIX_ACL
>  	select CRC32
>  	select QUOTACTL
> @@ -29,7 +25,8 @@ config GFS2_FS
>  
>  config GFS2_FS_LOCKING_DLM
>  	bool "GFS2 DLM locking"
> -	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && HOTPLUG
> +	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && \
> +		HOTPLUG && DLM && CONFIGFS_FS && SYSFS
>  	help
>  	  Multiple node locking module for GFS2
>  

That looks ok to me. I've put it in the GFS2 -fixes tree, and if
everybody is happy with that I'll send a pull request shortly,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 22:06               ` Benjamin Poirier
@ 2012-03-23 22:48                 ` Randy Dunlap
  2012-03-26 10:44                 ` Steven Whitehouse
  1 sibling, 0 replies; 63+ messages in thread
From: Randy Dunlap @ 2012-03-23 22:48 UTC (permalink / raw)
  To: Benjamin Poirier
  Cc: David Teigland, Steven Whitehouse, linux-kernel, cluster-devel

On 03/23/2012 03:06 PM, Benjamin Poirier wrote:

> On 2012/03/23 16:18, David Teigland wrote:
>> On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote:
>>>>> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
>>>>> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
>>>>> used anywhere else in the kernel tree AFAICT.
>>>>> DLM just always selects IP_SCTP.
>>>>
>>>> Here's what we have now:
>>>>
>>>> config GFS2_FS
>>>>         tristate "GFS2 file system support"
>>>>         depends on (64BIT || LBDAF)
>>>>         select DLM if GFS2_FS_LOCKING_DLM
>>>>         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
>>>>         select SYSFS if GFS2_FS_LOCKING_DLM
>>>>         select IP_SCTP if DLM_SCTP
>>>>         select FS_POSIX_ACL
>>>>         select CRC32
>>>>         select QUOTACTL
>>>>
>>>> menuconfig DLM
>>>>         tristate "Distributed Lock Manager (DLM)"
>>>>         depends on EXPERIMENTAL && INET
>>>>         depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
>>>>         select IP_SCTP
>>>>
>>>> Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
>>>> just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
>>>> vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
>>>> possibly be undefined if we're selecting SCTP.
>>>
>>> What is selecting SCTP?  DLM?  so GFS2 selects DLM, but selects
>>> don't follow dependency chains.  Also, the "select IP_SCTP if DLM_SCTP"
>>> in GFS2 is meaningless since there is no DLM_SCTP.
>>
>> https://lkml.org/lkml/2012/3/8/222 seems to have caused this by adding
>> the new dependency on the sctp module without any Kconfig changes.
>>
>> Should that patch have added depends IP_SCTP to the dlm and gfs2?
>>
> 
> Instead of trying to select everything in GFS2, how about doing it this way?
> 
> [PATCH] gfs2: use depends instead of select in kconfig
> 
> Avoids having to duplicate the dependencies of what is 'select'ed (and on
> down...)
> 
> Those dependencies are currently incomplete, leading to broken builds with
> GFS2_FS_LOCKING_DLM=y and IP_SCTP=n.
> 
> Signed-off-by: Benjamin Poirier <bpoirier@suse.de>


That seems to work for me.  Thanks.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>


> ---
>  fs/gfs2/Kconfig |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> index c465ae0..eb08c9e 100644
> --- a/fs/gfs2/Kconfig
> +++ b/fs/gfs2/Kconfig
> @@ -1,10 +1,6 @@
>  config GFS2_FS
>  	tristate "GFS2 file system support"
>  	depends on (64BIT || LBDAF)
> -	select DLM if GFS2_FS_LOCKING_DLM
> -	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> -	select SYSFS if GFS2_FS_LOCKING_DLM
> -	select IP_SCTP if DLM_SCTP
>  	select FS_POSIX_ACL
>  	select CRC32
>  	select QUOTACTL
> @@ -29,7 +25,8 @@ config GFS2_FS
>  
>  config GFS2_FS_LOCKING_DLM
>  	bool "GFS2 DLM locking"
> -	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && HOTPLUG
> +	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && \
> +		HOTPLUG && DLM && CONFIGFS_FS && SYSFS
>  	help
>  	  Multiple node locking module for GFS2
>  



-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 20:18             ` David Teigland
  2012-03-23 22:06               ` Benjamin Poirier
@ 2012-03-23 22:06               ` Randy Dunlap
  1 sibling, 0 replies; 63+ messages in thread
From: Randy Dunlap @ 2012-03-23 22:06 UTC (permalink / raw)
  To: David Teigland; +Cc: Steven Whitehouse, linux-kernel, cluster-devel, bpoirier

On 03/23/2012 01:18 PM, David Teigland wrote:

> On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote:
>>>> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
>>>> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
>>>> used anywhere else in the kernel tree AFAICT.
>>>> DLM just always selects IP_SCTP.
>>>
>>> Here's what we have now:
>>>
>>> config GFS2_FS
>>>         tristate "GFS2 file system support"
>>>         depends on (64BIT || LBDAF)
>>>         select DLM if GFS2_FS_LOCKING_DLM
>>>         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
>>>         select SYSFS if GFS2_FS_LOCKING_DLM
>>>         select IP_SCTP if DLM_SCTP
>>>         select FS_POSIX_ACL
>>>         select CRC32
>>>         select QUOTACTL
>>>
>>> menuconfig DLM
>>>         tristate "Distributed Lock Manager (DLM)"
>>>         depends on EXPERIMENTAL && INET
>>>         depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
>>>         select IP_SCTP
>>>
>>> Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
>>> just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
>>> vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
>>> possibly be undefined if we're selecting SCTP.
>>
>> What is selecting SCTP?  DLM?  so GFS2 selects DLM, but selects
>> don't follow dependency chains.  Also, the "select IP_SCTP if DLM_SCTP"
>> in GFS2 is meaningless since there is no DLM_SCTP.
> 
> https://lkml.org/lkml/2012/3/8/222 seems to have caused this by adding
> the new dependency on the sctp module without any Kconfig changes.

bad URL?  I don't see how that patch affects this area at all.

> Should that patch have added depends IP_SCTP to the dlm and gfs2?

Sounds reasonable (but I haven't seen the patch).


-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 20:18             ` David Teigland
@ 2012-03-23 22:06               ` Benjamin Poirier
  2012-03-23 22:48                 ` Randy Dunlap
  2012-03-26 10:44                 ` Steven Whitehouse
  2012-03-23 22:06               ` Randy Dunlap
  1 sibling, 2 replies; 63+ messages in thread
From: Benjamin Poirier @ 2012-03-23 22:06 UTC (permalink / raw)
  To: David Teigland
  Cc: Randy Dunlap, Steven Whitehouse, linux-kernel, cluster-devel

On 2012/03/23 16:18, David Teigland wrote:
> On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote:
> > >> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
> > >> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
> > >> used anywhere else in the kernel tree AFAICT.
> > >> DLM just always selects IP_SCTP.
> > > 
> > > Here's what we have now:
> > > 
> > > config GFS2_FS
> > >         tristate "GFS2 file system support"
> > >         depends on (64BIT || LBDAF)
> > >         select DLM if GFS2_FS_LOCKING_DLM
> > >         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> > >         select SYSFS if GFS2_FS_LOCKING_DLM
> > >         select IP_SCTP if DLM_SCTP
> > >         select FS_POSIX_ACL
> > >         select CRC32
> > >         select QUOTACTL
> > > 
> > > menuconfig DLM
> > >         tristate "Distributed Lock Manager (DLM)"
> > >         depends on EXPERIMENTAL && INET
> > >         depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
> > >         select IP_SCTP
> > > 
> > > Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
> > > just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
> > > vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
> > > possibly be undefined if we're selecting SCTP.
> > 
> > What is selecting SCTP?  DLM?  so GFS2 selects DLM, but selects
> > don't follow dependency chains.  Also, the "select IP_SCTP if DLM_SCTP"
> > in GFS2 is meaningless since there is no DLM_SCTP.
> 
> https://lkml.org/lkml/2012/3/8/222 seems to have caused this by adding
> the new dependency on the sctp module without any Kconfig changes.
> 
> Should that patch have added depends IP_SCTP to the dlm and gfs2?
> 

Instead of trying to select everything in GFS2, how about doing it this way?

[PATCH] gfs2: use depends instead of select in kconfig

Avoids having to duplicate the dependencies of what is 'select'ed (and on
down...)

Those dependencies are currently incomplete, leading to broken builds with
GFS2_FS_LOCKING_DLM=y and IP_SCTP=n.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
---
 fs/gfs2/Kconfig |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index c465ae0..eb08c9e 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -1,10 +1,6 @@
 config GFS2_FS
 	tristate "GFS2 file system support"
 	depends on (64BIT || LBDAF)
-	select DLM if GFS2_FS_LOCKING_DLM
-	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
-	select SYSFS if GFS2_FS_LOCKING_DLM
-	select IP_SCTP if DLM_SCTP
 	select FS_POSIX_ACL
 	select CRC32
 	select QUOTACTL
@@ -29,7 +25,8 @@ config GFS2_FS
 
 config GFS2_FS_LOCKING_DLM
 	bool "GFS2 DLM locking"
-	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && HOTPLUG
+	depends on (GFS2_FS!=n) && NET && INET && (IPV6 || IPV6=n) && \
+		HOTPLUG && DLM && CONFIGFS_FS && SYSFS
 	help
 	  Multiple node locking module for GFS2
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 20:06           ` Randy Dunlap
  2012-03-23 20:09             ` Steven Whitehouse
@ 2012-03-23 20:18             ` David Teigland
  2012-03-23 22:06               ` Benjamin Poirier
  2012-03-23 22:06               ` Randy Dunlap
  1 sibling, 2 replies; 63+ messages in thread
From: David Teigland @ 2012-03-23 20:18 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Steven Whitehouse, linux-kernel, cluster-devel, bpoirier

On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote:
> >> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
> >> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
> >> used anywhere else in the kernel tree AFAICT.
> >> DLM just always selects IP_SCTP.
> > 
> > Here's what we have now:
> > 
> > config GFS2_FS
> >         tristate "GFS2 file system support"
> >         depends on (64BIT || LBDAF)
> >         select DLM if GFS2_FS_LOCKING_DLM
> >         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> >         select SYSFS if GFS2_FS_LOCKING_DLM
> >         select IP_SCTP if DLM_SCTP
> >         select FS_POSIX_ACL
> >         select CRC32
> >         select QUOTACTL
> > 
> > menuconfig DLM
> >         tristate "Distributed Lock Manager (DLM)"
> >         depends on EXPERIMENTAL && INET
> >         depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
> >         select IP_SCTP
> > 
> > Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
> > just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
> > vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
> > possibly be undefined if we're selecting SCTP.
> 
> What is selecting SCTP?  DLM?  so GFS2 selects DLM, but selects
> don't follow dependency chains.  Also, the "select IP_SCTP if DLM_SCTP"
> in GFS2 is meaningless since there is no DLM_SCTP.

https://lkml.org/lkml/2012/3/8/222 seems to have caused this by adding
the new dependency on the sctp module without any Kconfig changes.

Should that patch have added depends IP_SCTP to the dlm and gfs2?


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 20:06           ` Randy Dunlap
@ 2012-03-23 20:09             ` Steven Whitehouse
  2012-03-23 20:18             ` David Teigland
  1 sibling, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-03-23 20:09 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: David Teigland, linux-kernel, cluster-devel

Hi,

On Fri, 2012-03-23 at 13:06 -0700, Randy Dunlap wrote:
> On 03/23/2012 12:41 PM, David Teigland wrote:
> 
> > 
> >> on i386:
> >>
> >> ERROR: "sctp_do_peeloff" [fs/dlm/dlm.ko] undefined!
> >>
> >>
> >> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
> >> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
> >> used anywhere else in the kernel tree AFAICT.
> >> DLM just always selects IP_SCTP.
> > 
> > Here's what we have now:
> > 
> > config GFS2_FS
> >         tristate "GFS2 file system support"
> >         depends on (64BIT || LBDAF)
> >         select DLM if GFS2_FS_LOCKING_DLM
> >         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> >         select SYSFS if GFS2_FS_LOCKING_DLM
> >         select IP_SCTP if DLM_SCTP
> >         select FS_POSIX_ACL
> >         select CRC32
> >         select QUOTACTL
> > 
> > menuconfig DLM
> >         tristate "Distributed Lock Manager (DLM)"
> >         depends on EXPERIMENTAL && INET
> >         depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
> >         select IP_SCTP
> > 
> > Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
> > just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
> > vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
> > possibly be undefined if we're selecting SCTP.
> 
> What is selecting SCTP?  DLM?  so GFS2 selects DLM, but selects
> don't follow dependency chains.  Also, the "select IP_SCTP if DLM_SCTP"
> in GFS2 is meaningless since there is no DLM_SCTP.
> 
> I just verified that the (posted) failing config still fails with
> today's linux-next.
> 

The DLM_SCTP is historical. There used to be such a thing, but that
config option went away, and there is now run time selection of the DLM
transport. So that the GFS2 Kconfig should have been updated, however
that appears not to be enough on its own to resolve the issue,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 19:41         ` David Teigland
  2012-03-23 19:46           ` David Miller
@ 2012-03-23 20:06           ` Randy Dunlap
  2012-03-23 20:09             ` Steven Whitehouse
  2012-03-23 20:18             ` David Teigland
  1 sibling, 2 replies; 63+ messages in thread
From: Randy Dunlap @ 2012-03-23 20:06 UTC (permalink / raw)
  To: David Teigland; +Cc: Steven Whitehouse, linux-kernel, cluster-devel

On 03/23/2012 12:41 PM, David Teigland wrote:

> 
>> on i386:
>>
>> ERROR: "sctp_do_peeloff" [fs/dlm/dlm.ko] undefined!
>>
>>
>> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
>> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
>> used anywhere else in the kernel tree AFAICT.
>> DLM just always selects IP_SCTP.
> 
> Here's what we have now:
> 
> config GFS2_FS
>         tristate "GFS2 file system support"
>         depends on (64BIT || LBDAF)
>         select DLM if GFS2_FS_LOCKING_DLM
>         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
>         select SYSFS if GFS2_FS_LOCKING_DLM
>         select IP_SCTP if DLM_SCTP
>         select FS_POSIX_ACL
>         select CRC32
>         select QUOTACTL
> 
> menuconfig DLM
>         tristate "Distributed Lock Manager (DLM)"
>         depends on EXPERIMENTAL && INET
>         depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
>         select IP_SCTP
> 
> Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
> just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
> vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
> possibly be undefined if we're selecting SCTP.

What is selecting SCTP?  DLM?  so GFS2 selects DLM, but selects
don't follow dependency chains.  Also, the "select IP_SCTP if DLM_SCTP"
in GFS2 is meaningless since there is no DLM_SCTP.

I just verified that the (posted) failing config still fails with
today's linux-next.

-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-23 19:41         ` David Teigland
@ 2012-03-23 19:46           ` David Miller
  2012-03-23 20:06           ` Randy Dunlap
  1 sibling, 0 replies; 63+ messages in thread
From: David Miller @ 2012-03-23 19:46 UTC (permalink / raw)
  To: teigland; +Cc: swhiteho, rdunlap, linux-kernel, cluster-devel

From: David Teigland <teigland@redhat.com>
Date: Fri, 23 Mar 2012 15:41:52 -0400

> Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
> just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
> vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
> possibly be undefined if we're selecting SCTP.

GFS2=y SCTP=m

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-19 15:34       ` Steven Whitehouse
@ 2012-03-23 19:41         ` David Teigland
  2012-03-23 19:46           ` David Miller
  2012-03-23 20:06           ` Randy Dunlap
  0 siblings, 2 replies; 63+ messages in thread
From: David Teigland @ 2012-03-23 19:41 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Randy Dunlap, linux-kernel, cluster-devel


> on i386:
>
> ERROR: "sctp_do_peeloff" [fs/dlm/dlm.ko] undefined!
>
>
> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
> used anywhere else in the kernel tree AFAICT.
> DLM just always selects IP_SCTP.

Here's what we have now:

config GFS2_FS
        tristate "GFS2 file system support"
        depends on (64BIT || LBDAF)
        select DLM if GFS2_FS_LOCKING_DLM
        select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
        select SYSFS if GFS2_FS_LOCKING_DLM
        select IP_SCTP if DLM_SCTP
        select FS_POSIX_ACL
        select CRC32
        select QUOTACTL

menuconfig DLM
        tristate "Distributed Lock Manager (DLM)"
        depends on EXPERIMENTAL && INET
        depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
        select IP_SCTP

Why does gfs2 Kconfig bother with SCTP at all?  It seems that line should
just be removed.  I'll also remove EXPERIMENTAL.  I don't understand the
vagaries of Kconfig, so a dumb question, how could sctp_do_peeloff
possibly be undefined if we're selecting SCTP.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-19 15:18     ` Randy Dunlap
  2012-03-19 15:34       ` Steven Whitehouse
@ 2012-03-20  9:47       ` Steven Whitehouse
  1 sibling, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-03-20  9:47 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, cluster-devel, teigland

Hi,

On Mon, 2012-03-19 at 08:18 -0700, Randy Dunlap wrote:
> On 03/19/2012 07:59 AM, Steven Whitehouse wrote:
> 
> > Hi,
> > 
> > On Mon, 2012-03-19 at 07:45 -0700, Randy Dunlap wrote:
> >> On 03/19/2012 03:25 AM, Steven Whitehouse wrote:
> >>
> >>> Hi,
> >>>
> >>> Not a huge number of patches this time. Some notable new features
> >>> though:
> >>>  - Glock stats gathering (v. useful for performance analysis)
> >>>  - FITRIM ioctl support
> >>>  - Sorting the ordered write list (big performance increase when the workload
> >>>    doesn't result in the write requests being nicely ordered to start with)
> >>>
> >>> Plus a few clean ups, and bug fixes in addition,
> >>
> >>
> >>
> >> Hi,
> >>
> >> I reported a build error in linux-next 20120313, but it appears
> >> that mainline also needs the fix (when it's ready) since mainline
> >> gfs2 Kconfig selects DLM_SCTP, which does not exist.
> >>
> >> https://lkml.org/lkml/2012/3/13/456
> >>
> > 
> > Does the following fix the problem? If so then I'll roll that into the
> > tree before it gets pushed,
> > 
> 
> No, that's not sufficient:
> 
> warning: (GFS2_FS) selects DLM which has unmet direct dependencies (EXPERIMENTAL && INET && SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n))
> warning: (DLM && GFS2_FS) selects IP_SCTP which has unmet direct dependencies (NET && INET && EXPERIMENTAL && (IPV6 || IPV6=n))
> 
> and
> 
> ERROR: "crc32c" [net/sctp/sctp.ko] undefined!
> 
> 
Since the pending patch set doesn't affect the Kconfig at all, I don't
think that this issue needs to hold up merging the GFS2 tree. We'll
follow up with a fix for this later on,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-19 15:18     ` Randy Dunlap
@ 2012-03-19 15:34       ` Steven Whitehouse
  2012-03-23 19:41         ` David Teigland
  2012-03-20  9:47       ` Steven Whitehouse
  1 sibling, 1 reply; 63+ messages in thread
From: Steven Whitehouse @ 2012-03-19 15:34 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, cluster-devel, teigland

Hi,

On Mon, 2012-03-19 at 08:18 -0700, Randy Dunlap wrote:
> On 03/19/2012 07:59 AM, Steven Whitehouse wrote:
> 
> > Hi,
> > 
> > On Mon, 2012-03-19 at 07:45 -0700, Randy Dunlap wrote:
> >> On 03/19/2012 03:25 AM, Steven Whitehouse wrote:
> >>
> >>> Hi,
> >>>
> >>> Not a huge number of patches this time. Some notable new features
> >>> though:
> >>>  - Glock stats gathering (v. useful for performance analysis)
> >>>  - FITRIM ioctl support
> >>>  - Sorting the ordered write list (big performance increase when the workload
> >>>    doesn't result in the write requests being nicely ordered to start with)
> >>>
> >>> Plus a few clean ups, and bug fixes in addition,
> >>
> >>
> >>
> >> Hi,
> >>
> >> I reported a build error in linux-next 20120313, but it appears
> >> that mainline also needs the fix (when it's ready) since mainline
> >> gfs2 Kconfig selects DLM_SCTP, which does not exist.
> >>
> >> https://lkml.org/lkml/2012/3/13/456
> >>
> > 
> > Does the following fix the problem? If so then I'll roll that into the
> > tree before it gets pushed,
> > 
> 
> No, that's not sufficient:
> 
> warning: (GFS2_FS) selects DLM which has unmet direct dependencies (EXPERIMENTAL && INET && SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n))
> warning: (DLM && GFS2_FS) selects IP_SCTP which has unmet direct dependencies (NET && INET && EXPERIMENTAL && (IPV6 || IPV6=n))
> 
> and
> 
> ERROR: "crc32c" [net/sctp/sctp.ko] undefined!
> 
> 
Hmm, ok. I'll look at this again. I'm not sure why DLM is still calling
itself EXPERIMENTAL since thats long since not been the case, maybe SCTP
still is, but I don't think GFS2 should be selecting EXPERIMENTAL
directly, anyway. It is rather easy to tie ones' self in knots with this
config language.... since GFS2_FS_LOCKING_DLM depends on NET && INET &&
(IPV6 || IPV6=n) && HOTPLUG then all those other deps must presumably be
set anyway, so I don't understand quite why DLM doesn't have those
available to it.

I'll dig around a bit and see if I can figure out whats going on here,

Steve.


> 
> 
> > 
> > diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> > index c465ae0..f4e1c60 100644
> > --- a/fs/gfs2/Kconfig
> > +++ b/fs/gfs2/Kconfig
> > @@ -4,7 +4,7 @@ config GFS2_FS
> >  	select DLM if GFS2_FS_LOCKING_DLM
> >  	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
> >  	select SYSFS if GFS2_FS_LOCKING_DLM
> > -	select IP_SCTP if DLM_SCTP
> > +	select IP_SCTP if GFS2_FS_LOCKING_DLM
> >  	select FS_POSIX_ACL
> >  	select CRC32
> >  	select QUOTACTL
> > 
> > 
> 
> 
> 



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-19 14:59   ` Steven Whitehouse
@ 2012-03-19 15:18     ` Randy Dunlap
  2012-03-19 15:34       ` Steven Whitehouse
  2012-03-20  9:47       ` Steven Whitehouse
  0 siblings, 2 replies; 63+ messages in thread
From: Randy Dunlap @ 2012-03-19 15:18 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-kernel, cluster-devel, teigland

On 03/19/2012 07:59 AM, Steven Whitehouse wrote:

> Hi,
> 
> On Mon, 2012-03-19 at 07:45 -0700, Randy Dunlap wrote:
>> On 03/19/2012 03:25 AM, Steven Whitehouse wrote:
>>
>>> Hi,
>>>
>>> Not a huge number of patches this time. Some notable new features
>>> though:
>>>  - Glock stats gathering (v. useful for performance analysis)
>>>  - FITRIM ioctl support
>>>  - Sorting the ordered write list (big performance increase when the workload
>>>    doesn't result in the write requests being nicely ordered to start with)
>>>
>>> Plus a few clean ups, and bug fixes in addition,
>>
>>
>>
>> Hi,
>>
>> I reported a build error in linux-next 20120313, but it appears
>> that mainline also needs the fix (when it's ready) since mainline
>> gfs2 Kconfig selects DLM_SCTP, which does not exist.
>>
>> https://lkml.org/lkml/2012/3/13/456
>>
> 
> Does the following fix the problem? If so then I'll roll that into the
> tree before it gets pushed,
> 

No, that's not sufficient:

warning: (GFS2_FS) selects DLM which has unmet direct dependencies (EXPERIMENTAL && INET && SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n))
warning: (DLM && GFS2_FS) selects IP_SCTP which has unmet direct dependencies (NET && INET && EXPERIMENTAL && (IPV6 || IPV6=n))

and

ERROR: "crc32c" [net/sctp/sctp.ko] undefined!




> 
> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> index c465ae0..f4e1c60 100644
> --- a/fs/gfs2/Kconfig
> +++ b/fs/gfs2/Kconfig
> @@ -4,7 +4,7 @@ config GFS2_FS
>  	select DLM if GFS2_FS_LOCKING_DLM
>  	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
>  	select SYSFS if GFS2_FS_LOCKING_DLM
> -	select IP_SCTP if DLM_SCTP
> +	select IP_SCTP if GFS2_FS_LOCKING_DLM
>  	select FS_POSIX_ACL
>  	select CRC32
>  	select QUOTACTL
> 
> 



-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-19 14:45 ` Randy Dunlap
@ 2012-03-19 14:59   ` Steven Whitehouse
  2012-03-19 15:18     ` Randy Dunlap
  0 siblings, 1 reply; 63+ messages in thread
From: Steven Whitehouse @ 2012-03-19 14:59 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, cluster-devel, teigland

Hi,

On Mon, 2012-03-19 at 07:45 -0700, Randy Dunlap wrote:
> On 03/19/2012 03:25 AM, Steven Whitehouse wrote:
> 
> > Hi,
> > 
> > Not a huge number of patches this time. Some notable new features
> > though:
> >  - Glock stats gathering (v. useful for performance analysis)
> >  - FITRIM ioctl support
> >  - Sorting the ordered write list (big performance increase when the workload
> >    doesn't result in the write requests being nicely ordered to start with)
> > 
> > Plus a few clean ups, and bug fixes in addition,
> 
> 
> 
> Hi,
> 
> I reported a build error in linux-next 20120313, but it appears
> that mainline also needs the fix (when it's ready) since mainline
> gfs2 Kconfig selects DLM_SCTP, which does not exist.
> 
> https://lkml.org/lkml/2012/3/13/456
> 

Does the following fix the problem? If so then I'll roll that into the
tree before it gets pushed,

Steve.

diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index c465ae0..f4e1c60 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -4,7 +4,7 @@ config GFS2_FS
 	select DLM if GFS2_FS_LOCKING_DLM
 	select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
 	select SYSFS if GFS2_FS_LOCKING_DLM
-	select IP_SCTP if DLM_SCTP
+	select IP_SCTP if GFS2_FS_LOCKING_DLM
 	select FS_POSIX_ACL
 	select CRC32
 	select QUOTACTL



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: GFS2: Pre-pull patch posting (merge window)
  2012-03-19 10:25 Steven Whitehouse
@ 2012-03-19 14:45 ` Randy Dunlap
  2012-03-19 14:59   ` Steven Whitehouse
  0 siblings, 1 reply; 63+ messages in thread
From: Randy Dunlap @ 2012-03-19 14:45 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-kernel, cluster-devel

On 03/19/2012 03:25 AM, Steven Whitehouse wrote:

> Hi,
> 
> Not a huge number of patches this time. Some notable new features
> though:
>  - Glock stats gathering (v. useful for performance analysis)
>  - FITRIM ioctl support
>  - Sorting the ordered write list (big performance increase when the workload
>    doesn't result in the write requests being nicely ordered to start with)
> 
> Plus a few clean ups, and bug fixes in addition,



Hi,

I reported a build error in linux-next 20120313, but it appears
that mainline also needs the fix (when it's ready) since mainline
gfs2 Kconfig selects DLM_SCTP, which does not exist.

https://lkml.org/lkml/2012/3/13/456

-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2012-03-19 10:25 Steven Whitehouse
  2012-03-19 14:45 ` Randy Dunlap
  0 siblings, 1 reply; 63+ messages in thread
From: Steven Whitehouse @ 2012-03-19 10:25 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Not a huge number of patches this time. Some notable new features
though:
 - Glock stats gathering (v. useful for performance analysis)
 - FITRIM ioctl support
 - Sorting the ordered write list (big performance increase when the workload
   doesn't result in the write requests being nicely ordered to start with)

Plus a few clean ups, and bug fixes in addition,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2012-01-05 11:51 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2012-01-05 11:51 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

The main feature this time is clean up around the allocation and
resource group code. Otherwise the remainder is mostly small
bug fixes.

I've held back the glock stats patch and that will probably be
ready for the following merge window with a bit of luck,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2011-10-24 12:48 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2011-10-24 12:48 UTC (permalink / raw)
  To: cluster-devel, linux-kernel

Hi,

Since the merge window is upon us, here is the current content of
the GFS2 git tree. A few things will be help back to the following
merge window in order to ensure a greater test time, but those currently
in the tree are ready for the current window.

Recently I've reconstituted the GFS2 git tree, so it can be pulled
(via http) from:

http://sucs.org/~rohan/git/gfs2-3.0-nmw

and viewed via gitweb at:

http://sucs.org/gitweb/

This is thanks to the Swansea University Computer Society for providing
a temporary (or possibly permanent) home for the GFS2 git trees. Please
treat their server kindly as this will only continue while it doesn't
generate too much traffic. I figure that there will not be too many
people pulling the GFS2 tree at once, but we'll see.

Some highlights of the current patch set:
 o Reduction in code of approx 400 lines
 o Big clean up (and speed up) in the resource group code
   - This is a nice base to build some forthcoming improvements on
   - It should improve performance with multi-threaded workloads
 o Some left-over fsync/writeback changes
 o Improvements to readahead when deallocating large directories

Any questions/concerns then please let me know as usual,

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2011-07-22  9:16 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2011-07-22  9:16 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

Not a lot new this time... the addition of a cache for the directory hash table
improve directory read/lookup speed, automatic adjustment of the glock hold
time improves performance for some contention corner cases. S_NOSEC support
is another performance related change, plus a nice clean up from Eric
Sandeen,

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2011-05-19  8:46 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2011-05-19  8:46 UTC (permalink / raw)
  To: linux-kernel, cluster-devel

Hi,

This time, most of the GFS2 patches are code clean up, although there
are a few bug fixes (fallocate/ail writeback/end of life inodes/nlink) and
some new features (new tracepoint & tracing flags, using the UUID field
in the generic superblock).

The changes can be broadly divided into three sets:

1. Bob's directory code clean up
2. My fsync/ail writeback fixes & clean up
3. inode.c/ops_inode.c clean up

Steve.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* GFS2: Pre-pull patch posting (merge window)
@ 2011-03-15  9:11 Steven Whitehouse
  0 siblings, 0 replies; 63+ messages in thread
From: Steven Whitehouse @ 2011-03-15  9:11 UTC (permalink / raw)
  To: cluster-devel, linux-kernel

Hi,

The most interesting "feature" in this patch set is the RCU glock
patch which has been a long time coming, but is finally here. That
patch contains most of the changes this time. The other patches ins
this set are mostly smaller bug fixes and performance improvements.

Steve.



^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2014-10-08  9:53 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-20 12:23 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
2014-01-20 12:23 ` [PATCH 01/24] GFS2: If requested is too large, use the largest extent in the rgrp Steven Whitehouse
2014-01-20 12:23 ` [PATCH 02/24] GFS2: Drop inadequate rgrps from the reservation tree Steven Whitehouse
2014-01-20 12:23 ` [PATCH 03/24] GFS2: Implement a "rgrp has no extents longer than X" scheme Steven Whitehouse
2014-01-20 12:23 ` [PATCH 04/24] GFS2: Clean up releasepage Steven Whitehouse
2014-01-20 12:23 ` [PATCH 05/24] GFS2: Remove gfs2_quota_change_host structure Steven Whitehouse
2014-01-20 12:23 ` [PATCH 06/24] GFS2: Remove test which is always true Steven Whitehouse
2014-01-20 12:23 ` [PATCH 07/24] GFS2: Use range based functions for rgrp sync/invalidation Steven Whitehouse
2014-01-20 12:23 ` [PATCH 08/24] GFS2: Use only a single address space for rgrps Steven Whitehouse
2014-01-20 12:23 ` [PATCH 09/24] GFS2: Add directory addition info structure Steven Whitehouse
2014-01-20 12:23 ` [PATCH 10/24] GFS2: Consolidate transaction blocks calculation for dir add Steven Whitehouse
2014-01-20 12:23 ` [PATCH 11/24] GFS2: Remember directory insert point Steven Whitehouse
2014-01-20 12:23 ` [PATCH 12/24] GFS2: Increase i_writecount during gfs2_setattr_chown Steven Whitehouse
2014-01-20 12:23 ` [PATCH 13/24] GFS2: For exhash conversion, only one block is needed Steven Whitehouse
2014-01-20 12:23 ` [PATCH 14/24] GFS2: Add hints to directory leaf blocks Steven Whitehouse
2014-01-20 12:23 ` [PATCH 15/24] GFS2: Add initialization for address space in super block Steven Whitehouse
2014-01-20 12:23 ` [PATCH 16/24] GFS2: No need to invalidate pages for a dio read Steven Whitehouse
2014-01-20 12:23 ` [PATCH 17/24] GFS2: Use RCU/hlist_bl based hash for quotas Steven Whitehouse
2014-01-22  5:32   ` Paul E. McKenney
2014-01-22  6:06     ` Sasha Levin
2014-01-22  9:43       ` Steven Whitehouse
2014-01-22  9:58     ` Steven Whitehouse
2014-01-20 12:23 ` [PATCH 18/24] GFS2: Only run logd and quota when mounted read/write Steven Whitehouse
2014-01-20 12:23 ` [PATCH 19/24] GFS2: Clean up quota slot allocation Steven Whitehouse
2014-01-20 12:23 ` [PATCH 20/24] GFS2: Move quota bitmap operations under their own lock Steven Whitehouse
2014-01-20 12:23 ` [PATCH 21/24] GFS2: Fix kbuild test robot reported warning Steven Whitehouse
2014-01-20 12:23 ` [PATCH 22/24] GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Steven Whitehouse
2014-01-20 12:23 ` [PATCH 23/24] GFS2: Small cleanup Steven Whitehouse
2014-01-20 12:23 ` [PATCH 24/24] GFS2: revert "GFS2: d_splice_alias() can't return error" Steven Whitehouse
  -- strict thread matches above, loose matches on Subject: below --
2014-10-08  9:53 GFS2: Pre-pull patch posting (merge window) Steven Whitehouse
2014-06-03 11:02 Steven Whitehouse
2014-04-01  9:15 Steven Whitehouse
2013-11-04 11:09 Steven Whitehouse
2013-09-05  9:02 GFS2 " Steven Whitehouse
2013-07-01  9:33 GFS2: " Steven Whitehouse
2013-04-26  9:18 Steven Whitehouse
2013-02-19 10:07 Steven Whitehouse
2012-11-30  9:52 Steven Whitehouse
2012-09-26  8:25 Steven Whitehouse
2012-07-23  8:00 Steven Whitehouse
2012-05-17 12:23 Steven Whitehouse
2012-03-19 10:25 Steven Whitehouse
2012-03-19 14:45 ` Randy Dunlap
2012-03-19 14:59   ` Steven Whitehouse
2012-03-19 15:18     ` Randy Dunlap
2012-03-19 15:34       ` Steven Whitehouse
2012-03-23 19:41         ` David Teigland
2012-03-23 19:46           ` David Miller
2012-03-23 20:06           ` Randy Dunlap
2012-03-23 20:09             ` Steven Whitehouse
2012-03-23 20:18             ` David Teigland
2012-03-23 22:06               ` Benjamin Poirier
2012-03-23 22:48                 ` Randy Dunlap
2012-03-26 10:44                 ` Steven Whitehouse
2012-04-02 15:35                   ` Randy Dunlap
2012-04-02 15:47                     ` Steven Whitehouse
2012-03-23 22:06               ` Randy Dunlap
2012-03-20  9:47       ` Steven Whitehouse
2012-01-05 11:51 Steven Whitehouse
2011-10-24 12:48 Steven Whitehouse
2011-07-22  9:16 Steven Whitehouse
2011-05-19  8:46 Steven Whitehouse
2011-03-15  9:11 Steven Whitehouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).