All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes
@ 2020-01-03 15:31 Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc Bob Peterson
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch set, several gfs2 functions failed to reserve enough
revoke entries in the journal. Some examples:

1. gfs2_dinode_dealloc failed to reserve a revoke for the dinode
   being deleted.
2. Any function that allocates dinodes with gfs2_alloc_blocks
   should reserve a revoke because alloc_blocks will premptively
   call trans_remove_revoke to make sure there isn't a pending revoke
   for the new dinode.
3. Any function that potentially will unstuff a stuffed directory
   needs to reserve a revoke because gfs2_unstuff_dinode calls
   gfs2_trans_remove_revoke for the new journaled leaf block.

In addition, function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This series adds needed revoke entries to the transactions that
need them. So now we try to do proper accounting of revokes.

Bob Peterson (6):
  gfs2: revoke cleanup: leaf_dealloc
  gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode
  gfs2: revoke cleanup: gfs2_dinode_dealloc
  gfs2: revoke cleanup: gfs2_iomap_begin_write
  gfs2: revoke cleanup: truncate functions
  gfs2: revoke cleanup: gfs2_trans_remove_revoke

 fs/gfs2/bmap.c  | 25 +++++++++++++------------
 fs/gfs2/dir.c   |  3 ++-
 fs/gfs2/inode.c |  5 +++--
 fs/gfs2/super.c |  2 +-
 fs/gfs2/trans.c | 16 ++++++++++++++--
 5 files changed, 33 insertions(+), 18 deletions(-)

-- 
2.24.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
@ 2020-01-03 15:31 ` Bob Peterson
  2020-01-10 12:23   ` Andreas Gruenbacher
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 2/6] gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode Bob Peterson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Several gfs2 functions failed to reserve enough revoke entries for their
transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This patch is part of a series that tries to do proper accounting of revokes.

This patch adds the needed revoke entries to function leaf_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index eb9c0578978f..dfc3a3fa894d 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -2031,7 +2031,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
 
 	error = gfs2_trans_begin(sdp,
 			rg_blocks + (DIV_ROUND_UP(size, sdp->sd_jbsize) + 1) +
-			RES_DINODE + RES_STATFS + RES_QUOTA, l_blocks);
+			RES_DINODE + RES_STATFS + RES_QUOTA, RES_DINODE +
+				 l_blocks);
 	if (error)
 		goto out_rg_gunlock;
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 2/6] gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc Bob Peterson
@ 2020-01-03 15:31 ` Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 3/6] gfs2: revoke cleanup: gfs2_dinode_dealloc Bob Peterson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Several gfs2 functions failed to reserve enough revoke entries for their
transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This patch is part of a series that tries to do proper accounting of revokes.

This patch adds the needed revoke entries to functions alloc_dinode and
gfs2_create_inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index dafef10b91f1..eb0989ff1c6f 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -383,7 +383,8 @@ static int alloc_dinode(struct gfs2_inode *ip, u32 flags, unsigned *dblocks)
 	if (error)
 		goto out_quota;
 
-	error = gfs2_trans_begin(sdp, (*dblocks * RES_RG_BIT) + RES_STATFS + RES_QUOTA, 0);
+	error = gfs2_trans_begin(sdp, (*dblocks * RES_RG_BIT) + RES_STATFS +
+				 RES_QUOTA, *dblocks);
 	if (error)
 		goto out_ipreserv;
 
@@ -709,7 +710,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	if (error)
 		goto fail_free_inode;
 
-	error = gfs2_trans_begin(sdp, blocks, 0);
+	error = gfs2_trans_begin(sdp, blocks, RES_DINODE);
 	if (error)
 		goto fail_free_inode;
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 3/6] gfs2: revoke cleanup: gfs2_dinode_dealloc
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 2/6] gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode Bob Peterson
@ 2020-01-03 15:31 ` Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 4/6] gfs2: revoke cleanup: gfs2_iomap_begin_write Bob Peterson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Several gfs2 functions failed to reserve enough revoke entries for their
transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This patch is part of a series that tries to do proper accounting of revokes.

This patch adds the needed revoke entries to function gfs2_dinode_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 68cc7c291a81..c762456a9500 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1188,7 +1188,7 @@ static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
 		goto out_qs;
 
 	error = gfs2_trans_begin(sdp, RES_RG_BIT + RES_STATFS + RES_QUOTA,
-				 sdp->sd_jdesc->jd_blocks);
+				 RES_DINODE + sdp->sd_jdesc->jd_blocks);
 	if (error)
 		goto out_rg_gunlock;
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 4/6] gfs2: revoke cleanup: gfs2_iomap_begin_write
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
                   ` (2 preceding siblings ...)
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 3/6] gfs2: revoke cleanup: gfs2_dinode_dealloc Bob Peterson
@ 2020-01-03 15:31 ` Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 5/6] gfs2: revoke cleanup: truncate functions Bob Peterson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Several gfs2 functions failed to reserve enough revoke entries for their
transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This patch is part of a series that tries to do proper accounting of revokes.

This patch adds the needed revoke entries to function gfs2_iomap_begin_write.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/bmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 08f6fbb3655e..403d5ada2f52 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1074,7 +1074,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 	if (unstuff || iomap->type == IOMAP_HOLE) {
 		unsigned int data_blocks, ind_blocks;
 		struct gfs2_alloc_parms ap = {};
-		unsigned int rblocks;
+		unsigned int rblocks, new_blocks = 0;
 		struct gfs2_trans *tr;
 
 		gfs2_write_calc_reserv(ip, iomap->length, &data_blocks,
@@ -1095,10 +1095,10 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 			rblocks += RES_STATFS + RES_QUOTA;
 		if (inode == sdp->sd_rindex)
 			rblocks += 2 * RES_STATFS;
-		rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
+		new_blocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
+		rblocks += new_blocks;
 
-		ret = gfs2_trans_begin(sdp, rblocks,
-				       iomap->length >> inode->i_blkbits);
+		ret = gfs2_trans_begin(sdp, rblocks, new_blocks);
 		if (ret)
 			goto out_trans_fail;
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 5/6] gfs2: revoke cleanup: truncate functions
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
                   ` (3 preceding siblings ...)
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 4/6] gfs2: revoke cleanup: gfs2_iomap_begin_write Bob Peterson
@ 2020-01-03 15:31 ` Bob Peterson
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 6/6] gfs2: revoke cleanup: gfs2_trans_remove_revoke Bob Peterson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Several gfs2 functions failed to reserve enough revoke entries for their
transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This patch is part of a series that tries to do proper accounting of revokes.

This patch adds the needed revoke entries to truncate functions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/bmap.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 403d5ada2f52..9a4c9c8611f7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1355,8 +1355,6 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
 	return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops);
 }
 
-#define GFS2_JTRUNC_REVOKES 8192
-
 /**
  * gfs2_journaled_truncate - Wrapper for truncate_pagecache for jdata files
  * @inode: The inode being truncated
@@ -1371,7 +1369,7 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
 static int gfs2_journaled_truncate(struct inode *inode, u64 oldsize, u64 newsize)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	u64 max_chunk = GFS2_JTRUNC_REVOKES * sdp->sd_vfs->s_blocksize;
+	u64 max_chunk = sdp->sd_ldptrs * sdp->sd_vfs->s_blocksize;
 	u64 chunk;
 	int error;
 
@@ -1395,7 +1393,8 @@ static int gfs2_journaled_truncate(struct inode *inode, u64 oldsize, u64 newsize
 			continue;
 
 		gfs2_trans_end(sdp);
-		error = gfs2_trans_begin(sdp, RES_DINODE, GFS2_JTRUNC_REVOKES);
+		error = gfs2_trans_begin(sdp, RES_DINODE,
+					 RES_DINODE + sdp->sd_ldptrs);
 		if (error)
 			return error;
 	}
@@ -1413,7 +1412,8 @@ static int trunc_start(struct inode *inode, u64 newsize)
 	int error;
 
 	if (journaled)
-		error = gfs2_trans_begin(sdp, RES_DINODE + RES_JDATA, GFS2_JTRUNC_REVOKES);
+		error = gfs2_trans_begin(sdp, RES_DINODE + RES_JDATA,
+					 RES_DINODE + sdp->sd_ldptrs);
 	else
 		error = gfs2_trans_begin(sdp, RES_DINODE, 0);
 	if (error)
@@ -2404,7 +2404,7 @@ static int gfs2_journaled_truncate_range(struct inode *inode, loff_t offset,
 					 loff_t length)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
-	loff_t max_chunk = GFS2_JTRUNC_REVOKES * sdp->sd_vfs->s_blocksize;
+	loff_t max_chunk = sdp->sd_ldptrs * sdp->sd_vfs->s_blocksize;
 	int error;
 
 	while (length) {
@@ -2429,7 +2429,8 @@ static int gfs2_journaled_truncate_range(struct inode *inode, loff_t offset,
 			continue;
 
 		gfs2_trans_end(sdp);
-		error = gfs2_trans_begin(sdp, RES_DINODE, GFS2_JTRUNC_REVOKES);
+		error = gfs2_trans_begin(sdp, RES_DINODE,
+					 RES_DINODE + sdp->sd_ldptrs);
 		if (error)
 			return error;
 	}
@@ -2453,7 +2454,7 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	if (gfs2_is_jdata(ip))
 		error = gfs2_trans_begin(sdp, RES_DINODE + 2 * RES_JDATA,
-					 GFS2_JTRUNC_REVOKES);
+					 RES_DINODE + sdp->sd_ldptrs);
 	else
 		error = gfs2_trans_begin(sdp, RES_DINODE, 0);
 	if (error)
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 6/6] gfs2: revoke cleanup: gfs2_trans_remove_revoke
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
                   ` (4 preceding siblings ...)
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 5/6] gfs2: revoke cleanup: truncate functions Bob Peterson
@ 2020-01-03 15:31 ` Bob Peterson
  2020-01-06 13:46 ` [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
  2020-01-07 10:19 ` Andreas Gruenbacher
  7 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-03 15:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Several gfs2 functions failed to reserve enough revoke entries for their
transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
re-added to the system-wide revoke numbers, thereby decrementing the value
(sd_log_commited_revoke) "properly," but by accident. This worked properly
most of the time because one transaction would reserve space for revokes,
then it would be merged with the system transaction (sdp->sd_log_tr) and it
usually did not run out, because you can hold a lot of revoke entries
per log descriptor block. Some of the code, such as gfs2_write_revokes, would
work around this and somehow got it right most of the time. However, some
jdata tests with xfstests generic/269 encountered problems when it actually
ran out.

This patch is part of a series that tries to do proper accounting of revokes.

This patch adds sanity checking for revoke counts to function
gfs2_trans_remove_revoke.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/trans.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index 4d01fe19c125..07c2d2194a9b 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -261,10 +261,22 @@ void gfs2_trans_remove_revoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
 			list_del_init(&bd->bd_list);
 			gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
 			sdp->sd_log_num_revoke--;
-			if (bd->bd_gl)
+			if (bd->bd_gl) {
+				sdp->sd_log_commited_revoke--;
 				gfs2_glock_remove_revoke(bd->bd_gl);
+			}
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
-			tr->tr_num_revoke--;
+			if (tr->tr_num_revoke)
+				tr->tr_num_revoke--;
+			else if (sdp->sd_log_tr &&
+				 sdp->sd_log_tr->tr_num_revoke)
+				sdp->sd_log_tr->tr_num_revoke--;
+			else {
+				fs_warn(sdp, "Removing a revoke that was not "
+					"reserved! Please investigate.\n");
+				gfs2_print_trans(sdp, tr);
+				BUG();
+			}
 			if (--n == 0)
 				break;
 		}
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
                   ` (5 preceding siblings ...)
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 6/6] gfs2: revoke cleanup: gfs2_trans_remove_revoke Bob Peterson
@ 2020-01-06 13:46 ` Bob Peterson
  2020-01-07 10:19 ` Andreas Gruenbacher
  7 siblings, 0 replies; 10+ messages in thread
From: Bob Peterson @ 2020-01-06 13:46 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> Before this patch set, several gfs2 functions failed to reserve enough
> revoke entries in the journal. Some examples:
> 
> 1. gfs2_dinode_dealloc failed to reserve a revoke for the dinode
>    being deleted.
> 2. Any function that allocates dinodes with gfs2_alloc_blocks
>    should reserve a revoke because alloc_blocks will premptively
>    call trans_remove_revoke to make sure there isn't a pending revoke
>    for the new dinode.
> 3. Any function that potentially will unstuff a stuffed directory
>    needs to reserve a revoke because gfs2_unstuff_dinode calls
>    gfs2_trans_remove_revoke for the new journaled leaf block.
> 
> In addition, function gfs2_trans_remove_revoke unconditionally
> decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
> value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is
> later
> re-added to the system-wide revoke numbers, thereby decrementing the value
> (sd_log_commited_revoke) "properly," but by accident. This worked properly
> most of the time because one transaction would reserve space for revokes,
> then it would be merged with the system transaction (sdp->sd_log_tr) and it
> usually did not run out, because you can hold a lot of revoke entries
> per log descriptor block. Some of the code, such as gfs2_write_revokes, would
> work around this and somehow got it right most of the time. However, some
> jdata tests with xfstests generic/269 encountered problems when it actually
> ran out.
> 
> This series adds needed revoke entries to the transactions that
> need them. So now we try to do proper accounting of revokes.
> 
> Bob Peterson (6):
>   gfs2: revoke cleanup: leaf_dealloc
>   gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode
>   gfs2: revoke cleanup: gfs2_dinode_dealloc
>   gfs2: revoke cleanup: gfs2_iomap_begin_write
>   gfs2: revoke cleanup: truncate functions
>   gfs2: revoke cleanup: gfs2_trans_remove_revoke
> 
>  fs/gfs2/bmap.c  | 25 +++++++++++++------------
>  fs/gfs2/dir.c   |  3 ++-
>  fs/gfs2/inode.c |  5 +++--
>  fs/gfs2/super.c |  2 +-
>  fs/gfs2/trans.c | 16 ++++++++++++++--
>  5 files changed, 33 insertions(+), 18 deletions(-)
Hi,

Self-NACK on this patch series. It's not ready for prime time yet.

While it's still true that many transactions don't reserve enough blocks for
revokes, the last patch to gfs2_trans_remove_revoke won't work properly.

It checks for revokes being removed and the counters "improperly" going negative,
but there's a disconnect: Multiple processes can add the same metadata block
over and over to the journal, and they get merged with the superblock transaction.
For example, when multiple processes allocate and/or delete blocks and the
bitmap block gets rewritten over and over again for multiple transactions.

This means after some churn, a process might (properly) add a metadata block
but in so doing, remove multiple revokes for that same block, which causes
the numbers to temporarily go negative. And this is not due to a lack of
revokes being reserved. This is because the revokes go on a central list
from the superblock, but may originate from multiple transactions.
For example, a transaction may revise a bitmap, but in so doing, it may
need to remove several revokes that were added by other processes before it,
and thus, its count of transaction revokes goes negative.

By the way, the value can go negative because patch e955537e326 changed the
behavior from always using positive numbers. (Which is not necessarily a bad
thing).

One solution to this would be to search the global revokes list for all
revoke instances of when new revokes are added, and remove them so that
there's only a single instance on the queue at any given point in time.

If we did this, I'd be worried about improper behavior when journal replays
are needed when transactions are interrupted by a node getting blasted or
fenced, etc. I'd also be worried about a performance impact if every revoke
needs to search a global linked list, while holding the list's ail list
spin_lock.

As I see it, we have several choices:

1. We can revert patch e955537e326 and only allow positive numbers.
   There isn't much gain in doing this.
2. We can embrace the negative values and just fix the problem revoke
   transactions. This might be as simple as eliminating the last patch,
   "gfs2: revoke cleanup: gfs2_trans_remove_revoke".
   If it makes sense, we can also change the counter from unsigned int
   to a signed int. Or maybe we don't even care.
3. We can work out a new solution to keep track of revokes and revoke counts.

For several years we've talked about ways to improve our journal flushing
code and to possibly allow transactions to be queued during log flushes.
So whatever solution we come up with needs to keep that in mind.

Regards,

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes
  2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
                   ` (6 preceding siblings ...)
  2020-01-06 13:46 ` [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
@ 2020-01-07 10:19 ` Andreas Gruenbacher
  7 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2020-01-07 10:19 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

On Fri, Jan 3, 2020 at 4:33 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Before this patch set, several gfs2 functions failed to reserve enough
> revoke entries in the journal. Some examples:
>
> 1. gfs2_dinode_dealloc failed to reserve a revoke for the dinode
>    being deleted.

that sounds like a bug.

> 2. Any function that allocates dinodes with gfs2_alloc_blocks
>    should reserve a revoke because alloc_blocks will premptively
>    call trans_remove_revoke to make sure there isn't a pending revoke
>    for the new dinode.
> 3. Any function that potentially will unstuff a stuffed directory
>    needs to reserve a revoke because gfs2_unstuff_dinode calls
>    gfs2_trans_remove_revoke for the new journaled leaf block.

Removing revokes doesn't require any space in the journal, so why
should we need to reserve space in that case?

> In addition, function gfs2_trans_remove_revoke unconditionally
> decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
> value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
> re-added to the system-wide revoke numbers, thereby decrementing the value
> (sd_log_commited_revoke) "properly," but by accident. This worked properly
> most of the time because one transaction would reserve space for revokes,
> then it would be merged with the system transaction (sdp->sd_log_tr) and it
> usually did not run out, because you can hold a lot of revoke entries
> per log descriptor block. Some of the code, such as gfs2_write_revokes, would
> work around this and somehow got it right most of the time. However, some
> jdata tests with xfstests generic/269 encountered problems when it actually
> ran out.

Would something like this work to fix the revoke accounting?

--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -256 +255,0 @@ void gfs2_trans_remove_revoke(struct gfs2_sbd *sdp,
u64 blkno, unsigned int len)
-       struct gfs2_trans *tr = current->journal_info;
@@ -268 +267 @@ void gfs2_trans_remove_revoke(struct gfs2_sbd *sdp,
u64 blkno, unsigned int len)
-                       tr->tr_num_revoke--;
+                       bd->bd_tr->tr_num_revoke--;

Thanks,
Andreas




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc
  2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc Bob Peterson
@ 2020-01-10 12:23   ` Andreas Gruenbacher
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2020-01-10 12:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob,

On Fri, Jan 3, 2020 at 4:33 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Several gfs2 functions failed to reserve enough revoke entries for their
> transactions in the journal. Function gfs2_trans_remove_revoke unconditionally
> decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
> value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is later
> re-added to the system-wide revoke numbers, thereby decrementing the value
> (sd_log_commited_revoke) "properly," but by accident. This worked properly
> most of the time because one transaction would reserve space for revokes,
> then it would be merged with the system transaction (sdp->sd_log_tr) and it
> usually did not run out, because you can hold a lot of revoke entries
> per log descriptor block. Some of the code, such as gfs2_write_revokes, would
> work around this and somehow got it right most of the time. However, some
> jdata tests with xfstests generic/269 encountered problems when it actually
> ran out.
>
> This patch is part of a series that tries to do proper accounting of revokes.
>
> This patch adds the needed revoke entries to function leaf_dealloc.

it seems this patch should be safe to add to for-next (with a fixed
patch description). Can you do so?

> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/dir.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
> index eb9c0578978f..dfc3a3fa894d 100644
> --- a/fs/gfs2/dir.c
> +++ b/fs/gfs2/dir.c
> @@ -2031,7 +2031,8 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len,
>
>         error = gfs2_trans_begin(sdp,
>                         rg_blocks + (DIV_ROUND_UP(size, sdp->sd_jbsize) + 1) +
> -                       RES_DINODE + RES_STATFS + RES_QUOTA, l_blocks);
> +                       RES_DINODE + RES_STATFS + RES_QUOTA, RES_DINODE +
> +                                l_blocks);
>         if (error)
>                 goto out_rg_gunlock;
>
> --
> 2.24.1

Thanks,
Andreas




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-01-10 12:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-03 15:31 [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 1/6] gfs2: revoke cleanup: leaf_dealloc Bob Peterson
2020-01-10 12:23   ` Andreas Gruenbacher
2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 2/6] gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode Bob Peterson
2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 3/6] gfs2: revoke cleanup: gfs2_dinode_dealloc Bob Peterson
2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 4/6] gfs2: revoke cleanup: gfs2_iomap_begin_write Bob Peterson
2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 5/6] gfs2: revoke cleanup: truncate functions Bob Peterson
2020-01-03 15:31 ` [Cluster-devel] [GFS2 PATCH 6/6] gfs2: revoke cleanup: gfs2_trans_remove_revoke Bob Peterson
2020-01-06 13:46 ` [Cluster-devel] [GFS2 PATCH 0/6] gfs2: jdata transactions not reserving enough revokes Bob Peterson
2020-01-07 10:19 ` Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.