linux-xfs.vger.kernel.org archive mirror
* [PATCH v6 0/7] xfs: support shrinking free space in the last AG
@ 2021-01-26 12:56 Gao Xiang
  2021-01-26 12:56 ` [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private() Gao Xiang
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang

Hi folks,

v5: https://lore.kernel.org/r/20210118083700.2384277-1-hsiangkao@redhat.com

This patchset attempts to support shrinking free space in the last AG.
This version addresses Darrick's review of v5, except that I'm not sure
separating out the whole shrink functionality is a good idea (I tried
it, but about 90% of the code ended up duplicated). IMHO, it'd be better
to separate it when needed (I'm also investigating shrinking whole AGs,
and that seems no more invasive than the current approach...). Yet if
people have strong opinions on this, I will resend the next version
instead.

Thanks for the time!

Thanks,
Gao Xiang

xfsprogs: https://lore.kernel.org/r/20201028114010.545331-1-hsiangkao@redhat.com
xfstests: https://lore.kernel.org/r/20201028230909.639698-1-hsiangkao@redhat.com
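For context, the userspace side (the xfsprogs patch above) presumably drives this through the existing XFS_IOC_FSGROWFSDATA ioctl, simply passing a newblocks value smaller than the current data section size. The sketch below is a minimal illustration under stated assumptions: struct model_growfs_data is a local mirror of struct xfs_growfs_data, the ioctl number is assumed to match the definition in xfs_fs.h, and model_resize_data() is a hypothetical helper, not part of xfs_growfs. Real code should include the xfsprogs header instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Local mirror of struct xfs_growfs_data (newblocks, imaxpct).
 * Assumption: layout matches xfs_fs.h; prefer the real header.
 */
struct model_growfs_data {
	uint64_t	newblocks;	/* new data section size, in fs blocks */
	uint32_t	imaxpct;	/* inode space percentage limit */
};

/* Assumed to match XFS_IOC_FSGROWFSDATA in xfs_fs.h. */
#define MODEL_IOC_FSGROWFSDATA	_IOW('X', 110, struct model_growfs_data)

/*
 * Hypothetical helper: ask the kernel to resize the data section of the
 * XFS filesystem mounted at @mnt. With this series, @newblocks smaller
 * than the current size requests a shrink. Returns 0 or a negative errno.
 */
static int model_resize_data(const char *mnt, uint64_t newblocks,
			     uint32_t imaxpct)
{
	struct model_growfs_data in = {
		.newblocks	= newblocks,
		.imaxpct	= imaxpct,
	};
	int fd, ret = 0;

	fd = open(mnt, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, MODEL_IOC_FSGROWFSDATA, &in) < 0)
		ret = -errno;	/* capture errno before close() can clobber it */
	close(fd);
	return ret;
}
```

Note that the kernel also applies imaxpct on this path, so a caller would normally pass the filesystem's current value (e.g. from the geometry ioctl) rather than an arbitrary one.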

Changes since v5 (Darrick):
 - [3/7] use a separate patch to update lazy sb counters;
 - [5/7] introduce the xfs_ag_shrink_space() helper first
         as a separate patch... I think it'd be better to "define
         xfs_ag_shrink_space() as a stub that returns EOPNOTSUPP..."
 - [5/7] roll the transaction in advance so the new trans can be
         canceled safely.
 - [6/7] "nagcount != oagcount" ==> "nagcount < oagcount"

Gao Xiang (7):
  xfs: rename `new' to `delta' in xfs_growfs_data_private()
  xfs: get rid of xfs_growfs_{data,log}_t
  xfs: update lazy sb counters immediately for resizefs
  xfs: hoist out xfs_resizefs_init_new_ags()
  xfs: introduce xfs_ag_shrink_space()
  xfs: support shrinking unused space in the last AG
  xfs: add error injection for per-AG resv failure when shrinkfs

 fs/xfs/libxfs/xfs_ag.c       | 113 +++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ag.h       |   2 +
 fs/xfs/libxfs/xfs_errortag.h |   4 +-
 fs/xfs/xfs_error.c           |   2 +
 fs/xfs/xfs_fsops.c           | 158 ++++++++++++++++++++++-------------
 fs/xfs/xfs_fsops.h           |   4 +-
 fs/xfs/xfs_ioctl.c           |   4 +-
 fs/xfs/xfs_trans.c           |   1 -
 8 files changed, 224 insertions(+), 64 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private()
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-02-02 19:37   ` Brian Foster
  2021-01-26 12:56 ` [PATCH v6 2/7] xfs: get rid of xfs_growfs_{data,log}_t Gao Xiang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang

`new` actually holds the delta block count of growfs, so rename it to
make that clear. Also introduce nb_div so that `delta` isn't reused as
a temporary for do_div().
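The scratch variable is needed because the kernel's do_div() divides its first argument in place and returns the remainder. The following userspace model of that arithmetic is only a sketch: model_do_div() and model_ag_count() are hypothetical stand-ins written for illustration, not kernel helpers.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for do_div(): divides *n in place by
 * base and returns the remainder.
 */
static uint32_t model_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * Round a block count up to a whole number of AGs, as
 * "nagcount = nb_div + (nb_mod != 0)" does, leaving nb untouched.
 */
static uint32_t model_ag_count(uint64_t nb, uint32_t agblocks, uint32_t *mod)
{
	uint64_t nb_div = nb;	/* scratch copy: model_do_div() clobbers it */

	*mod = model_do_div(&nb_div, agblocks);
	return (uint32_t)nb_div + (*mod != 0);
}
```

For example, 520 blocks with 100-block AGs gives a quotient of 5 and a remainder of 20, so six AGs before the XFS_MIN_AG_BLOCKS check trims the runt tail.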

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/xfs_fsops.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 959ce91a3755..62600d78bbf1 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -32,8 +32,8 @@ xfs_growfs_data_private(
 	int			error;
 	xfs_agnumber_t		nagcount;
 	xfs_agnumber_t		nagimax = 0;
-	xfs_rfsblock_t		nb, nb_mod;
-	xfs_rfsblock_t		new;
+	xfs_rfsblock_t		nb, nb_div, nb_mod;
+	xfs_rfsblock_t		delta;
 	xfs_agnumber_t		oagcount;
 	xfs_trans_t		*tp;
 	struct aghdr_init_data	id = {};
@@ -50,16 +50,16 @@ xfs_growfs_data_private(
 		return error;
 	xfs_buf_relse(bp);
 
-	new = nb;	/* use new as a temporary here */
-	nb_mod = do_div(new, mp->m_sb.sb_agblocks);
-	nagcount = new + (nb_mod != 0);
+	nb_div = nb;
+	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
+	nagcount = nb_div + (nb_mod != 0);
 	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
 		nagcount--;
 		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
 		if (nb < mp->m_sb.sb_dblocks)
 			return -EINVAL;
 	}
-	new = nb - mp->m_sb.sb_dblocks;
+	delta = nb - mp->m_sb.sb_dblocks;
 	oagcount = mp->m_sb.sb_agcount;
 
 	/* allocate the new per-ag structures */
@@ -89,7 +89,7 @@ xfs_growfs_data_private(
 	INIT_LIST_HEAD(&id.buffer_list);
 	for (id.agno = nagcount - 1;
 	     id.agno >= oagcount;
-	     id.agno--, new -= id.agsize) {
+	     id.agno--, delta -= id.agsize) {
 
 		if (id.agno == nagcount - 1)
 			id.agsize = nb -
@@ -110,8 +110,8 @@ xfs_growfs_data_private(
 	xfs_trans_agblocks_delta(tp, id.nfree);
 
 	/* If there are new blocks in the old last AG, extend it. */
-	if (new) {
-		error = xfs_ag_extend_space(mp, tp, &id, new);
+	if (delta) {
+		error = xfs_ag_extend_space(mp, tp, &id, delta);
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -143,7 +143,7 @@ xfs_growfs_data_private(
 	 * If we expanded the last AG, free the per-AG reservation
 	 * so we can reinitialize it with the new size.
 	 */
-	if (new) {
+	if (delta) {
 		struct xfs_perag	*pag;
 
 		pag = xfs_perag_get(mp, id.agno);
-- 
2.27.0



* [PATCH v6 2/7] xfs: get rid of xfs_growfs_{data,log}_t
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
  2021-01-26 12:56 ` [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private() Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-02-02 19:37   ` Brian Foster
  2021-01-26 12:56 ` [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs Gao Xiang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang, Eric Sandeen

Struct typedefs such as xfs_growfs_{data,log}_t are discouraged by the
kernel coding style. Leave the definitions alone in case userspace
users depend on them.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/xfs_fsops.c | 12 ++++++------
 fs/xfs/xfs_fsops.h |  4 ++--
 fs/xfs/xfs_ioctl.c |  4 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 62600d78bbf1..a2a407039227 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -25,8 +25,8 @@
  */
 static int
 xfs_growfs_data_private(
-	xfs_mount_t		*mp,		/* mount point for filesystem */
-	xfs_growfs_data_t	*in)		/* growfs data input struct */
+	struct xfs_mount	*mp,		/* mount point for filesystem */
+	struct xfs_growfs_data	*in)		/* growfs data input struct */
 {
 	struct xfs_buf		*bp;
 	int			error;
@@ -35,7 +35,7 @@ xfs_growfs_data_private(
 	xfs_rfsblock_t		nb, nb_div, nb_mod;
 	xfs_rfsblock_t		delta;
 	xfs_agnumber_t		oagcount;
-	xfs_trans_t		*tp;
+	struct xfs_trans	*tp;
 	struct aghdr_init_data	id = {};
 
 	nb = in->newblocks;
@@ -170,8 +170,8 @@ xfs_growfs_data_private(
 
 static int
 xfs_growfs_log_private(
-	xfs_mount_t		*mp,	/* mount point for filesystem */
-	xfs_growfs_log_t	*in)	/* growfs log input struct */
+	struct xfs_mount	*mp,	/* mount point for filesystem */
+	struct xfs_growfs_log	*in)	/* growfs log input struct */
 {
 	xfs_extlen_t		nb;
 
@@ -268,7 +268,7 @@ xfs_growfs_data(
 int
 xfs_growfs_log(
 	xfs_mount_t		*mp,
-	xfs_growfs_log_t	*in)
+	struct xfs_growfs_log	*in)
 {
 	int error;
 
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 92869f6ec8d3..2cffe51a31e8 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -6,8 +6,8 @@
 #ifndef __XFS_FSOPS_H__
 #define	__XFS_FSOPS_H__
 
-extern int xfs_growfs_data(xfs_mount_t *mp, xfs_growfs_data_t *in);
-extern int xfs_growfs_log(xfs_mount_t *mp, xfs_growfs_log_t *in);
+extern int xfs_growfs_data(struct xfs_mount *mp, struct xfs_growfs_data *in);
+extern int xfs_growfs_log(struct xfs_mount *mp, struct xfs_growfs_log *in);
 extern void xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
 extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
 				xfs_fsop_resblks_t *outval);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3fbd98f61ea5..a62520f49ec5 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -2260,7 +2260,7 @@ xfs_file_ioctl(
 	}
 
 	case XFS_IOC_FSGROWFSDATA: {
-		xfs_growfs_data_t in;
+		struct xfs_growfs_data in;
 
 		if (copy_from_user(&in, arg, sizeof(in)))
 			return -EFAULT;
@@ -2274,7 +2274,7 @@ xfs_file_ioctl(
 	}
 
 	case XFS_IOC_FSGROWFSLOG: {
-		xfs_growfs_log_t in;
+		struct xfs_growfs_log in;
 
 		if (copy_from_user(&in, arg, sizeof(in)))
 			return -EFAULT;
-- 
2.27.0



* [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
  2021-01-26 12:56 ` [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private() Gao Xiang
  2021-01-26 12:56 ` [PATCH v6 2/7] xfs: get rid of xfs_growfs_{data,log}_t Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-02-02 19:38   ` Brian Foster
  2021-01-26 12:56 ` [PATCH v6 4/7] xfs: hoist out xfs_resizefs_init_new_ags() Gao Xiang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang

sb_fdblocks is updated lazily if lazysbcount is enabled; therefore,
when shrinking the filesystem, the on-disk sb_fdblocks could be larger
than sb_dblocks and xfs_validate_sb_write() would fail.

Even in the growfs case, it'd be better to update the lazy sb counters
immediately so that they reflect the real counters.
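A simplified model of the failure described above may help. It assumes the relevant part of the write-time check is just "free blocks must not exceed total blocks" (the real xfs_validate_sb_write() also checks inode counters and only applies this to the primary superblock), and it uses a log_sb flag to stand in for calling xfs_log_sb(). struct model_sb and model_shrink() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the on-disk superblock counters. */
struct model_sb {
	uint64_t	dblocks;	/* data section size in blocks */
	uint64_t	fdblocks;	/* free data blocks (may be stale) */
};

/* Simplified summary-counter sanity check done when writing the sb. */
static bool model_sb_write_ok(const struct model_sb *sb)
{
	return sb->fdblocks <= sb->dblocks;
}

/*
 * Shrink by 'len' free blocks. 'incore_free' is the accurate in-core
 * free count; with lazy counters the on-disk fdblocks is left stale
 * unless 'log_sb' (standing in for xfs_log_sb()) copies it over.
 */
static bool model_shrink(struct model_sb *sb, uint64_t incore_free,
			 uint64_t len, bool log_sb)
{
	sb->dblocks -= len;
	if (log_sb)
		sb->fdblocks = incore_free - len;
	return model_sb_write_ok(sb);
}
```

With a stale on-disk fdblocks of 990 out of 1000 blocks, shrinking by 100 leaves 990 free out of 900 total and the check fails; logging the accurate in-core count (say 500 free, so 400 after the shrink) lets it pass.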

Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/xfs_fsops.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index a2a407039227..2e490fb75832 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -128,6 +128,14 @@ xfs_growfs_data_private(
 				 nb - mp->m_sb.sb_dblocks);
 	if (id.nfree)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
+
+	/*
+	 * update in-core counters now to reflect the real numbers
+	 * (especially sb_fdblocks)
+	 */
+	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
+		xfs_log_sb(tp);
+
 	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp);
 	if (error)
-- 
2.27.0



* [PATCH v6 4/7] xfs: hoist out xfs_resizefs_init_new_ags()
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
                   ` (2 preceding siblings ...)
  2021-01-26 12:56 ` [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-02-02 19:38   ` Brian Foster
  2021-01-26 12:56 ` [PATCH v6 5/7] xfs: introduce xfs_ag_shrink_space() Gao Xiang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang

Move the logic for initializing newly added AGs out into a new helper
in preparation for shrinking. No logic changes.
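The sizing arithmetic in the hoisted loop can be modeled in userspace as below. This is a sketch only: it records AG sizes instead of writing headers, model_init_new_ags() is a hypothetical name, and, like the kernel loop, it assumes oagcount >= 1 so the unsigned countdown terminates. Whatever remains of delta afterwards is the growth of the old last AG, which the caller hands to xfs_ag_extend_space().

```c
#include <stdint.h>

/*
 * Walk the new AGs from the last one down to the first new one, size
 * each (the last may be a partial AG), and subtract it from delta.
 * sizes[agno] receives the size of each new AG. Assumes nb >= old_dblocks
 * and oagcount >= 1.
 */
static uint64_t model_init_new_ags(uint64_t old_dblocks, uint64_t nb,
				   uint32_t agblocks, uint32_t oagcount,
				   uint32_t nagcount, uint64_t *sizes)
{
	uint64_t delta = nb - old_dblocks;
	uint32_t agno;

	for (agno = nagcount - 1; agno >= oagcount; agno--) {
		uint64_t agsize;

		if (agno == nagcount - 1)
			agsize = nb - (uint64_t)agno * agblocks;
		else
			agsize = agblocks;
		sizes[agno] = agsize;
		delta -= agsize;
	}
	return delta;	/* blocks left over for the old last AG */
}
```

For instance, growing 250 blocks (three 100-block AGs, the last holding 50) to 480 creates AG 4 with 80 blocks and AG 3 with 100, leaving 50 blocks to extend the old AG 2 up to its full size.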

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/xfs_fsops.c | 74 +++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 2e490fb75832..6c4ab5e31054 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -20,6 +20,49 @@
 #include "xfs_ag.h"
 #include "xfs_ag_resv.h"
 
+/*
+ * Write new AG headers to disk. Non-transactional, but need to be
+ * written and completed prior to the growfs transaction being logged.
+ * To do this, we use a delayed write buffer list and wait for
+ * submission and IO completion of the list as a whole. This allows the
+ * IO subsystem to merge all the AG headers in a single AG into a single
+ * IO and hide most of the latency of the IO from us.
+ *
+ * This also means that if we get an error whilst building the buffer
+ * list to write, we can cancel the entire list without having written
+ * anything.
+ */
+static int
+xfs_resizefs_init_new_ags(
+	struct xfs_mount	*mp,
+	struct aghdr_init_data	*id,
+	xfs_agnumber_t		oagcount,
+	xfs_agnumber_t		nagcount,
+	xfs_rfsblock_t		*delta)
+{
+	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
+	int			error;
+
+	INIT_LIST_HEAD(&id->buffer_list);
+	for (id->agno = nagcount - 1;
+	     id->agno >= oagcount;
+	     id->agno--, *delta -= id->agsize) {
+
+		if (id->agno == nagcount - 1)
+			id->agsize = nb - (id->agno *
+					(xfs_rfsblock_t)mp->m_sb.sb_agblocks);
+		else
+			id->agsize = mp->m_sb.sb_agblocks;
+
+		error = xfs_ag_init_headers(mp, id);
+		if (error) {
+			xfs_buf_delwri_cancel(&id->buffer_list);
+			return error;
+		}
+	}
+	return xfs_buf_delwri_submit(&id->buffer_list);
+}
+
 /*
  * growfs operations
  */
@@ -74,36 +117,7 @@ xfs_growfs_data_private(
 	if (error)
 		return error;
 
-	/*
-	 * Write new AG headers to disk. Non-transactional, but need to be
-	 * written and completed prior to the growfs transaction being logged.
-	 * To do this, we use a delayed write buffer list and wait for
-	 * submission and IO completion of the list as a whole. This allows the
-	 * IO subsystem to merge all the AG headers in a single AG into a single
-	 * IO and hide most of the latency of the IO from us.
-	 *
-	 * This also means that if we get an error whilst building the buffer
-	 * list to write, we can cancel the entire list without having written
-	 * anything.
-	 */
-	INIT_LIST_HEAD(&id.buffer_list);
-	for (id.agno = nagcount - 1;
-	     id.agno >= oagcount;
-	     id.agno--, delta -= id.agsize) {
-
-		if (id.agno == nagcount - 1)
-			id.agsize = nb -
-				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
-		else
-			id.agsize = mp->m_sb.sb_agblocks;
-
-		error = xfs_ag_init_headers(mp, &id);
-		if (error) {
-			xfs_buf_delwri_cancel(&id.buffer_list);
-			goto out_trans_cancel;
-		}
-	}
-	error = xfs_buf_delwri_submit(&id.buffer_list);
+	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
 	if (error)
 		goto out_trans_cancel;
 
-- 
2.27.0



* [PATCH v6 5/7] xfs: introduce xfs_ag_shrink_space()
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
                   ` (3 preceding siblings ...)
  2021-01-26 12:56 ` [PATCH v6 4/7] xfs: hoist out xfs_resizefs_init_new_ags() Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-01-26 12:56 ` [PATCH v6 6/7] xfs: support shrinking unused space in the last AG Gao Xiang
  2021-01-26 12:56 ` [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs Gao Xiang
  6 siblings, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang

This patch introduces a helper that shrinks unused space in the last AG
by fixing up the freespace btree.

Also make sure that the per-AG reservation still works with the new AG
size. If the per-AG reservation or the extent allocation fails, roll
the transaction so that the new transaction can be cancelled without
any side effects.
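The decision logic of xfs_ag_shrink_space() can be summarized with a rough userspace model. Several assumptions apply: struct model_ag is hypothetical, the free extent ending at the end of the AG is reduced to a single tail_free count, the per-AG reservation is modeled as simply needing resv_need free blocks, and transaction rolling, deferred frees and re-initializing the reservation on the error paths are omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the last AG's AGI/AGF state. */
struct model_ag {
	uint32_t	length;		/* agi_length == agf_length */
	uint32_t	freeblks;	/* all free blocks in the AG */
	uint32_t	tail_free;	/* free extent ending exactly at EOAG */
	uint32_t	resv_need;	/* free blocks the reservation needs */
};

static int model_ag_shrink_space(struct model_ag *ag, uint32_t len)
{
	/*
	 * Like XFS_ALLOCTYPE_THIS_BNO at (length - len) with
	 * minlen == maxlen == len: the whole tail must be free.
	 */
	if (len > ag->tail_free)
		return -ENOSPC;

	ag->length -= len;
	ag->freeblks -= len;
	ag->tail_free -= len;

	/* The per-AG reservation must still fit in the smaller AG. */
	if (ag->freeblks < ag->resv_need) {
		/* undo, as the patch restores lengths and frees the extent */
		ag->length += len;
		ag->freeblks += len;
		ag->tail_free += len;
		return -ENOSPC;
	}
	return 0;	/* the real code logs the new AGI/AGF lengths here */
}
```

Both failure paths leave the model AG exactly as it was, which mirrors the intent of rolling the transaction so that the caller can cancel it without side effects.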

Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c | 108 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ag.h |   2 +
 2 files changed, 110 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 9331f3516afa..c6e68e265269 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -22,6 +22,11 @@
 #include "xfs_ag.h"
 #include "xfs_ag_resv.h"
 #include "xfs_health.h"
+#include "xfs_error.h"
+#include "xfs_bmap.h"
+#include "xfs_defer.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
 
 static int
 xfs_get_aghdr_buf(
@@ -485,6 +490,109 @@ xfs_ag_init_headers(
 	return error;
 }
 
+int
+xfs_ag_shrink_space(
+	struct xfs_mount	*mp,
+	struct xfs_trans	**tpp,
+	struct aghdr_init_data	*id,
+	xfs_extlen_t		len)
+{
+	struct xfs_alloc_arg	args = {
+		.tp	= *tpp,
+		.mp	= mp,
+		.type	= XFS_ALLOCTYPE_THIS_BNO,
+		.minlen = len,
+		.maxlen = len,
+		.oinfo	= XFS_RMAP_OINFO_SKIP_UPDATE,
+		.resv	= XFS_AG_RESV_NONE,
+		.prod	= 1
+	};
+	struct xfs_buf		*agibp, *agfbp;
+	struct xfs_agi		*agi;
+	struct xfs_agf		*agf;
+	int			error, err2;
+
+	ASSERT(id->agno == mp->m_sb.sb_agcount - 1);
+	error = xfs_ialloc_read_agi(mp, *tpp, id->agno, &agibp);
+	if (error)
+		return error;
+
+	agi = agibp->b_addr;
+
+	error = xfs_alloc_read_agf(mp, *tpp, id->agno, 0, &agfbp);
+	if (error)
+		return error;
+
+	agf = agfbp->b_addr;
+	if (XFS_IS_CORRUPT(mp, agf->agf_length != agi->agi_length))
+		return -EFSCORRUPTED;
+
+	args.fsbno = XFS_AGB_TO_FSB(mp, id->agno,
+				    be32_to_cpu(agi->agi_length) - len);
+
+	/* remove the preallocations before allocation and re-establish them */
+	error = xfs_ag_resv_free(agibp->b_pag);
+	if (error)
+		return error;
+
+	/* internal log shouldn't also show up in the free space btrees */
+	error = xfs_alloc_vextent(&args);
+	if (!error && args.agbno == NULLAGBLOCK)
+		error = -ENOSPC;
+
+	if (error) {
+		/*
+		 * if extent allocation fails, need to roll the transaction to
+		 * ensure that the AGFL fixup has been committed anyway.
+		 */
+		err2 = xfs_trans_roll(tpp);
+		if (err2)
+			return err2;
+		goto resv_init_out;
+	}
+
+	/*
+	 * if successfully deleted from freespace btrees, need to confirm
+	 * per-AG reservation works as expected.
+	 */
+	be32_add_cpu(&agi->agi_length, -len);
+	be32_add_cpu(&agf->agf_length, -len);
+
+	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
+	if (err2) {
+		be32_add_cpu(&agi->agi_length, len);
+		be32_add_cpu(&agf->agf_length, len);
+		if (err2 != -ENOSPC)
+			goto resv_err;
+
+		__xfs_bmap_add_free(*tpp, args.fsbno, len, NULL, true);
+
+		/*
+		 * Roll the transaction before trying to re-init the per-ag
+		 * reservation. The new transaction is clean so it will cancel
+		 * without any side effects.
+		 */
+		error = xfs_defer_finish(tpp);
+		if (error)
+			return error;
+
+		error = -ENOSPC;
+		goto resv_init_out;
+	}
+	xfs_ialloc_log_agi(*tpp, agibp, XFS_AGI_LENGTH);
+	xfs_alloc_log_agf(*tpp, agfbp, XFS_AGF_LENGTH);
+	return 0;
+
+resv_init_out:
+	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
+	if (!err2)
+		return error;
+resv_err:
+	xfs_warn(mp, "Error %d reserving per-AG metadata reserve pool.", err2);
+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+	return err2;
+}
+
 /*
  * Extent the AG indicated by the @id by the length passed in
  */
diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 5166322807e7..ca65c2553889 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -24,6 +24,8 @@ struct aghdr_init_data {
 };
 
 int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id);
+int xfs_ag_shrink_space(struct xfs_mount *mp, struct xfs_trans **tpp,
+			struct aghdr_init_data *id, xfs_extlen_t len);
 int xfs_ag_extend_space(struct xfs_mount *mp, struct xfs_trans *tp,
 			struct aghdr_init_data *id, xfs_extlen_t len);
 int xfs_ag_get_geometry(struct xfs_mount *mp, xfs_agnumber_t agno,
-- 
2.27.0



* [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
                   ` (4 preceding siblings ...)
  2021-01-26 12:56 ` [PATCH v6 5/7] xfs: introduce xfs_ag_shrink_space() Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-02-03 14:23   ` Brian Foster
  2021-01-26 12:56 ` [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs Gao Xiang
  6 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang

As the first step of shrinking, this attempts to enable shrinking
unused space in the last allocation group by fixing up the freespace
btree, AGI and AGF and adjusting the superblock, using the helper
xfs_ag_shrink_space() to fix up the last AG.

This can all be done in one transaction for now, so I think no
additional protection is needed.
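The new size handling in xfs_growfs_data_private() can be traced with a small model. This sketch assumes XFS_MIN_AG_BLOCKS is 64 and skips the device-size read and xfs_sb_validate_fsb_count(); model_plan_resize() and struct model_plan are hypothetical names. It shows that a shrink request whose rounded-down size would drop an AG is rejected, since removing whole AGs is not supported yet.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MIN_AG_BLOCKS	64	/* assumed value of XFS_MIN_AG_BLOCKS */

/* Hypothetical summary of what the resize path decides. */
struct model_plan {
	uint64_t	nb;		/* size actually used, maybe rounded down */
	int64_t		delta;		/* > 0 grows, < 0 shrinks */
	uint32_t	nagcount;
	bool		extend;
};

static int model_plan_resize(uint64_t dblocks, uint32_t agcount,
			     uint32_t agblocks, uint64_t nb,
			     struct model_plan *p)
{
	uint64_t nb_div = nb / agblocks;
	uint64_t nb_mod = nb % agblocks;
	uint32_t nagcount = (uint32_t)nb_div + (nb_mod != 0);

	p->nb = dblocks;
	p->delta = 0;
	p->nagcount = agcount;
	p->extend = false;
	if (nb == dblocks)
		return 0;			/* nothing to do */

	if (nb_mod && nb_mod < MODEL_MIN_AG_BLOCKS) {
		nagcount--;			/* drop the runt tail AG */
		if (nagcount < 2)
			return -EINVAL;
		nb = (uint64_t)nagcount * agblocks;
	}
	if (nagcount < agcount)
		return -EINVAL;			/* whole-AG shrink: not yet */

	p->nb = nb;
	p->delta = (int64_t)nb - (int64_t)dblocks;
	p->extend = p->delta > 0;
	p->nagcount = nagcount;
	return 0;
}
```

With four 100-block AGs, a request for 380 blocks becomes a 20-block shrink of the last AG, while 350 is rejected because its 50-block remainder is below the minimum and rounding down would remove an AG. On the shrink path the patch reserves -delta blocks for the transaction instead of XFS_GROWFS_SPACE_RES(mp).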

Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
 fs/xfs/xfs_trans.c |  1 -
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 6c4ab5e31054..4bcea22f7b3f 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
 	struct aghdr_init_data	*id,
 	xfs_agnumber_t		oagcount,
 	xfs_agnumber_t		nagcount,
-	xfs_rfsblock_t		*delta)
+	int64_t			*delta)
 {
 	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
 	int			error;
@@ -76,33 +76,41 @@ xfs_growfs_data_private(
 	xfs_agnumber_t		nagcount;
 	xfs_agnumber_t		nagimax = 0;
 	xfs_rfsblock_t		nb, nb_div, nb_mod;
-	xfs_rfsblock_t		delta;
+	int64_t			delta;
 	xfs_agnumber_t		oagcount;
 	struct xfs_trans	*tp;
+	bool			extend;
 	struct aghdr_init_data	id = {};
 
 	nb = in->newblocks;
-	if (nb < mp->m_sb.sb_dblocks)
-		return -EINVAL;
-	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
+	if (nb == mp->m_sb.sb_dblocks)
+		return 0;
+
+	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
+	if (error)
 		return error;
-	error = xfs_buf_read_uncached(mp->m_ddev_targp,
+
+	if (nb > mp->m_sb.sb_dblocks) {
+		error = xfs_buf_read_uncached(mp->m_ddev_targp,
 				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
 				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
-	if (error)
-		return error;
-	xfs_buf_relse(bp);
+		if (error)
+			return error;
+		xfs_buf_relse(bp);
+	}
 
 	nb_div = nb;
 	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
 	nagcount = nb_div + (nb_mod != 0);
 	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
 		nagcount--;
-		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
-		if (nb < mp->m_sb.sb_dblocks)
+		if (nagcount < 2)
 			return -EINVAL;
+		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
 	}
+
 	delta = nb - mp->m_sb.sb_dblocks;
+	extend = (delta > 0);
 	oagcount = mp->m_sb.sb_agcount;
 
 	/* allocate the new per-ag structures */
@@ -110,22 +118,34 @@ xfs_growfs_data_private(
 		error = xfs_initialize_perag(mp, nagcount, &nagimax);
 		if (error)
 			return error;
+	} else if (nagcount < oagcount) {
+		/* TODO: shrinking entire AGs isn't supported yet */
+		return -EINVAL;
 	}
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
-			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
+			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
+			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
 
-	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
-	if (error)
-		goto out_trans_cancel;
-
+	if (extend) {
+		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
+						  nagcount, &delta);
+		if (error)
+			goto out_trans_cancel;
+	}
 	xfs_trans_agblocks_delta(tp, id.nfree);
 
-	/* If there are new blocks in the old last AG, extend it. */
+	/* If there are some blocks in the last AG, resize it. */
 	if (delta) {
-		error = xfs_ag_extend_space(mp, tp, &id, delta);
+		if (extend) {
+			error = xfs_ag_extend_space(mp, tp, &id, delta);
+		} else {
+			id.agno = nagcount - 1;
+			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
+		}
+
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -137,15 +157,15 @@ xfs_growfs_data_private(
 	 */
 	if (nagcount > oagcount)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
-	if (nb > mp->m_sb.sb_dblocks)
+	if (nb != mp->m_sb.sb_dblocks)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
 				 nb - mp->m_sb.sb_dblocks);
 	if (id.nfree)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
 
 	/*
-	 * update in-core counters now to reflect the real numbers
-	 * (especially sb_fdblocks)
+	 * update in-core counters now to reflect the real numbers (especially
+	 * sb_fdblocks), so that xfs_validate_sb_write() can pass for shrinkfs.
 	 */
 	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
 		xfs_log_sb(tp);
@@ -165,7 +185,7 @@ xfs_growfs_data_private(
 	 * If we expanded the last AG, free the per-AG reservation
 	 * so we can reinitialize it with the new size.
 	 */
-	if (delta) {
+	if (extend && delta) {
 		struct xfs_perag	*pag;
 
 		pag = xfs_perag_get(mp, id.agno);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index e72730f85af1..fd2cbf414b80 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -419,7 +419,6 @@ xfs_trans_mod_sb(
 		tp->t_res_frextents_delta += delta;
 		break;
 	case XFS_TRANS_SB_DBLOCKS:
-		ASSERT(delta > 0);
 		tp->t_dblocks_delta += delta;
 		break;
 	case XFS_TRANS_SB_AGCOUNT:
-- 
2.27.0



* [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs
  2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
                   ` (5 preceding siblings ...)
  2021-01-26 12:56 ` [PATCH v6 6/7] xfs: support shrinking unused space in the last AG Gao Xiang
@ 2021-01-26 12:56 ` Gao Xiang
  2021-02-03 14:23   ` Brian Foster
  6 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-01-26 12:56 UTC (permalink / raw)
  To: linux-xfs
  Cc: Darrick J. Wong, Brian Foster, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Gao Xiang, Darrick J . Wong

A per-AG reservation failure after fixing up free space is hard to
trigger in an effective way, so add an error injection point directly
to verify that this error handling path works as expected.
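The injection point follows the usual XFS_TEST_ERROR() pattern: the expression is ORed with a per-tag random factor, where 0 means disarmed, 1 means always fire, and N means roughly one time in N. The model below is a hypothetical userspace sketch of that shape (model_errortag[] stands in for the per-mount tag array); in a real kernel this is likely only active in DEBUG builds, with the knob added by XFS_ERRORTAG_ATTR_RW() expected under /sys/fs/xfs/<dev>/errortag/.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; tag numbers mirror the patch. */
enum {
	MODEL_ERRTAG_SHRINKFS_AG_RESV_FAIL	= 38,
	MODEL_ERRTAG_MAX			= 39,
};

/* 0 = disarmed, 1 = always fire, N = fire roughly 1 time in N. */
static unsigned int model_errortag[MODEL_ERRTAG_MAX];

/* Same shape as XFS_TEST_ERROR(expr, mp, tag). */
static bool model_test_error(bool expr, unsigned int tag)
{
	unsigned int randfactor = model_errortag[tag];

	if (expr)
		return true;
	if (!randfactor || (unsigned int)rand() % randfactor)
		return false;
	return true;
}

/* The check added after xfs_ag_resv_init() in xfs_ag_shrink_space(). */
static int model_resv_result(int err2)
{
	if (model_test_error(false, MODEL_ERRTAG_SHRINKFS_AG_RESV_FAIL))
		err2 = -ENOSPC;
	return err2;
}
```

Arming the tag with a factor of 1 forces the -ENOSPC branch every time, which exercises the rollback path (restoring the AGI/AGF lengths and freeing the extent again) deterministically.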

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c       | 5 +++++
 fs/xfs/libxfs/xfs_errortag.h | 4 +++-
 fs/xfs/xfs_error.c           | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index c6e68e265269..5076913c153f 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -23,6 +23,7 @@
 #include "xfs_ag_resv.h"
 #include "xfs_health.h"
 #include "xfs_error.h"
+#include "xfs_errortag.h"
 #include "xfs_bmap.h"
 #include "xfs_defer.h"
 #include "xfs_log_format.h"
@@ -559,6 +560,10 @@ xfs_ag_shrink_space(
 	be32_add_cpu(&agf->agf_length, -len);
 
 	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
+
+	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL))
+		err2 = -ENOSPC;
+
 	if (err2) {
 		be32_add_cpu(&agi->agi_length, len);
 		be32_add_cpu(&agf->agf_length, len);
diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 6ca9084b6934..5fd71a930b68 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -40,6 +40,7 @@
 #define XFS_ERRTAG_REFCOUNT_FINISH_ONE			25
 #define XFS_ERRTAG_BMAP_FINISH_ONE			26
 #define XFS_ERRTAG_AG_RESV_CRITICAL			27
+
 /*
  * DEBUG mode instrumentation to test and/or trigger delayed allocation
  * block killing in the event of failed writes. When enabled, all
@@ -58,7 +59,8 @@
 #define XFS_ERRTAG_BUF_IOERROR				35
 #define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
 #define XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT		37
-#define XFS_ERRTAG_MAX					38
+#define XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL		38
+#define XFS_ERRTAG_MAX					39
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 185b4915b7bf..7bae34bfddd2 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -168,6 +168,7 @@ XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
 XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
 XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
 XFS_ERRORTAG_ATTR_RW(bmap_alloc_minlen_extent,	XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT);
+XFS_ERRORTAG_ATTR_RW(shrinkfs_ag_resv_fail, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -208,6 +209,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
 	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
 	XFS_ERRORTAG_ATTR_LIST(bmap_alloc_minlen_extent),
+	XFS_ERRORTAG_ATTR_LIST(shrinkfs_ag_resv_fail),
 	NULL,
 };
 
-- 
2.27.0



* Re: [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private()
  2021-01-26 12:56 ` [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private() Gao Xiang
@ 2021-02-02 19:37   ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2021-02-02 19:37 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Tue, Jan 26, 2021 at 08:56:15PM +0800, Gao Xiang wrote:
> It actually means the delta block count of growfs. Rename it in order
> to make it clear. Also introduce nb_div to avoid reusing `delta`.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---

Looks reasonable:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_fsops.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 959ce91a3755..62600d78bbf1 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -32,8 +32,8 @@ xfs_growfs_data_private(
>  	int			error;
>  	xfs_agnumber_t		nagcount;
>  	xfs_agnumber_t		nagimax = 0;
> -	xfs_rfsblock_t		nb, nb_mod;
> -	xfs_rfsblock_t		new;
> +	xfs_rfsblock_t		nb, nb_div, nb_mod;
> +	xfs_rfsblock_t		delta;
>  	xfs_agnumber_t		oagcount;
>  	xfs_trans_t		*tp;
>  	struct aghdr_init_data	id = {};
> @@ -50,16 +50,16 @@ xfs_growfs_data_private(
>  		return error;
>  	xfs_buf_relse(bp);
>  
> -	new = nb;	/* use new as a temporary here */
> -	nb_mod = do_div(new, mp->m_sb.sb_agblocks);
> -	nagcount = new + (nb_mod != 0);
> +	nb_div = nb;
> +	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> +	nagcount = nb_div + (nb_mod != 0);
>  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
>  		nagcount--;
>  		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
>  		if (nb < mp->m_sb.sb_dblocks)
>  			return -EINVAL;
>  	}
> -	new = nb - mp->m_sb.sb_dblocks;
> +	delta = nb - mp->m_sb.sb_dblocks;
>  	oagcount = mp->m_sb.sb_agcount;
>  
>  	/* allocate the new per-ag structures */
> @@ -89,7 +89,7 @@ xfs_growfs_data_private(
>  	INIT_LIST_HEAD(&id.buffer_list);
>  	for (id.agno = nagcount - 1;
>  	     id.agno >= oagcount;
> -	     id.agno--, new -= id.agsize) {
> +	     id.agno--, delta -= id.agsize) {
>  
>  		if (id.agno == nagcount - 1)
>  			id.agsize = nb -
> @@ -110,8 +110,8 @@ xfs_growfs_data_private(
>  	xfs_trans_agblocks_delta(tp, id.nfree);
>  
>  	/* If there are new blocks in the old last AG, extend it. */
> -	if (new) {
> -		error = xfs_ag_extend_space(mp, tp, &id, new);
> +	if (delta) {
> +		error = xfs_ag_extend_space(mp, tp, &id, delta);
>  		if (error)
>  			goto out_trans_cancel;
>  	}
> @@ -143,7 +143,7 @@ xfs_growfs_data_private(
>  	 * If we expanded the last AG, free the per-AG reservation
>  	 * so we can reinitialize it with the new size.
>  	 */
> -	if (new) {
> +	if (delta) {
>  		struct xfs_perag	*pag;
>  
>  		pag = xfs_perag_get(mp, id.agno);
> -- 
> 2.27.0
> 
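Some context for the arithmetic above: do_div(n, base) overwrites n with the quotient and returns the remainder. That is why the patch copies nb into nb_div instead of reusing `new' as a scratch variable. Below is a minimal userspace sketch of the same AG-count computation. It assumes MIN_AG_BLOCKS stands in for XFS_MIN_AG_BLOCKS (taken as 64 here) and uses plain / and % instead of do_div(). resize_agcount() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for XFS_MIN_AG_BLOCKS, taken as 64 here. */
#define MIN_AG_BLOCKS 64

/*
 * Hypothetical helper: compute the AG count for a requested size of *nb
 * blocks. If the trailing partial AG would be smaller than MIN_AG_BLOCKS,
 * drop it and round *nb down to a whole number of AGs.
 */
static uint32_t resize_agcount(uint64_t *nb, uint32_t agblocks)
{
	uint64_t nb_div = *nb / agblocks;	/* quotient, as do_div() leaves in nb_div */
	uint64_t nb_mod = *nb % agblocks;	/* remainder, as do_div() returns */
	uint32_t nagcount = (uint32_t)(nb_div + (nb_mod != 0));

	if (nb_mod && nb_mod < MIN_AG_BLOCKS) {
		nagcount--;
		*nb = (uint64_t)nagcount * agblocks;
	}
	return nagcount;
}
```

For example, with 1000-block AGs a request for 4010 blocks leaves a 10-block tail. The count drops from 5 to 4, and nb is rounded down to 4000.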



* Re: [PATCH v6 2/7] xfs: get rid of xfs_growfs_{data,log}_t
  2021-01-26 12:56 ` [PATCH v6 2/7] xfs: get rid of xfs_growfs_{data,log}_t Gao Xiang
@ 2021-02-02 19:37   ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2021-02-02 19:37 UTC
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Eric Sandeen

On Tue, Jan 26, 2021 at 08:56:16PM +0800, Gao Xiang wrote:
> Such usage isn't encouraged by the kernel coding style. Leave the
> definitions in place for userspace users.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_fsops.c | 12 ++++++------
>  fs/xfs/xfs_fsops.h |  4 ++--
>  fs/xfs/xfs_ioctl.c |  4 ++--
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 62600d78bbf1..a2a407039227 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -25,8 +25,8 @@
>   */
>  static int
>  xfs_growfs_data_private(
> -	xfs_mount_t		*mp,		/* mount point for filesystem */
> -	xfs_growfs_data_t	*in)		/* growfs data input struct */
> +	struct xfs_mount	*mp,		/* mount point for filesystem */
> +	struct xfs_growfs_data	*in)		/* growfs data input struct */
>  {
>  	struct xfs_buf		*bp;
>  	int			error;
> @@ -35,7 +35,7 @@ xfs_growfs_data_private(
>  	xfs_rfsblock_t		nb, nb_div, nb_mod;
>  	xfs_rfsblock_t		delta;
>  	xfs_agnumber_t		oagcount;
> -	xfs_trans_t		*tp;
> +	struct xfs_trans	*tp;
>  	struct aghdr_init_data	id = {};
>  
>  	nb = in->newblocks;
> @@ -170,8 +170,8 @@ xfs_growfs_data_private(
>  
>  static int
>  xfs_growfs_log_private(
> -	xfs_mount_t		*mp,	/* mount point for filesystem */
> -	xfs_growfs_log_t	*in)	/* growfs log input struct */
> +	struct xfs_mount	*mp,	/* mount point for filesystem */
> +	struct xfs_growfs_log	*in)	/* growfs log input struct */
>  {
>  	xfs_extlen_t		nb;
>  
> @@ -268,7 +268,7 @@ xfs_growfs_data(
>  int
>  xfs_growfs_log(
>  	xfs_mount_t		*mp,
> -	xfs_growfs_log_t	*in)
> +	struct xfs_growfs_log	*in)
>  {
>  	int error;
>  
> diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
> index 92869f6ec8d3..2cffe51a31e8 100644
> --- a/fs/xfs/xfs_fsops.h
> +++ b/fs/xfs/xfs_fsops.h
> @@ -6,8 +6,8 @@
>  #ifndef __XFS_FSOPS_H__
>  #define	__XFS_FSOPS_H__
>  
> -extern int xfs_growfs_data(xfs_mount_t *mp, xfs_growfs_data_t *in);
> -extern int xfs_growfs_log(xfs_mount_t *mp, xfs_growfs_log_t *in);
> +extern int xfs_growfs_data(struct xfs_mount *mp, struct xfs_growfs_data *in);
> +extern int xfs_growfs_log(struct xfs_mount *mp, struct xfs_growfs_log *in);
>  extern void xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
>  extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
>  				xfs_fsop_resblks_t *outval);
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 3fbd98f61ea5..a62520f49ec5 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -2260,7 +2260,7 @@ xfs_file_ioctl(
>  	}
>  
>  	case XFS_IOC_FSGROWFSDATA: {
> -		xfs_growfs_data_t in;
> +		struct xfs_growfs_data in;
>  
>  		if (copy_from_user(&in, arg, sizeof(in)))
>  			return -EFAULT;
> @@ -2274,7 +2274,7 @@ xfs_file_ioctl(
>  	}
>  
>  	case XFS_IOC_FSGROWFSLOG: {
> -		xfs_growfs_log_t in;
> +		struct xfs_growfs_log in;
>  
>  		if (copy_from_user(&in, arg, sizeof(in)))
>  			return -EFAULT;
> -- 
> 2.27.0
> 



* Re: [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs
  2021-01-26 12:56 ` [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs Gao Xiang
@ 2021-02-02 19:38   ` Brian Foster
  2021-02-03  0:45     ` Gao Xiang
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-02 19:38 UTC
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Tue, Jan 26, 2021 at 08:56:17PM +0800, Gao Xiang wrote:
> sb_fdblocks is updated lazily if lazysbcount is enabled, so when
> shrinking the filesystem sb_fdblocks could be larger than sb_dblocks,
> and xfs_validate_sb_write() would then fail.
> 
> Even in the growfs case, it'd be better to update the lazy sb counters
> immediately to reflect the real sb counters.
> 
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index a2a407039227..2e490fb75832 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -128,6 +128,14 @@ xfs_growfs_data_private(
>  				 nb - mp->m_sb.sb_dblocks);
>  	if (id.nfree)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> +
> +	/*
> +	 * update in-core counters now to reflect the real numbers
> +	 * (especially sb_fdblocks)
> +	 */

Could you update the comment to explain why we do this? For example:

"Sync sb counters now to reflect the updated values. This is
particularly important for shrink because the write verifier will fail
if sb_fdblocks is ever larger than sb_dblocks."

Brian

> +	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> +		xfs_log_sb(tp);
> +
>  	xfs_trans_set_sync(tp);
>  	error = xfs_trans_commit(tp);
>  	if (error)
> -- 
> 2.27.0
> 
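Here is a toy model of the failure described in the commit message. It assumes a superblock reduced to the two counters involved. struct toy_sb, toy_sb_write_ok() and toy_shrink() are hypothetical. toy_sb_write_ok() mirrors only the sb_fdblocks <= sb_dblocks condition that the commit message attributes to xfs_validate_sb_write(). The sync_counters flag stands in for the xfs_log_sb() call added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified superblock: only the two counters involved. */
struct toy_sb {
	uint64_t dblocks;	/* total data blocks (like sb_dblocks) */
	uint64_t fdblocks;	/* free data blocks as last written (like sb_fdblocks) */
};

/* Models only the fdblocks <= dblocks part of the write-time check. */
static bool toy_sb_write_ok(const struct toy_sb *sb)
{
	return sb->fdblocks <= sb->dblocks;
}

/*
 * Shrink by 'len' free blocks. With lazy counters the on-disk fdblocks is
 * only brought up to date when sync_counters is set (the analogue of
 * logging the superblock in the resize transaction).
 */
static bool toy_shrink(struct toy_sb *ondisk, uint64_t incore_fdblocks,
		       uint64_t len, bool sync_counters)
{
	ondisk->dblocks -= len;
	if (sync_counters)
		ondisk->fdblocks = incore_fdblocks - len;
	return toy_sb_write_ok(ondisk);
}
```

Suppose a 1000-block filesystem has a stale on-disk fdblocks of 990 and an in-core value of 900. Shrinking it by 100 blocks fails the check unless the counters are synced first.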



* Re: [PATCH v6 4/7] xfs: hoist out xfs_resizefs_init_new_ags()
  2021-01-26 12:56 ` [PATCH v6 4/7] xfs: hoist out xfs_resizefs_init_new_ags() Gao Xiang
@ 2021-02-02 19:38   ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2021-02-02 19:38 UTC
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Tue, Jan 26, 2021 at 08:56:18PM +0800, Gao Xiang wrote:
> Move the logic for initializing newly added AGs out into a new helper
> in preparation for shrinking. No logic changes.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_fsops.c | 74 +++++++++++++++++++++++++++-------------------
>  1 file changed, 44 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 2e490fb75832..6c4ab5e31054 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -20,6 +20,49 @@
>  #include "xfs_ag.h"
>  #include "xfs_ag_resv.h"
>  
> +/*
> + * Write new AG headers to disk. Non-transactional, but need to be
> + * written and completed prior to the growfs transaction being logged.
> + * To do this, we use a delayed write buffer list and wait for
> + * submission and IO completion of the list as a whole. This allows the
> + * IO subsystem to merge all the AG headers in a single AG into a single
> + * IO and hide most of the latency of the IO from us.
> + *
> + * This also means that if we get an error whilst building the buffer
> + * list to write, we can cancel the entire list without having written
> + * anything.
> + */
> +static int
> +xfs_resizefs_init_new_ags(
> +	struct xfs_mount	*mp,
> +	struct aghdr_init_data	*id,
> +	xfs_agnumber_t		oagcount,
> +	xfs_agnumber_t		nagcount,
> +	xfs_rfsblock_t		*delta)
> +{
> +	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
> +	int			error;
> +
> +	INIT_LIST_HEAD(&id->buffer_list);
> +	for (id->agno = nagcount - 1;
> +	     id->agno >= oagcount;
> +	     id->agno--, *delta -= id->agsize) {
> +
> +		if (id->agno == nagcount - 1)
> +			id->agsize = nb - (id->agno *
> +					(xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> +		else
> +			id->agsize = mp->m_sb.sb_agblocks;
> +
> +		error = xfs_ag_init_headers(mp, id);
> +		if (error) {
> +			xfs_buf_delwri_cancel(&id->buffer_list);
> +			return error;
> +		}
> +	}
> +	return xfs_buf_delwri_submit(&id->buffer_list);
> +}
> +
>  /*
>   * growfs operations
>   */
> @@ -74,36 +117,7 @@ xfs_growfs_data_private(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * Write new AG headers to disk. Non-transactional, but need to be
> -	 * written and completed prior to the growfs transaction being logged.
> -	 * To do this, we use a delayed write buffer list and wait for
> -	 * submission and IO completion of the list as a whole. This allows the
> -	 * IO subsystem to merge all the AG headers in a single AG into a single
> -	 * IO and hide most of the latency of the IO from us.
> -	 *
> -	 * This also means that if we get an error whilst building the buffer
> -	 * list to write, we can cancel the entire list without having written
> -	 * anything.
> -	 */
> -	INIT_LIST_HEAD(&id.buffer_list);
> -	for (id.agno = nagcount - 1;
> -	     id.agno >= oagcount;
> -	     id.agno--, delta -= id.agsize) {
> -
> -		if (id.agno == nagcount - 1)
> -			id.agsize = nb -
> -				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> -		else
> -			id.agsize = mp->m_sb.sb_agblocks;
> -
> -		error = xfs_ag_init_headers(mp, &id);
> -		if (error) {
> -			xfs_buf_delwri_cancel(&id.buffer_list);
> -			goto out_trans_cancel;
> -		}
> -	}
> -	error = xfs_buf_delwri_submit(&id.buffer_list);
> +	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
>  	if (error)
>  		goto out_trans_cancel;
>  
> -- 
> 2.27.0
> 
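The hoisted loop walks the new AGs from the highest number down. The last AG gets whatever is left of nb, and every other new AG gets a full sb_agblocks. Each size is subtracted from *delta. The sketch below models that loop in userspace and assumes oagcount >= 1, since otherwise the unsigned agno >= oagcount test never fails. toy_init_new_ags() and its agsize array are hypothetical. The real helper also initializes the headers and submits the delwri buffer list.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the sizing loop in xfs_resizefs_init_new_ags().
 * Assumes oagcount >= 1 (agno is unsigned). Each new AG's size is stored in
 * agsize[agno] and subtracted from *delta; whatever remains in *delta
 * afterwards is the growth of the old last AG.
 */
static void toy_init_new_ags(uint64_t dblocks, uint32_t agblocks,
			     uint32_t oagcount, uint32_t nagcount,
			     int64_t *delta, uint64_t *agsize)
{
	uint64_t nb = dblocks + (uint64_t)*delta;

	for (uint32_t agno = nagcount - 1; agno >= oagcount; agno--) {
		if (agno == nagcount - 1)	/* last AG takes the tail */
			agsize[agno] = nb - (uint64_t)agno * agblocks;
		else				/* others are full-sized */
			agsize[agno] = agblocks;
		*delta -= (int64_t)agsize[agno];
	}
}
```

Take four 1000-block AGs whose last AG holds 800 blocks (dblocks = 3800). Growing them to 6500 blocks creates AGs 4-6 sized 1000, 1000 and 500. It leaves delta = 200, which is the growth of the old last AG.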



* Re: [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs
  2021-02-02 19:38   ` Brian Foster
@ 2021-02-03  0:45     ` Gao Xiang
  0 siblings, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-03  0:45 UTC
  To: Brian Foster
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

Hi Brian,

On Tue, Feb 02, 2021 at 02:38:04PM -0500, Brian Foster wrote:
> On Tue, Jan 26, 2021 at 08:56:17PM +0800, Gao Xiang wrote:
> > sb_fdblocks is updated lazily if lazysbcount is enabled, so when
> > shrinking the filesystem sb_fdblocks could be larger than sb_dblocks,
> > and xfs_validate_sb_write() would then fail.
> > 
> > Even in the growfs case, it'd be better to update the lazy sb counters
> > immediately to reflect the real sb counters.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> >  fs/xfs/xfs_fsops.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index a2a407039227..2e490fb75832 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -128,6 +128,14 @@ xfs_growfs_data_private(
> >  				 nb - mp->m_sb.sb_dblocks);
> >  	if (id.nfree)
> >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > +
> > +	/*
> > +	 * update in-core counters now to reflect the real numbers
> > +	 * (especially sb_fdblocks)
> > +	 */
> 
> Could you update the comment to explain why we do this? For example:
> 
> "Sync sb counters now to reflect the updated values. This is
> particularly important for shrink because the write verifier will fail
> if sb_fdblocks is ever larger than sb_dblocks."
> 

Thanks for the review/suggestion!

I updated the comment in "[PATCH v6 6/7] xfs: support shrinking unused
space in the last AG", since the shrinking functionality only really lands
in [PATCH 6/7]. If that looks worse than changing it directly here,
I will move/update the comment in the next version.

Thanks,
Gao Xiang

> Brian
> 



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-01-26 12:56 ` [PATCH v6 6/7] xfs: support shrinking unused space in the last AG Gao Xiang
@ 2021-02-03 14:23   ` Brian Foster
  2021-02-03 14:51     ` Gao Xiang
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-03 14:23 UTC
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
> As the first step of shrinking, this attempts to enable shrinking
> unused space in the last allocation group by fixing up the free space
> btrees, AGI, AGF and superblock, using a helper xfs_ag_shrink_space()
> to fix up the last AG.
> 
> This can all be done in one transaction for now, so I think no
> additional protection is needed.
> 
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
>  fs/xfs/xfs_trans.c |  1 -
>  2 files changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 6c4ab5e31054..4bcea22f7b3f 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
>  	struct aghdr_init_data	*id,
>  	xfs_agnumber_t		oagcount,
>  	xfs_agnumber_t		nagcount,
> -	xfs_rfsblock_t		*delta)
> +	int64_t			*delta)
>  {
>  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
>  	int			error;
> @@ -76,33 +76,41 @@ xfs_growfs_data_private(
>  	xfs_agnumber_t		nagcount;
>  	xfs_agnumber_t		nagimax = 0;
>  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> -	xfs_rfsblock_t		delta;
> +	int64_t			delta;
>  	xfs_agnumber_t		oagcount;
>  	struct xfs_trans	*tp;
> +	bool			extend;
>  	struct aghdr_init_data	id = {};
>  
>  	nb = in->newblocks;
> -	if (nb < mp->m_sb.sb_dblocks)
> -		return -EINVAL;
> -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> +	if (nb == mp->m_sb.sb_dblocks)
> +		return 0;
> +
> +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> +	if (error)
>  		return error;
> -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> +
> +	if (nb > mp->m_sb.sb_dblocks) {
> +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
>  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
>  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> -	if (error)
> -		return error;
> -	xfs_buf_relse(bp);
> +		if (error)
> +			return error;
> +		xfs_buf_relse(bp);
> +	}
>  
>  	nb_div = nb;
>  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
>  	nagcount = nb_div + (nb_mod != 0);
>  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
>  		nagcount--;
> -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> -		if (nb < mp->m_sb.sb_dblocks)
> +		if (nagcount < 2)
>  			return -EINVAL;

What's the reason for the nagcount < 2 check? IIRC we warn about this
configuration at mkfs time, but allow it to proceed. Is it just that we
don't want to accidentally put the fs into an agcount == 1 state that
was originally formatted with >1 AGs?

What about the case where we attempt to grow an agcount == 1 fs but
don't enlarge enough to add the second AG? Does this change error
behavior in that case?

> +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
>  	}
> +
>  	delta = nb - mp->m_sb.sb_dblocks;
> +	extend = (delta > 0);
>  	oagcount = mp->m_sb.sb_agcount;
>  
>  	/* allocate the new per-ag structures */
> @@ -110,22 +118,34 @@ xfs_growfs_data_private(
>  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
>  		if (error)
>  			return error;
> +	} else if (nagcount < oagcount) {
> +		/* TODO: shrinking the entire AGs hasn't yet completed */
> +		return -EINVAL;
>  	}
>  
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> +			XFS_TRANS_RESERVE, &tp);
>  	if (error)
>  		return error;
>  
> -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> -	if (error)
> -		goto out_trans_cancel;
> -
> +	if (extend) {
> +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> +						  nagcount, &delta);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
>  	xfs_trans_agblocks_delta(tp, id.nfree);

It looks like id isn't used until the resize call above. Is this call
relevant for the shrink case?

>  
> -	/* If there are new blocks in the old last AG, extend it. */
> +	/* If there are some blocks in the last AG, resize it. */
>  	if (delta) {

This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
of the function. Should we ever get to this point with delta == 0? (If
not, maybe convert it to an assert just to be safe.)

> -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> +		if (extend) {
> +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> +		} else {
> +			id.agno = nagcount - 1;
> +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);

xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
pass in agno for now..?

> +		}
> +
>  		if (error)
>  			goto out_trans_cancel;
>  	}
> @@ -137,15 +157,15 @@ xfs_growfs_data_private(
>  	 */
>  	if (nagcount > oagcount)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> -	if (nb > mp->m_sb.sb_dblocks)
> +	if (nb != mp->m_sb.sb_dblocks)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
>  				 nb - mp->m_sb.sb_dblocks);

Maybe use delta here?

>  	if (id.nfree)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
>  

id.nfree tracks newly added free space in the growfs case. Is it not
used in the shrink case because the allocation handles this for us?

>  	/*
> -	 * update in-core counters now to reflect the real numbers
> -	 * (especially sb_fdblocks)
> +	 * update in-core counters now to reflect the real numbers (especially
> +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
>  	 */
>  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
>  		xfs_log_sb(tp);
> @@ -165,7 +185,7 @@ xfs_growfs_data_private(
>  	 * If we expanded the last AG, free the per-AG reservation
>  	 * so we can reinitialize it with the new size.
>  	 */
> -	if (delta) {
> +	if (extend && delta) {
>  		struct xfs_perag	*pag;
>  
>  		pag = xfs_perag_get(mp, id.agno);

We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
this function. xfs_ag_shrink_space() from the previous patch is intended
to deal with perag reservation changes for shrink, but it looks like the
reserve call further down could potentially reset mp->m_finobt_nores to
false if it previously might have been set to true.

Brian

> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index e72730f85af1..fd2cbf414b80 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
>  		tp->t_res_frextents_delta += delta;
>  		break;
>  	case XFS_TRANS_SB_DBLOCKS:
> -		ASSERT(delta > 0);
>  		tp->t_dblocks_delta += delta;
>  		break;
>  	case XFS_TRANS_SB_AGCOUNT:
> -- 
> 2.27.0
> 
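Brian's second question can be checked with a small model of the size checks before and after this patch. The model assumes MIN_AG_BLOCKS stands in for XFS_MIN_AG_BLOCKS (taken as 64). It ignores xfs_sb_validate_fsb_count(), the uncached read and the nagcount < oagcount rejection. old_check() and v6_check() are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: stands in for XFS_MIN_AG_BLOCKS, taken as 64 here. */
#define MIN_AG_BLOCKS 64

/* Hypothetical model of the pre-series check: only growing is allowed. */
static int old_check(uint64_t dblocks, uint32_t agblocks, uint64_t nb,
		     uint64_t *out_nb)
{
	if (nb < dblocks)
		return -EINVAL;
	uint64_t nb_mod = nb % agblocks;
	uint32_t nagcount = (uint32_t)(nb / agblocks + (nb_mod != 0));
	if (nb_mod && nb_mod < MIN_AG_BLOCKS) {
		nagcount--;
		nb = (uint64_t)nagcount * agblocks;
		if (nb < dblocks)
			return -EINVAL;
	}
	*out_nb = nb;
	return 0;
}

/* Hypothetical model of the v6 check: equal size is a no-op, shrink allowed. */
static int v6_check(uint64_t dblocks, uint32_t agblocks, uint64_t nb,
		    uint64_t *out_nb)
{
	if (nb == dblocks) {
		*out_nb = nb;
		return 0;
	}
	uint64_t nb_mod = nb % agblocks;
	uint32_t nagcount = (uint32_t)(nb / agblocks + (nb_mod != 0));
	if (nb_mod && nb_mod < MIN_AG_BLOCKS) {
		nagcount--;
		if (nagcount < 2)	/* the new check Brian asks about */
			return -EINVAL;
		nb = (uint64_t)nagcount * agblocks;
	}
	*out_nb = nb;
	return 0;
}
```

Consider a single-AG filesystem of 1000 blocks and a request for 1010 blocks. The old check accepted it as a no-op, rounding nb back to 1000. The v6 check returns -EINVAL.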



* Re: [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs
  2021-01-26 12:56 ` [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs Gao Xiang
@ 2021-02-03 14:23   ` Brian Foster
  2021-02-03 15:01     ` Gao Xiang
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-03 14:23 UTC
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Darrick J . Wong

On Tue, Jan 26, 2021 at 08:56:21PM +0800, Gao Xiang wrote:
> Per-AG reservation failure after fixing up free space is hard to test
> effectively, so add an error injection path directly to verify that
> this error handling path works as expected.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c       | 5 +++++
>  fs/xfs/libxfs/xfs_errortag.h | 4 +++-
>  fs/xfs/xfs_error.c           | 2 ++
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index c6e68e265269..5076913c153f 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -23,6 +23,7 @@
>  #include "xfs_ag_resv.h"
>  #include "xfs_health.h"
>  #include "xfs_error.h"
> +#include "xfs_errortag.h"
>  #include "xfs_bmap.h"
>  #include "xfs_defer.h"
>  #include "xfs_log_format.h"
> @@ -559,6 +560,10 @@ xfs_ag_shrink_space(
>  	be32_add_cpu(&agf->agf_length, -len);
>  
>  	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
> +
> +	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL))
> +		err2 = -ENOSPC;
> +

Seems reasonable, but I feel like this could be broadened to serve as a
generic perag reservation error tag. I suppose we might not be able to
use it on a clean mount, but perhaps it could be reused for growfs and
remount. Hm?

Brian

>  	if (err2) {
>  		be32_add_cpu(&agi->agi_length, len);
>  		be32_add_cpu(&agf->agf_length, len);
> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> index 6ca9084b6934..5fd71a930b68 100644
> --- a/fs/xfs/libxfs/xfs_errortag.h
> +++ b/fs/xfs/libxfs/xfs_errortag.h
> @@ -40,6 +40,7 @@
>  #define XFS_ERRTAG_REFCOUNT_FINISH_ONE			25
>  #define XFS_ERRTAG_BMAP_FINISH_ONE			26
>  #define XFS_ERRTAG_AG_RESV_CRITICAL			27
> +
>  /*
>   * DEBUG mode instrumentation to test and/or trigger delayed allocation
>   * block killing in the event of failed writes. When enabled, all
> @@ -58,7 +59,8 @@
>  #define XFS_ERRTAG_BUF_IOERROR				35
>  #define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
>  #define XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT		37
> -#define XFS_ERRTAG_MAX					38
> +#define XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL		38
> +#define XFS_ERRTAG_MAX					39
>  
>  /*
>   * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index 185b4915b7bf..7bae34bfddd2 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -168,6 +168,7 @@ XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
>  XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
>  XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
>  XFS_ERRORTAG_ATTR_RW(bmap_alloc_minlen_extent,	XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT);
> +XFS_ERRORTAG_ATTR_RW(shrinkfs_ag_resv_fail, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL);
>  
>  static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(noerror),
> @@ -208,6 +209,7 @@ static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
>  	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
>  	XFS_ERRORTAG_ATTR_LIST(bmap_alloc_minlen_extent),
> +	XFS_ERRORTAG_ATTR_LIST(shrinkfs_ag_resv_fail),
>  	NULL,
>  };
>  
> -- 
> 2.27.0
> 
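The rollback that the new error tag exercises can be sketched as follows, assuming a toy AG with only the AGI and AGF length fields. The stand-ins are all hypothetical:

- toy_ag_shrink() for xfs_ag_shrink_space()
- toy_ag_resv_init() for xfs_ag_resv_init()
- the toy_errtag_shrinkfs_resv_fail flag for XFS_TEST_ERROR() with XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL

Only the length restoration visible in the quoted hunk is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the AGI/AGF length fields of one AG. */
struct toy_ag {
	uint32_t agi_length;
	uint32_t agf_length;
};

/* Hypothetical error tag; a test harness sets it to force the failure. */
static bool toy_errtag_shrinkfs_resv_fail;

/* Pretend the real per-AG reservation always succeeds. */
static int toy_ag_resv_init(void)
{
	return 0;
}

/* Shrink the AG headers, then undo the change if the reservation fails. */
static int toy_ag_shrink(struct toy_ag *ag, uint32_t len)
{
	ag->agi_length -= len;
	ag->agf_length -= len;

	int err2 = toy_ag_resv_init();
	if (toy_errtag_shrinkfs_resv_fail)	/* like XFS_TEST_ERROR(false, ...) */
		err2 = -ENOSPC;

	if (err2) {
		ag->agi_length += len;	/* restore the original lengths */
		ag->agf_length += len;
		return err2;
	}
	return 0;
}
```

When the tag is armed, the lengths come back unchanged and -ENOSPC is returned. With the tag cleared, the shrink sticks.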



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 14:23   ` Brian Foster
@ 2021-02-03 14:51     ` Gao Xiang
  2021-02-03 18:01       ` Brian Foster
  2021-02-03 18:12       ` Darrick J. Wong
  0 siblings, 2 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-03 14:51 UTC
  To: Brian Foster
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

Hi Brian,

On Wed, Feb 03, 2021 at 09:23:37AM -0500, Brian Foster wrote:
> On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
> > As the first step of shrinking, this attempts to enable shrinking
> > unused space in the last allocation group by fixing up the free space
> > btrees, AGI, AGF and superblock, using a helper xfs_ag_shrink_space()
> > to fix up the last AG.
> > 
> > This can all be done in one transaction for now, so I think no
> > additional protection is needed.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> >  fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
> >  fs/xfs/xfs_trans.c |  1 -
> >  2 files changed, 42 insertions(+), 23 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 6c4ab5e31054..4bcea22f7b3f 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
> >  	struct aghdr_init_data	*id,
> >  	xfs_agnumber_t		oagcount,
> >  	xfs_agnumber_t		nagcount,
> > -	xfs_rfsblock_t		*delta)
> > +	int64_t			*delta)
> >  {
> >  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
> >  	int			error;
> > @@ -76,33 +76,41 @@ xfs_growfs_data_private(
> >  	xfs_agnumber_t		nagcount;
> >  	xfs_agnumber_t		nagimax = 0;
> >  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> > -	xfs_rfsblock_t		delta;
> > +	int64_t			delta;
> >  	xfs_agnumber_t		oagcount;
> >  	struct xfs_trans	*tp;
> > +	bool			extend;
> >  	struct aghdr_init_data	id = {};
> >  
> >  	nb = in->newblocks;
> > -	if (nb < mp->m_sb.sb_dblocks)
> > -		return -EINVAL;
> > -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> > +	if (nb == mp->m_sb.sb_dblocks)
> > +		return 0;
> > +
> > +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> > +	if (error)
> >  		return error;
> > -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > +
> > +	if (nb > mp->m_sb.sb_dblocks) {
> > +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
> >  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
> >  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> > -	if (error)
> > -		return error;
> > -	xfs_buf_relse(bp);
> > +		if (error)
> > +			return error;
> > +		xfs_buf_relse(bp);
> > +	}
> >  
> >  	nb_div = nb;
> >  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> >  	nagcount = nb_div + (nb_mod != 0);
> >  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
> >  		nagcount--;
> > -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > -		if (nb < mp->m_sb.sb_dblocks)
> > +		if (nagcount < 2)
> >  			return -EINVAL;
> 
> What's the reason for the nagcount < 2 check? IIRC we warn about this
> configuration at mkfs time, but allow it to proceed. Is it just that we
> don't want to accidentally put the fs into an agcount == 1 state that
> was originally formatted with >1 AGs?

Darrick previously asked to avoid shrinking a filesystem that has
only 1 AG.

> 
> What about the case where we attempt to grow an agcount == 1 fs but
> don't enlarge enough to add the second AG? Does this change error
> behavior in that case?

Yeah, thanks for catching this! If growfs allowed the 1-AG case before,
I think this check needs to be refined. Let me update it in the next version!

> 
> > +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> >  	}
> > +
> >  	delta = nb - mp->m_sb.sb_dblocks;
> > +	extend = (delta > 0);
> >  	oagcount = mp->m_sb.sb_agcount;
> >  
> >  	/* allocate the new per-ag structures */
> > @@ -110,22 +118,34 @@ xfs_growfs_data_private(
> >  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
> >  		if (error)
> >  			return error;
> > +	} else if (nagcount < oagcount) {
> > +		/* TODO: shrinking the entire AGs hasn't yet completed */
> > +		return -EINVAL;
> >  	}
> >  
> >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> > -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> > +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> > +			XFS_TRANS_RESERVE, &tp);
> >  	if (error)
> >  		return error;
> >  
> > -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> > -	if (error)
> > -		goto out_trans_cancel;
> > -
> > +	if (extend) {
> > +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> > +						  nagcount, &delta);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> >  	xfs_trans_agblocks_delta(tp, id.nfree);
> 
> It looks like id isn't used until the resize call above. Is this call
> relevant for the shrink case?

I think it has nothing to do with the shrink-the-last-AG case either
(id.nfree == 0 here), but it may be used by the later patchset that
shrinks whole AGs. I can move it into if (extend) in the next version.

> 
> >  
> > -	/* If there are new blocks in the old last AG, extend it. */
> > +	/* If there are some blocks in the last AG, resize it. */
> >  	if (delta) {
> 
> This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> of the function. Should we ever get to this point with delta == 0? (If
> not, maybe convert it to an assert just to be safe.)

delta is changed by xfs_resizefs_init_new_ags() (that is the original
growfs design, and I don't want to touch the original logic). That is
why `delta' now reflects only the change to the last AG, so it can
still be 0 here when the old last AG is not resized.

> 
> > -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> > +		if (extend) {
> > +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> > +		} else {
> > +			id.agno = nagcount - 1;
> > +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
> 
> xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
> pass in agno for now..?

Both ways are OK, yet in my incomplete patchset for shrinking whole
empty AGs, it seems more natural to pass in &id rather than agno (since
id.agno = nagcount - 1 will live in some new helper,
e.g. xfs_shrink_ags()).

> 
> > +		}
> > +
> >  		if (error)
> >  			goto out_trans_cancel;
> >  	}
> > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> >  	 */
> >  	if (nagcount > oagcount)
> >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > -	if (nb > mp->m_sb.sb_dblocks)
> > +	if (nb != mp->m_sb.sb_dblocks)
> >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> >  				 nb - mp->m_sb.sb_dblocks);
> 
> Maybe use delta here?

The reason is the same as above: `delta' here has been changed by
xfs_resizefs_init_new_ags(), so it is no longer nb - mp->m_sb.sb_dblocks.
That is also why the `extend' boolean is used (rather than just delta > 0).

> 
> >  	if (id.nfree)
> >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> >  
> 
> id.nfree tracks newly added free space in the growfs case. Is it not
> used in the shrink case because the allocation handles this for us?

Yeah, I'm afraid so. This is common code, and it is also used in my
patchset for shrinking whole AGs.

> 
> >  	/*
> > -	 * update in-core counters now to reflect the real numbers
> > -	 * (especially sb_fdblocks)
> > +	 * update in-core counters now to reflect the real numbers (especially
> > +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
> >  	 */
> >  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> >  		xfs_log_sb(tp);
> > @@ -165,7 +185,7 @@ xfs_growfs_data_private(
> >  	 * If we expanded the last AG, free the per-AG reservation
> >  	 * so we can reinitialize it with the new size.
> >  	 */
> > -	if (delta) {
> > +	if (extend && delta) {
> >  		struct xfs_perag	*pag;
> >  
> >  		pag = xfs_perag_get(mp, id.agno);
> 
> We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
> this function. xfs_ag_shrink_space() from the previous patch is intended
> to deal with perag reservation changes for shrink, but it looks like the
> reserve call further down could potentially reset mp->m_finobt_nores to
> false if it previously might have been set to true.

Yeah, if my understanding is correct, I may need to call
xfs_fs_reserve_ag_blocks() only in the growfs case as well, to handle
the mp->m_finobt_nores = true case.

Thanks,
Gao Xiang

> 
> Brian
> 
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index e72730f85af1..fd2cbf414b80 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
> >  		tp->t_res_frextents_delta += delta;
> >  		break;
> >  	case XFS_TRANS_SB_DBLOCKS:
> > -		ASSERT(delta > 0);
> >  		tp->t_dblocks_delta += delta;
> >  		break;
> >  	case XFS_TRANS_SB_AGCOUNT:
> > -- 
> > 2.27.0
> > 
> 
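To make the point about `delta' concrete: the value is consumed by the new-AG loop. After that loop runs, delta holds only the change to the old last AG, while the superblock update still needs the full nb - sb_dblocks. The sketch below assumes the same sizing rule as xfs_resizefs_init_new_ags() and oagcount >= 1. struct toy_resize and toy_plan() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the values the resize path ends up using. */
struct toy_resize {
	bool	extend;		/* decided before delta is consumed */
	int64_t	dblocks_delta;	/* full change, for the SB_DBLOCKS update */
	int64_t	last_ag_delta;	/* what is left for extending/shrinking the last AG */
};

/* Assumes oagcount >= 1, as in the kernel loop (agno is unsigned). */
static struct toy_resize toy_plan(uint64_t dblocks, uint64_t nb,
				  uint32_t agblocks, uint32_t oagcount,
				  uint32_t nagcount)
{
	struct toy_resize r;
	int64_t delta = (int64_t)nb - (int64_t)dblocks;

	r.extend = delta > 0;
	r.dblocks_delta = delta;
	if (r.extend) {
		/* Consume the blocks that go into brand-new AGs. */
		for (uint32_t agno = nagcount - 1; agno >= oagcount; agno--) {
			uint64_t agsize = (agno == nagcount - 1) ?
				nb - (uint64_t)agno * agblocks : agblocks;
			delta -= (int64_t)agsize;
		}
	}
	r.last_ag_delta = delta;
	return r;
}
```

Growing four full 1000-block AGs to 5500 blocks gives dblocks_delta = 1500 but last_ag_delta = 0. So the `if (delta)' test can legitimately be false even after the early return. Shrinking 4000 blocks to 3950 gives -50 for both.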



* Re: [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs
  2021-02-03 14:23   ` Brian Foster
@ 2021-02-03 15:01     ` Gao Xiang
  2021-02-03 18:01       ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-02-03 15:01 UTC
  To: Brian Foster
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Darrick J . Wong

Hi Brian,

On Wed, Feb 03, 2021 at 09:23:59AM -0500, Brian Foster wrote:
> On Tue, Jan 26, 2021 at 08:56:21PM +0800, Gao Xiang wrote:
> > Per-AG reservation failure after fixing up free space is hard to test
> > effectively, so add an error injection path directly to verify that
> > this error handling path works as expected.
> > 
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_ag.c       | 5 +++++
> >  fs/xfs/libxfs/xfs_errortag.h | 4 +++-
> >  fs/xfs/xfs_error.c           | 2 ++
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index c6e68e265269..5076913c153f 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -23,6 +23,7 @@
> >  #include "xfs_ag_resv.h"
> >  #include "xfs_health.h"
> >  #include "xfs_error.h"
> > +#include "xfs_errortag.h"
> >  #include "xfs_bmap.h"
> >  #include "xfs_defer.h"
> >  #include "xfs_log_format.h"
> > @@ -559,6 +560,10 @@ xfs_ag_shrink_space(
> >  	be32_add_cpu(&agf->agf_length, -len);
> >  
> >  	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
> > +
> > +	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL))
> > +		err2 = -ENOSPC;
> > +
> 
> Seems reasonable, but I feel like this could be broadened to serve as a
> generic perag reservation error tag. I suppose we might not be able to
> use it on a clean mount, but perhaps it could be reused for growfs and
> remount. Hm?

I think it could be done that way, but currently the logic is only meant to
verify the shrink error handling case above, rather than to actually
error-inject per-AG reservation in general... I could rename the errortag
in advance for later reuse (any better naming? I'm not good at this...),
yet real per-AG reservation error injection might be more complicated
than just erroring out with -ENOSPC, and it's somewhat out of scope of this
patchset for now...

Thanks,
Gao Xiang

> 
> Brian



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 14:51     ` Gao Xiang
@ 2021-02-03 18:01       ` Brian Foster
  2021-02-04  9:18         ` Gao Xiang
  2021-02-03 18:12       ` Darrick J. Wong
  1 sibling, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-03 18:01 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> Hi Brian,
> 
> On Wed, Feb 03, 2021 at 09:23:37AM -0500, Brian Foster wrote:
> > On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
> > > As the first step of shrinking, this attempts to enable shrinking
> > > unused space in the last allocation group by fixing up freespace
> > > btree, agi, agf and adjusting super block and use a helper
> > > xfs_ag_shrink_space() to fixup the last AG.
> > > 
> > > This can be all done in one transaction for now, so I think no
> > > additional protection is needed.
> > > 
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > >  fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
> > >  fs/xfs/xfs_trans.c |  1 -
> > >  2 files changed, 42 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > index 6c4ab5e31054..4bcea22f7b3f 100644
> > > --- a/fs/xfs/xfs_fsops.c
> > > +++ b/fs/xfs/xfs_fsops.c
> > > @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
> > >  	struct aghdr_init_data	*id,
> > >  	xfs_agnumber_t		oagcount,
> > >  	xfs_agnumber_t		nagcount,
> > > -	xfs_rfsblock_t		*delta)
> > > +	int64_t			*delta)
> > >  {
> > >  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
> > >  	int			error;
> > > @@ -76,33 +76,41 @@ xfs_growfs_data_private(
> > >  	xfs_agnumber_t		nagcount;
> > >  	xfs_agnumber_t		nagimax = 0;
> > >  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> > > -	xfs_rfsblock_t		delta;
> > > +	int64_t			delta;
> > >  	xfs_agnumber_t		oagcount;
> > >  	struct xfs_trans	*tp;
> > > +	bool			extend;
> > >  	struct aghdr_init_data	id = {};
> > >  
> > >  	nb = in->newblocks;
> > > -	if (nb < mp->m_sb.sb_dblocks)
> > > -		return -EINVAL;
> > > -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> > > +	if (nb == mp->m_sb.sb_dblocks)
> > > +		return 0;
> > > +
> > > +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> > > +	if (error)
> > >  		return error;
> > > -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > > +
> > > +	if (nb > mp->m_sb.sb_dblocks) {
> > > +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > >  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
> > >  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> > > -	if (error)
> > > -		return error;
> > > -	xfs_buf_relse(bp);
> > > +		if (error)
> > > +			return error;
> > > +		xfs_buf_relse(bp);
> > > +	}
> > >  
> > >  	nb_div = nb;
> > >  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> > >  	nagcount = nb_div + (nb_mod != 0);
> > >  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
> > >  		nagcount--;
> > > -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > -		if (nb < mp->m_sb.sb_dblocks)
> > > +		if (nagcount < 2)
> > >  			return -EINVAL;
> > 
> > What's the reason for the nagcount < 2 check? IIRC we warn about this
> > configuration at mkfs time, but allow it to proceed. Is it just that we
> > don't want to accidentally put the fs into an agcount == 1 state that
> > was originally formatted with >1 AGs?
> 
> Darrick once asked for avoiding shrinking the filesystem which has
> only 1 AG.
> 
> > 
> > What about the case where we attempt to grow an agcount == 1 fs but
> > don't enlarge enough to add the second AG? Does this change error
> > behavior in that case?
> 
> Yeah, thanks for catching this! If growfs allows 1 AG case before,
> I think it needs to be refined. Let me update this in the next version!
> 
> > 
> > > +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > >  	}
> > > +
> > >  	delta = nb - mp->m_sb.sb_dblocks;
> > > +	extend = (delta > 0);
> > >  	oagcount = mp->m_sb.sb_agcount;
> > >  
> > >  	/* allocate the new per-ag structures */
> > > @@ -110,22 +118,34 @@ xfs_growfs_data_private(
> > >  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
> > >  		if (error)
> > >  			return error;
> > > +	} else if (nagcount < oagcount) {
> > > +		/* TODO: shrinking the entire AGs hasn't yet completed */
> > > +		return -EINVAL;
> > >  	}
> > >  
> > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> > > -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> > > +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> > > +			XFS_TRANS_RESERVE, &tp);
> > >  	if (error)
> > >  		return error;
> > >  
> > > -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> > > -	if (error)
> > > -		goto out_trans_cancel;
> > > -
> > > +	if (extend) {
> > > +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> > > +						  nagcount, &delta);
> > > +		if (error)
> > > +			goto out_trans_cancel;
> > > +	}
> > >  	xfs_trans_agblocks_delta(tp, id.nfree);
> > 
> > It looks like id isn't used until the resize call above. Is this call
> > relevant for the shrink case?
> 
> I think it has nothing to do for the shrink the last AG case as well
> (id.nfree == 0 here) but maybe use for the later shrinking the whole
> AGs patchset. I can move into if (extend) in the next version.
> 
> > 
> > >  
> > > -	/* If there are new blocks in the old last AG, extend it. */
> > > +	/* If there are some blocks in the last AG, resize it. */
> > >  	if (delta) {
> > 
> > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > of the function. Should we ever get to this point with delta == 0? (If
> > not, maybe convert it to an assert just to be safe.)
> 
> delta would be changed after xfs_resizefs_init_new_ags() (the original
> growfs design is that, I don't want to touch the original logic). that
> is why `delta' reflects the last AG delta now...
> 

Oh, I see. Hmm... that's a bit obfuscated and easy to miss. Perhaps the
new helper should also include the extend_space() call below to do all
of the AG updates in one place. It's not clear to me if we need to keep
the growfs perag reservation code where it is. If so, the new helper
could take a boolean pointer (instead of delta) that it can set to true
if it had to extend the size of the old last AG because the perag res
bits don't actually use the delta value. IOW, I think this hunk could
look something like the following:

	bool	resetagres = false;

	if (extend)
		error = xfs_resizefs_init_new_ags(..., delta, &resetagres);
	else
		error = xfs_ag_shrink_space(... -delta);
	...

	if (resetagres) {
		<do perag res fixups>
	}
	...

Hm?
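Below is a minimal, self-contained C sketch of the restructuring Brian suggests. It works under simplifying assumptions:

- AGs are fixed-size.
- The hypothetical stubs model_grow_ags() and model_shrink_last_ag() stand in for xfs_resizefs_init_new_ags() and xfs_ag_shrink_space().
- A counter stands in for the perag reservation fixups.

The grow stub reports through a bool whether the old last AG was extended, so the caller never has to reinterpret delta.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: fixed-size AGs; only the last AG may be partial. */
struct model_geo {
	uint64_t	dblocks;	/* total data blocks */
	uint64_t	agblocks;	/* blocks per full AG */
};

/*
 * Stub grow helper, called only when nb > dblocks: new blocks start right
 * after the old end, so the old last AG grows iff it was partial.
 */
static int model_grow_ags(struct model_geo *geo, uint64_t nb, bool *resetagres)
{
	*resetagres = (geo->dblocks % geo->agblocks) != 0;
	geo->dblocks = nb;
	return 0;
}

/* Stub shrink helper: trims len blocks off the end of the last AG. */
static int model_shrink_last_ag(struct model_geo *geo, uint64_t len)
{
	if (len >= geo->dblocks)
		return -EINVAL;
	geo->dblocks -= len;
	return 0;
}

/* Caller shape from the mail; *agres_fixups counts perag res fixups. */
static int model_resize(struct model_geo *geo, uint64_t nb, int *agres_fixups)
{
	int64_t	delta = (int64_t)nb - (int64_t)geo->dblocks;
	bool	resetagres = false;
	int	error;

	if (delta == 0)
		return 0;
	if (delta > 0)
		error = model_grow_ags(geo, nb, &resetagres);
	else
		error = model_shrink_last_ag(geo, (uint64_t)-delta);
	if (error)
		return error;
	if (resetagres)
		(*agres_fixups)++;	/* <do perag res fixups> */
	return 0;
}
```

With 1000-block AGs, the model behaves as follows:

- Growing from 1500 to 1800 blocks extends the partial old last AG and triggers one fixup.
- Shrinking back to 1700 triggers none.
- Growing a 2000-block fs to 2500 only adds a new AG, so it triggers no fixup.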

Brian

> > 
> > > -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > +		if (extend) {
> > > +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > +		} else {
> > > +			id.agno = nagcount - 1;
> > > +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
> > 
> > xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
> > pass in agno for now..?
> 
> Both way are ok, yet in my incomplete shrink whole empty AGs patchset,
> it seems more natural to pass in &id rather than agno (since
> id.agno = nagcount - 1 will be stayed in some new helper
> e.g. xfs_shrink_ags())
> 
> > 
> > > +		}
> > > +
> > >  		if (error)
> > >  			goto out_trans_cancel;
> > >  	}
> > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > >  	 */
> > >  	if (nagcount > oagcount)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > -	if (nb > mp->m_sb.sb_dblocks)
> > > +	if (nb != mp->m_sb.sb_dblocks)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > >  				 nb - mp->m_sb.sb_dblocks);
> > 
> > Maybe use delta here?
> 
> The reason is the same as above, `delta' here was changed due to 
> xfs_resizefs_init_new_ags(), which is not nb - mp->m_sb.sb_dblocks
> anymore. so `extend` boolean is used (rather than just use delta > 0)
> 
> > 
> > >  	if (id.nfree)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > >  
> > 
> > id.nfree tracks newly added free space in the growfs space. Is it not
> > used in the shrink case because the allocation handles this for us?
> 
> Yeah, I'm afraid so. This is some common code, and also used in my
> shrinking the whole AGs patchset.
> 
> > 
> > >  	/*
> > > -	 * update in-core counters now to reflect the real numbers
> > > -	 * (especially sb_fdblocks)
> > > +	 * update in-core counters now to reflect the real numbers (especially
> > > +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
> > >  	 */
> > >  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> > >  		xfs_log_sb(tp);
> > > @@ -165,7 +185,7 @@ xfs_growfs_data_private(
> > >  	 * If we expanded the last AG, free the per-AG reservation
> > >  	 * so we can reinitialize it with the new size.
> > >  	 */
> > > -	if (delta) {
> > > +	if (extend && delta) {
> > >  		struct xfs_perag	*pag;
> > >  
> > >  		pag = xfs_perag_get(mp, id.agno);
> > 
> > We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
> > this function. xfs_ag_shrink_space() from the previous patch is intended
> > to deal with perag reservation changes for shrink, but it looks like the
> > reserve call further down could potentially reset mp->m_finobt_nores to
> > false if it previously might have been set to true.
> 
> Yeah, if my understanding is correct, I might need to call
> xfs_fs_reserve_ag_blocks() only for growfs case as well for
> mp->m_finobt_nores = true case.
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Brian
> > 
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index e72730f85af1..fd2cbf414b80 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
> > >  		tp->t_res_frextents_delta += delta;
> > >  		break;
> > >  	case XFS_TRANS_SB_DBLOCKS:
> > > -		ASSERT(delta > 0);
> > >  		tp->t_dblocks_delta += delta;
> > >  		break;
> > >  	case XFS_TRANS_SB_AGCOUNT:
> > > -- 
> > > 2.27.0
> > > 
> > 
> 



* Re: [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs
  2021-02-03 15:01     ` Gao Xiang
@ 2021-02-03 18:01       ` Brian Foster
  2021-02-04  9:20         ` Gao Xiang
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-03 18:01 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Darrick J . Wong

On Wed, Feb 03, 2021 at 11:01:32PM +0800, Gao Xiang wrote:
> Hi Brian,
> 
> On Wed, Feb 03, 2021 at 09:23:59AM -0500, Brian Foster wrote:
> > On Tue, Jan 26, 2021 at 08:56:21PM +0800, Gao Xiang wrote:
> > > per-AG resv failure after fixing up freespace is hard to test in an
> > > effective way, so directly add an error injection path to observe
> > > such error handling path works as expected.
> > > 
> > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_ag.c       | 5 +++++
> > >  fs/xfs/libxfs/xfs_errortag.h | 4 +++-
> > >  fs/xfs/xfs_error.c           | 2 ++
> > >  3 files changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > > index c6e68e265269..5076913c153f 100644
> > > --- a/fs/xfs/libxfs/xfs_ag.c
> > > +++ b/fs/xfs/libxfs/xfs_ag.c
> > > @@ -23,6 +23,7 @@
> > >  #include "xfs_ag_resv.h"
> > >  #include "xfs_health.h"
> > >  #include "xfs_error.h"
> > > +#include "xfs_errortag.h"
> > >  #include "xfs_bmap.h"
> > >  #include "xfs_defer.h"
> > >  #include "xfs_log_format.h"
> > > @@ -559,6 +560,10 @@ xfs_ag_shrink_space(
> > >  	be32_add_cpu(&agf->agf_length, -len);
> > >  
> > >  	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
> > > +
> > > +	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL))
> > > +		err2 = -ENOSPC;
> > > +
> > 
> > Seems reasonable, but I feel like this could be broadened to serve as a
> > generic perag reservation error tag. I suppose we might not be able to
> > use it on a clean mount, but perhaps it could be reused for growfs and
> > remount. Hm?
> 
> I think it could be done in that way, yet currently the logic is just to
> verify the shrink error handling case above rather than extend to actually
> error inject per-AG reservation for now... I could rename the errortag
> for later reuse (some better naming? I'm not good at this...) in advance
> yet real per-AG reservation error injection might be more complicated
> than just error out with -ENOSPC, and it's somewhat out of scope of this
> patchset for now...
> 

I don't think it needs to be any more complicated than the logic you
have here. Just bury it further down in the perag res init code,
rename it to something like ERRTAG_AG_RESV_FAIL, and use it the exact
same way for shrink testing. For example, maybe drop it into
__xfs_ag_resv_init() near the xfs_mod_fdblocks() call so we can also
take advantage of the tracepoint that triggers on -ENOSPC for
informational purposes:

	error = xfs_mod_fdblocks(...);
	if (!error && XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL))
		error = -ENOSPC;
	if (error) {
		...
	}
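Here is a hypothetical C model of the generic error tag Brian describes. The knob and the accounting call are stand-ins, not the XFS errortag API.

One detail the model surfaces: if the injection fires after a successful accounting call, the blocks just deducted are never recorded as reserved. This sketch therefore injects in place of the call instead, still feeding the same error path (and its tracing stand-in).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical error-injection knob (not the XFS errortag API). */
struct model_errtag {
	bool	ag_resv_fail;	/* armed: reservations fail with -ENOSPC */
};

/* Hypothetical stand-in for the free-block accounting call. */
static int model_mod_fdblocks(long *free_blocks, long ask)
{
	if (*free_blocks < ask)
		return -ENOSPC;
	*free_blocks -= ask;
	return 0;
}

static int model_ag_resv_init(struct model_errtag *tag, long *free_blocks,
			      long ask, int *enospc_traces)
{
	int	error;

	/*
	 * Inject in place of the accounting call rather than after it: in
	 * this model, injecting after a successful call would leave @ask
	 * blocks deducted with no reservation recording them.
	 */
	if (tag->ag_resv_fail)
		error = -ENOSPC;
	else
		error = model_mod_fdblocks(free_blocks, ask);
	if (error) {
		(*enospc_traces)++;	/* stands in for the -ENOSPC tracepoint */
		return error;
	}
	return 0;
}
```

Both real and injected failures go through the single error branch. Arming the knob therefore exercises the same handling (including the tracing) as a genuine -ENOSPC, without disturbing the free-block count.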

Brian

> Thanks,
> Gao Xiang
> 
> > 
> > Brian
> 



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 14:51     ` Gao Xiang
  2021-02-03 18:01       ` Brian Foster
@ 2021-02-03 18:12       ` Darrick J. Wong
  2021-02-03 18:14         ` Darrick J. Wong
                           ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-02-03 18:12 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Brian Foster, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> Hi Brian,
> 
> On Wed, Feb 03, 2021 at 09:23:37AM -0500, Brian Foster wrote:
> > On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
> > > As the first step of shrinking, this attempts to enable shrinking
> > > unused space in the last allocation group by fixing up freespace
> > > btree, agi, agf and adjusting super block and use a helper
> > > xfs_ag_shrink_space() to fixup the last AG.
> > > 
> > > This can be all done in one transaction for now, so I think no
> > > additional protection is needed.
> > > 
> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > ---
> > >  fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
> > >  fs/xfs/xfs_trans.c |  1 -
> > >  2 files changed, 42 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > index 6c4ab5e31054..4bcea22f7b3f 100644
> > > --- a/fs/xfs/xfs_fsops.c
> > > +++ b/fs/xfs/xfs_fsops.c
> > > @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
> > >  	struct aghdr_init_data	*id,
> > >  	xfs_agnumber_t		oagcount,
> > >  	xfs_agnumber_t		nagcount,
> > > -	xfs_rfsblock_t		*delta)
> > > +	int64_t			*delta)
> > >  {
> > >  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
> > >  	int			error;
> > > @@ -76,33 +76,41 @@ xfs_growfs_data_private(
> > >  	xfs_agnumber_t		nagcount;
> > >  	xfs_agnumber_t		nagimax = 0;
> > >  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> > > -	xfs_rfsblock_t		delta;
> > > +	int64_t			delta;
> > >  	xfs_agnumber_t		oagcount;
> > >  	struct xfs_trans	*tp;
> > > +	bool			extend;
> > >  	struct aghdr_init_data	id = {};
> > >  
> > >  	nb = in->newblocks;
> > > -	if (nb < mp->m_sb.sb_dblocks)
> > > -		return -EINVAL;
> > > -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> > > +	if (nb == mp->m_sb.sb_dblocks)
> > > +		return 0;
> > > +
> > > +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> > > +	if (error)
> > >  		return error;
> > > -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > > +
> > > +	if (nb > mp->m_sb.sb_dblocks) {
> > > +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > >  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
> > >  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> > > -	if (error)
> > > -		return error;
> > > -	xfs_buf_relse(bp);
> > > +		if (error)
> > > +			return error;
> > > +		xfs_buf_relse(bp);
> > > +	}
> > >  
> > >  	nb_div = nb;
> > >  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> > >  	nagcount = nb_div + (nb_mod != 0);
> > >  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
> > >  		nagcount--;
> > > -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > -		if (nb < mp->m_sb.sb_dblocks)
> > > +		if (nagcount < 2)
> > >  			return -EINVAL;
> > 
> > What's the reason for the nagcount < 2 check? IIRC we warn about this
> > configuration at mkfs time, but allow it to proceed. Is it just that we
> > don't want to accidentally put the fs into an agcount == 1 state that
> > was originally formatted with >1 AGs?
> 
> Darrick once asked for avoiding shrinking the filesystem which has
> only 1 AG.

It's worth mentioning why in a comment though:

	/*
	 * XFS doesn't really support single-AG filesystems, so do not
	 * permit callers to remove the filesystem's second and last AG.
	 */
	if (shrink && new_agcount < 2)
		return -EHAHANOYOUDONT;

But as Brian points out, we /do/ allow adding a second AG to a single-AG
fs.
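Here is a self-contained C sketch of the AG-count calculation from the patch. As Darrick suggests, it limits the "fewer than 2 AGs" rejection to shrinks. MODEL_MIN_AG_BLOCKS is an assumed stand-in for XFS_MIN_AG_BLOCKS, and the helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed value standing in for XFS_MIN_AG_BLOCKS. */
#define MODEL_MIN_AG_BLOCKS	64

/*
 * Hypothetical helper: compute the new AG count and the (possibly
 * trimmed) block count. A runt tail shorter than MODEL_MIN_AG_BLOCKS is
 * dropped, as in the patch.
 */
static int model_new_geometry(uint64_t dblocks, uint64_t agblocks,
			      uint64_t newblocks, uint32_t *nagcount,
			      uint64_t *nb)
{
	uint64_t	nb_div = newblocks / agblocks;
	uint64_t	nb_mod = newblocks % agblocks;
	uint32_t	count = (uint32_t)(nb_div + (nb_mod != 0));

	*nb = newblocks;
	if (nb_mod && nb_mod < MODEL_MIN_AG_BLOCKS) {
		count--;
		*nb = (uint64_t)count * agblocks;
	}
	/* Darrick's rule: a shrink may not leave fewer than 2 AGs. */
	if (*nb < dblocks && count < 2)
		return -EINVAL;
	*nagcount = count;
	return 0;
}
```

Assume 1000-block AGs:

- Growing a single 1000-block AG to 1030 blocks trims the runt tail and ends up as a no-op (nb stays 1000) instead of failing. This addresses Brian's question about grow behavior on a single-AG fs.
- Shrinking a 1500-block fs to 1030 blocks is refused, because only one AG would remain.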

> > 
> > What about the case where we attempt to grow an agcount == 1 fs but
> > don't enlarge enough to add the second AG? Does this change error
> > behavior in that case?
> 
> Yeah, thanks for catching this! If growfs allows 1 AG case before,
> I think it needs to be refined. Let me update this in the next version!
> 
> > 
> > > +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > >  	}
> > > +
> > >  	delta = nb - mp->m_sb.sb_dblocks;
> > > +	extend = (delta > 0);
> > >  	oagcount = mp->m_sb.sb_agcount;
> > >  
> > >  	/* allocate the new per-ag structures */
> > > @@ -110,22 +118,34 @@ xfs_growfs_data_private(
> > >  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
> > >  		if (error)
> > >  			return error;
> > > +	} else if (nagcount < oagcount) {
> > > +		/* TODO: shrinking the entire AGs hasn't yet completed */
> > > +		return -EINVAL;
> > >  	}
> > >  
> > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> > > -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> > > +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> > > +			XFS_TRANS_RESERVE, &tp);
> > >  	if (error)
> > >  		return error;
> > >  
> > > -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> > > -	if (error)
> > > -		goto out_trans_cancel;
> > > -
> > > +	if (extend) {
> > > +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> > > +						  nagcount, &delta);
> > > +		if (error)
> > > +			goto out_trans_cancel;
> > > +	}
> > >  	xfs_trans_agblocks_delta(tp, id.nfree);
> > 
> > It looks like id isn't used until the resize call above. Is this call
> > relevant for the shrink case?
> 
> I think it has nothing to do for the shrink the last AG case as well
> (id.nfree == 0 here) but maybe use for the later shrinking the whole
> AGs patchset. I can move into if (extend) in the next version.
> 
> > 
> > >  
> > > -	/* If there are new blocks in the old last AG, extend it. */
> > > +	/* If there are some blocks in the last AG, resize it. */
> > >  	if (delta) {
> > 
> > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > of the function. Should we ever get to this point with delta == 0? (If
> > not, maybe convert it to an assert just to be safe.)
> 
> delta would be changed after xfs_resizefs_init_new_ags() (the original
> growfs design is that, I don't want to touch the original logic). that
> is why `delta' reflects the last AG delta now...

I've never liked how the meaning of "delta" changes through the
function, and it clearly trips up reviewers.  This variable isn't the
delta between the old dblocks and the new dblocks, it's really a
resizefs cursor that tells us how much work we still have to do.
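To make the "cursor" reading concrete, here is a hypothetical C model (not the kernel helper) in which initializing each new AG subtracts that AG's size from delta. Whatever remains afterwards is the number of blocks still to be added to the old last AG. This is why delta can be zero at the `if (delta)` check even though the filesystem did grow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: each new AG "consumes" its size from *delta, so on
 * return *delta only counts blocks destined for the old last AG.
 */
static void model_init_new_ags(uint64_t dblocks, uint64_t agblocks,
			       uint32_t oagcount, uint32_t nagcount,
			       int64_t *delta)
{
	uint64_t	nb = dblocks + (uint64_t)*delta;	/* new size */

	for (uint32_t agno = oagcount; agno < nagcount; agno++) {
		/* Every new AG is full-size except possibly the last one. */
		uint64_t	agsize = agblocks;

		if (agno == nagcount - 1)
			agsize = nb - (uint64_t)agno * agblocks;
		*delta -= (int64_t)agsize;
	}
	/* Whatever is left belongs to the old last AG. */
	assert(*delta >= 0);
}
```

For 1000-block AGs, the cursor ends up as follows:

- Growing 2000 to 2500 blocks leaves delta at 0, since only a new AG is added.
- Growing 1500 to 3200 blocks leaves 500 for the old last AG.
- Growing 1500 to 1800 blocks adds no AG, so all 300 blocks remain.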

> > 
> > > -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > +		if (extend) {
> > > +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > +		} else {
> > > +			id.agno = nagcount - 1;
> > > +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
> > 
> > xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
> > pass in agno for now..?
> 
> Both way are ok, yet in my incomplete shrink whole empty AGs patchset,
> it seems more natural to pass in &id rather than agno (since
> id.agno = nagcount - 1 will be stayed in some new helper
> e.g. xfs_shrink_ags())

@id is struct aghdr_init_data, but shrinking shouldn't initialize any AG
headers.  Are you planning to make use of it in shrink, either now or
later on?

> 
> > 
> > > +		}
> > > +
> > >  		if (error)
> > >  			goto out_trans_cancel;
> > >  	}
> > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > >  	 */
> > >  	if (nagcount > oagcount)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > -	if (nb > mp->m_sb.sb_dblocks)
> > > +	if (nb != mp->m_sb.sb_dblocks)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > >  				 nb - mp->m_sb.sb_dblocks);
> > 
> > Maybe use delta here?
> 
> The reason is the same as above, `delta' here was changed due to 
> xfs_resizefs_init_new_ags(), which is not nb - mp->m_sb.sb_dblocks
> anymore. so `extend` boolean is used (rather than just use delta > 0)

Long question:

The reason why we use (nb - dblocks) is because growfs is an all or
nothing operation -- either we succeed in writing new empty AGs and
inflating the (former) last AG of the fs, or we don't do anything at
all.  We don't allow partial growing; if we did, then delta would be
relevant here.  I think we get away with not needing to run transactions
for each AG because those new AGs are inaccessible until we commit the
new agcount/dblocks, right?

In your design for the fs shrinker, do you anticipate being able to
eliminate all the eligible AGs in a single transaction?  Or do you
envision only tackling one AG at a time?  And can we be partially
successful with a shrink?  e.g. we succeed at eliminating the last AG,
but then the one before that isn't empty and so we bail out, but by that
point we did actually make the fs a little bit smaller.

There's this comment at the bottom of xfs_growfs_data() that says that
we can return error codes if the secondary sb update fails, even if the
new size is already live.  This convinces me that it's always been the
case that callers of the growfs ioctl are supposed to re-query the fs
geometry afterwards to find out if the fs size changed, even if the
ioctl itself returns an error... which implies that partial grow/shrink
are a possibility.
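Here is a hypothetical C model of the partial-success case Darrick describes. It assumes a shrinker that removes trailing AGs one committed step at a time. The error code returned for a non-empty AG is an assumption. The point is that the caller must re-read the geometry after any return, since an error does not mean nothing changed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_AGS	16

/* Hypothetical filesystem model: an AG count and per-AG "in use" flags. */
struct model_fs {
	uint32_t	agcount;
	bool		ag_used[MODEL_MAX_AGS];	/* AG still holds data */
};

/*
 * Hypothetical shrinker: removes trailing empty AGs one at a time, each
 * step treated as already committed. Hitting a non-empty AG fails the
 * call, but AGs removed so far stay removed.
 */
static int model_shrink_ags(struct model_fs *fs, uint32_t new_agcount)
{
	/* Darrick's rule: never shrink below 2 AGs. */
	if (new_agcount < 2)
		return -EINVAL;
	while (fs->agcount > new_agcount) {
		if (fs->ag_used[fs->agcount - 1])
			return -ENOTEMPTY;	/* assumed error code */
		fs->agcount--;			/* this step is committed */
	}
	return 0;
}

/* Caller side: re-query the geometry after every call, success or not. */
static uint32_t model_query_agcount(const struct model_fs *fs)
{
	return fs->agcount;
}
```

Consider a 5-AG model whose AGs 0-2 are in use. Shrinking it to 2 AGs fails, yet the re-queried AG count is 3, not 5.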

> 
> > 
> > >  	if (id.nfree)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > >  
> > 
> > id.nfree tracks newly added free space in the growfs space. Is it not
> > used in the shrink case because the allocation handles this for us?
> 
> Yeah, I'm afraid so. This is some common code, and also used in my
> shrinking the whole AGs patchset.
> 
> > 
> > >  	/*
> > > -	 * update in-core counters now to reflect the real numbers
> > > -	 * (especially sb_fdblocks)
> > > +	 * update in-core counters now to reflect the real numbers (especially
> > > +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
> > >  	 */
> > >  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> > >  		xfs_log_sb(tp);
> > > @@ -165,7 +185,7 @@ xfs_growfs_data_private(
> > >  	 * If we expanded the last AG, free the per-AG reservation
> > >  	 * so we can reinitialize it with the new size.
> > >  	 */
> > > -	if (delta) {
> > > +	if (extend && delta) {
> > >  		struct xfs_perag	*pag;
> > >  
> > >  		pag = xfs_perag_get(mp, id.agno);
> > 
> > We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
> > this function. xfs_ag_shrink_space() from the previous patch is intended
> > to deal with perag reservation changes for shrink, but it looks like the
> > reserve call further down could potentially reset mp->m_finobt_nores to
> > false if it previously might have been set to true.
> 
> Yeah, if my understanding is correct, I might need to call
> xfs_fs_reserve_ag_blocks() only for growfs case as well for
> mp->m_finobt_nores = true case.

I suppose it's worth trying in the finobt_nores==true case. :)

--D

> 
> Thanks,
> Gao Xiang
> 
> > 
> > Brian
> > 
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index e72730f85af1..fd2cbf414b80 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
> > >  		tp->t_res_frextents_delta += delta;
> > >  		break;
> > >  	case XFS_TRANS_SB_DBLOCKS:
> > > -		ASSERT(delta > 0);
> > >  		tp->t_dblocks_delta += delta;
> > >  		break;
> > >  	case XFS_TRANS_SB_AGCOUNT:
> > > -- 
> > > 2.27.0
> > > 
> > 
> 


* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 18:12       ` Darrick J. Wong
@ 2021-02-03 18:14         ` Darrick J. Wong
  2021-02-03 19:02         ` Gao Xiang
  2021-02-04  9:40         ` Gao Xiang
  2 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-02-03 18:14 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Brian Foster, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> > Hi Brian,
> > 
> > On Wed, Feb 03, 2021 at 09:23:37AM -0500, Brian Foster wrote:
> > > On Tue, Jan 26, 2021 at 08:56:20PM +0800, Gao Xiang wrote:
> > > > As the first step of shrinking, this attempts to enable shrinking
> > > > unused space in the last allocation group by fixing up freespace
> > > > btree, agi, agf and adjusting super block and use a helper
> > > > xfs_ag_shrink_space() to fixup the last AG.
> > > > 
> > > > This can be all done in one transaction for now, so I think no
> > > > additional protection is needed.
> > > > 
> > > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_fsops.c | 64 ++++++++++++++++++++++++++++++----------------
> > > >  fs/xfs/xfs_trans.c |  1 -
> > > >  2 files changed, 42 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > > index 6c4ab5e31054..4bcea22f7b3f 100644
> > > > --- a/fs/xfs/xfs_fsops.c
> > > > +++ b/fs/xfs/xfs_fsops.c
> > > > @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
> > > >  	struct aghdr_init_data	*id,
> > > >  	xfs_agnumber_t		oagcount,
> > > >  	xfs_agnumber_t		nagcount,
> > > > -	xfs_rfsblock_t		*delta)
> > > > +	int64_t			*delta)
> > > >  {
> > > >  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
> > > >  	int			error;
> > > > @@ -76,33 +76,41 @@ xfs_growfs_data_private(
> > > >  	xfs_agnumber_t		nagcount;
> > > >  	xfs_agnumber_t		nagimax = 0;
> > > >  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> > > > -	xfs_rfsblock_t		delta;
> > > > +	int64_t			delta;
> > > >  	xfs_agnumber_t		oagcount;
> > > >  	struct xfs_trans	*tp;
> > > > +	bool			extend;
> > > >  	struct aghdr_init_data	id = {};
> > > >  
> > > >  	nb = in->newblocks;
> > > > -	if (nb < mp->m_sb.sb_dblocks)
> > > > -		return -EINVAL;
> > > > -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> > > > +	if (nb == mp->m_sb.sb_dblocks)
> > > > +		return 0;
> > > > +
> > > > +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> > > > +	if (error)
> > > >  		return error;
> > > > -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > > > +
> > > > +	if (nb > mp->m_sb.sb_dblocks) {
> > > > +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
> > > >  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
> > > >  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> > > > -	if (error)
> > > > -		return error;
> > > > -	xfs_buf_relse(bp);
> > > > +		if (error)
> > > > +			return error;
> > > > +		xfs_buf_relse(bp);
> > > > +	}
> > > >  
> > > >  	nb_div = nb;
> > > >  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> > > >  	nagcount = nb_div + (nb_mod != 0);
> > > >  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
> > > >  		nagcount--;
> > > > -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > > -		if (nb < mp->m_sb.sb_dblocks)
> > > > +		if (nagcount < 2)
> > > >  			return -EINVAL;
> > > 
> > > What's the reason for the nagcount < 2 check? IIRC we warn about this
> > > configuration at mkfs time, but allow it to proceed. Is it just that we
> > > don't want to accidentally put the fs into an agcount == 1 state that
> > > was originally formatted with >1 AGs?
> > 
> > Darrick once asked for avoiding shrinking the filesystem which has
> > only 1 AG.
> 
> It's worth mentioning why in a comment though:
> 
> 	/*
> 	 * XFS doesn't really support single-AG filesystems, so do not
> 	 * permit callers to remove the filesystem's second and last AG.
> 	 */
> 	if (shrink && new_agcount < 2)
> 		return -EHAHANOYOUDONT;
> 
> But as Brian points out, we /do/ allow adding a second AG to a single-AG
> fs.
> 
> > > 
> > > What about the case where we attempt to grow an agcount == 1 fs but
> > > don't enlarge enough to add the second AG? Does this change error
> > > behavior in that case?
> > 
> > Yeah, thanks for catching this! If growfs allows 1 AG case before,
> > I think it needs to be refined. Let me update this in the next version!
> > 
> > > 
> > > > +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > >  	}
> > > > +
> > > >  	delta = nb - mp->m_sb.sb_dblocks;
> > > > +	extend = (delta > 0);
> > > >  	oagcount = mp->m_sb.sb_agcount;
> > > >  
> > > >  	/* allocate the new per-ag structures */
> > > > @@ -110,22 +118,34 @@ xfs_growfs_data_private(
> > > >  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
> > > >  		if (error)
> > > >  			return error;
> > > > +	} else if (nagcount < oagcount) {
> > > > +		/* TODO: shrinking the entire AGs hasn't yet completed */
> > > > +		return -EINVAL;
> > > >  	}
> > > >  
> > > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> > > > -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> > > > +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> > > > +			XFS_TRANS_RESERVE, &tp);
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> > > > -	if (error)
> > > > -		goto out_trans_cancel;
> > > > -
> > > > +	if (extend) {
> > > > +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> > > > +						  nagcount, &delta);
> > > > +		if (error)
> > > > +			goto out_trans_cancel;
> > > > +	}
> > > >  	xfs_trans_agblocks_delta(tp, id.nfree);
> > > 
> > > It looks like id isn't used until the resize call above. Is this call
> > > relevant for the shrink case?
> > 
> > I think it does nothing for the shrink-the-last-AG case either
> > (id.nfree == 0 here), but it may be used by the later patchset for
> > shrinking whole AGs. I can move it into if (extend) in the next version.
> > 
> > > 
> > > >  
> > > > -	/* If there are new blocks in the old last AG, extend it. */
> > > > +	/* If there are some blocks in the last AG, resize it. */
> > > >  	if (delta) {
> > > 
> > > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > > of the function. Should we ever get to this point with delta == 0? (If
> > > not, maybe convert it to an assert just to be safe.)
> > 
> > delta is changed by xfs_resizefs_init_new_ags() (that is the original
> > growfs design, and I don't want to touch the original logic). That
> > is why `delta' reflects the last AG delta now...
> 
> I've never liked how the meaning of "delta" changes through the
> function, and it clearly trips up reviewers.  This variable isn't the
> delta between the old dblocks and the new dblocks, it's really a
> resizefs cursor that tells us how much work we still have to do.
> 
> > > 
> > > > -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > > +		if (extend) {
> > > > +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > > +		} else {
> > > > +			id.agno = nagcount - 1;
> > > > +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
> > > 
> > > xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
> > > pass in agno for now..?
> > 
> > Either way is ok, yet in my incomplete patchset for shrinking whole
> > empty AGs, it seems more natural to pass in &id rather than agno (since
> > id.agno = nagcount - 1 will stay in some new helper,
> > e.g. xfs_shrink_ags())
> 
> @id is struct aghdr_init_data, but shrinking shouldn't initialize any AG
> headers.  Are you planning to make use of it in shrink, either now or
> later on?
> 
> > 
> > > 
> > > > +		}
> > > > +
> > > >  		if (error)
> > > >  			goto out_trans_cancel;
> > > >  	}
> > > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > > >  	 */
> > > >  	if (nagcount > oagcount)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > > -	if (nb > mp->m_sb.sb_dblocks)
> > > > +	if (nb != mp->m_sb.sb_dblocks)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > >  				 nb - mp->m_sb.sb_dblocks);
> > > 
> > > Maybe use delta here?
> > 
> > The reason is the same as above: `delta' here was changed by
> > xfs_resizefs_init_new_ags() and is no longer nb - mp->m_sb.sb_dblocks,
> > so the `extend` boolean is used (rather than just delta > 0)
> 
> Long question:
> 
> The reason why we use (nb - dblocks) is because growfs is an all or
> nothing operation -- either we succeed in writing new empty AGs and
> inflating the (former) last AG of the fs, or we don't do anything at
> all.  We don't allow partial growing; if we did, then delta would be
> relevant here.  I think we get away with not needing to run transactions
> for each AG because those new AGs are inaccessible until we commit the
> new agcount/dblocks, right?
> 
> In your design for the fs shrinker, do you anticipate being able to
> eliminate all the eligible AGs in a single transaction?  Or do you
> envision only tackling one AG at a time?  And can we be partially
> successful with a shrink?  e.g. we succeed at eliminating the last AG,
> but then the one before that isn't empty and so we bail out, but by that
> point we did actually make the fs a little bit smaller.
> 
> There's this comment at the bottom of xfs_growfs_data() that says that
> we can return error codes if the secondary sb update fails, even if the
> new size is already live.  This convinces me that it's always been the
> case that callers of the growfs ioctl are supposed to re-query the fs
> geometry afterwards to find out if the fs size changed, even if the
> ioctl itself returns an error... which implies that partial grow/shrink
> are a possibility.

And of course I got so buried in building up to my long question that I
forgot to ask it:

If the design of the shrinker requires incremental shrinking, should we
support incremental growfs too?

--D

> > 
> > > 
> > > >  	if (id.nfree)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > > >  
> > > 
> > > id.nfree tracks newly added free space in the growfs space. Is it not
> > > used in the shrink case because the allocation handles this for us?
> > 
> > Yeah, I'm afraid so. This is common code, and it is also used in my
> > patchset for shrinking whole AGs.
> > 
> > > 
> > > >  	/*
> > > > -	 * update in-core counters now to reflect the real numbers
> > > > -	 * (especially sb_fdblocks)
> > > > +	 * update in-core counters now to reflect the real numbers (especially
> > > > +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
> > > >  	 */
> > > >  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> > > >  		xfs_log_sb(tp);
> > > > @@ -165,7 +185,7 @@ xfs_growfs_data_private(
> > > >  	 * If we expanded the last AG, free the per-AG reservation
> > > >  	 * so we can reinitialize it with the new size.
> > > >  	 */
> > > > -	if (delta) {
> > > > +	if (extend && delta) {
> > > >  		struct xfs_perag	*pag;
> > > >  
> > > >  		pag = xfs_perag_get(mp, id.agno);
> > > 
> > > We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
> > > this function. xfs_ag_shrink_space() from the previous patch is intended
> > > to deal with perag reservation changes for shrink, but it looks like the
> > > reserve call further down could potentially reset mp->m_finobt_nores to
> > > false if it previously might have been set to true.
> > 
> > Yeah, if my understanding is correct, I might need to call
> > xfs_fs_reserve_ag_blocks() only in the growfs case, as well as in the
> > mp->m_finobt_nores = true case.
> 
> I suppose it's worth trying in the finobt_nores==true case. :)
> 
> --D
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > Brian
> > > 
> > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > > index e72730f85af1..fd2cbf414b80 100644
> > > > --- a/fs/xfs/xfs_trans.c
> > > > +++ b/fs/xfs/xfs_trans.c
> > > > @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
> > > >  		tp->t_res_frextents_delta += delta;
> > > >  		break;
> > > >  	case XFS_TRANS_SB_DBLOCKS:
> > > > -		ASSERT(delta > 0);
> > > >  		tp->t_dblocks_delta += delta;
> > > >  		break;
> > > >  	case XFS_TRANS_SB_AGCOUNT:
> > > > -- 
> > > > 2.27.0
> > > > 
> > > 
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 18:12       ` Darrick J. Wong
  2021-02-03 18:14         ` Darrick J. Wong
@ 2021-02-03 19:02         ` Gao Xiang
  2021-02-03 19:19           ` Gao Xiang
  2021-02-04 12:33           ` Brian Foster
  2021-02-04  9:40         ` Gao Xiang
  2 siblings, 2 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-03 19:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Brian Foster, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

Hi Darrick,

On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:

...

> > > 
> > > > +		}
> > > > +
> > > >  		if (error)
> > > >  			goto out_trans_cancel;
> > > >  	}
> > > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > > >  	 */
> > > >  	if (nagcount > oagcount)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > > -	if (nb > mp->m_sb.sb_dblocks)
> > > > +	if (nb != mp->m_sb.sb_dblocks)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > >  				 nb - mp->m_sb.sb_dblocks);
> > > 
> > > Maybe use delta here?
> > 
> > The reason is the same as above: `delta' here was changed by
> > xfs_resizefs_init_new_ags() and is no longer nb - mp->m_sb.sb_dblocks,
> > so the `extend` boolean is used (rather than just delta > 0)
> 
> Long question:
> 
> The reason why we use (nb - dblocks) is because growfs is an all or
> nothing operation -- either we succeed in writing new empty AGs and
> inflating the (former) last AG of the fs, or we don't do anything at
> all.  We don't allow partial growing; if we did, then delta would be
> relevant here.  I think we get away with not needing to run transactions
> for each AG because those new AGs are inaccessible until we commit the
> new agcount/dblocks, right?
> 
> In your design for the fs shrinker, do you anticipate being able to
> eliminate all the eligible AGs in a single transaction?  Or do you
> envision only tackling one AG at a time?  And can we be partially
> successful with a shrink?  e.g. we succeed at eliminating the last AG,
> but then the one before that isn't empty and so we bail out, but by that
> point we did actually make the fs a little bit smaller.

Thanks for your question. I'm about to go to sleep, but let me try to
answer your question here.

From my current experiments / understanding, I think eliminating all
the empty AGs and shrinking the tail AG in a single transaction is
possible; this is what I've done so far:
 1) check that the remaining AGs (from AG nagcount to AG oagcount - 1)
    are empty and mark them all inactive (freeze the AGs);
 2) consume an extent from AG (nagcount - 1);
 3) decrease agcount from oagcount to nagcount.

Both 2) and 3) can be done in the same transaction, and after 1) the state
of those empty AGs is fixed as well. So the on-disk and runtime state
changes are all atomic.
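The three steps above can be modelled in a few lines of user-space C. This is a minimal sketch, not kernel code: `struct toy_ag`, `struct toy_fs`, `toy_shrink()` and the assumption that only contiguous free space at the end of the new last AG (`tail_free`) can be cut off are all hypothetical. Step 1 runs outside the "transaction"; steps 2 and 3 are applied to a private copy that is published with one assignment, so a failure leaves the model unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_AGS 16	/* hypothetical limit for the model */

/* Hypothetical per-AG state; not struct xfs_perag. */
struct toy_ag {
	unsigned long long length;	/* blocks in this AG */
	unsigned long long used;	/* allocated blocks */
	unsigned long long tail_free;	/* free blocks contiguous with the AG end */
	bool frozen;			/* step 1: AG marked inactive */
};

struct toy_fs {
	unsigned int agcount;
	struct toy_ag ags[TOY_MAX_AGS];
};

static unsigned long long toy_dblocks(const struct toy_fs *fs)
{
	unsigned long long sum = 0;

	for (unsigned int agno = 0; agno < fs->agcount; agno++)
		sum += fs->ags[agno].length;
	return sum;
}

static void toy_set_frozen(struct toy_fs *fs, unsigned int from, bool frozen)
{
	for (unsigned int agno = from; agno < fs->agcount; agno++)
		fs->ags[agno].frozen = frozen;
}

/* Drop AGs [nagcount, agcount) and cut len blocks off AG nagcount - 1. */
static int toy_shrink(struct toy_fs *fs, unsigned int nagcount,
		      unsigned long long len)
{
	struct toy_fs next;
	struct toy_ag *last;

	if (nagcount < 2 || nagcount > fs->agcount)
		return -EINVAL;

	/* Step 1: every AG being removed must be empty; then freeze them. */
	for (unsigned int agno = nagcount; agno < fs->agcount; agno++)
		if (fs->ags[agno].used)
			return -ENOTEMPTY;
	toy_set_frozen(fs, nagcount, true);

	/* Steps 2 and 3 on a private copy, standing in for one transaction. */
	next = *fs;
	last = &next.ags[nagcount - 1];
	if (len > last->tail_free) {
		toy_set_frozen(fs, nagcount, false);	/* "cancel": thaw and bail */
		return -ENOSPC;
	}
	last->length -= len;		/* step 2: consume the tail extent */
	last->tail_free -= len;
	next.agcount = nagcount;	/* step 3: drop the empty AGs */

	*fs = next;			/* "commit": both changes appear together */
	return 0;
}
```

The copy-then-assign stands in for transaction commit/cancel only; the real code relies on the log for atomicity.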

> 
> There's this comment at the bottom of xfs_growfs_data() that says that
> we can return error codes if the secondary sb update fails, even if the
> new size is already live.  This convinces me that it's always been the
> case that callers of the growfs ioctl are supposed to re-query the fs
> geometry afterwards to find out if the fs size changed, even if the
> ioctl itself returns an error... which implies that partial grow/shrink
> are a possibility.
> 

I didn't realize that possibility, but if my understanding is correct,
the process described above doesn't need incremental shrinking by
design. Users can still shrink incrementally, though, by calling the
ioctl multiple times.
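Darrick's point that callers should re-read the geometry after the growfs ioctl, even when it fails, can be illustrated with a toy model. Everything here is hypothetical (`struct toy_mount`, `toy_growfs()`, `toy_resize()`, and `-EIO` as the late error); it is a sketch of the calling convention, not the XFS ioctl ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a mounted fs; not the real geometry ioctl. */
struct toy_mount {
	unsigned long long dblocks;	/* size reported by a geometry query */
	bool secondary_sb_fails;	/* inject a late failure */
};

/* The new size goes live first; a later step may still report an error. */
static int toy_growfs(struct toy_mount *mp, unsigned long long newblocks)
{
	mp->dblocks = newblocks;
	if (mp->secondary_sb_fails)
		return -EIO;	/* size already changed, yet we fail */
	return 0;
}

static unsigned long long toy_query_dblocks(const struct toy_mount *mp)
{
	return mp->dblocks;
}

/* Caller side: never infer the final size from the return code alone. */
static unsigned long long toy_resize(struct toy_mount *mp,
				     unsigned long long newblocks, int *error)
{
	*error = toy_growfs(mp, newblocks);
	return toy_query_dblocks(mp);	/* re-query on success and failure */
}
```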

If I'm wrong, kindly point it out; many thanks in advance!

Thanks,
Gao Xiang



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 19:02         ` Gao Xiang
@ 2021-02-03 19:19           ` Gao Xiang
  2021-02-04 12:33           ` Brian Foster
  1 sibling, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-03 19:19 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Brian Foster, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

On Thu, Feb 04, 2021 at 03:02:17AM +0800, Gao Xiang wrote:
> Hi Darrick,
> 
> On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> 
> ...
> 
> > > > 
> > > > > +		}
> > > > > +
> > > > >  		if (error)
> > > > >  			goto out_trans_cancel;
> > > > >  	}
> > > > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > > > >  	 */
> > > > >  	if (nagcount > oagcount)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > > > -	if (nb > mp->m_sb.sb_dblocks)
> > > > > +	if (nb != mp->m_sb.sb_dblocks)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > > >  				 nb - mp->m_sb.sb_dblocks);
> > > > 
> > > > Maybe use delta here?
> > > 
> > > The reason is the same as above: `delta' here was changed by
> > > xfs_resizefs_init_new_ags() and is no longer nb - mp->m_sb.sb_dblocks,
> > > so the `extend` boolean is used (rather than just delta > 0)
> > 
> > Long question:
> > 
> > The reason why we use (nb - dblocks) is because growfs is an all or
> > nothing operation -- either we succeed in writing new empty AGs and
> > inflating the (former) last AG of the fs, or we don't do anything at
> > all.  We don't allow partial growing; if we did, then delta would be
> > relevant here.  I think we get away with not needing to run transactions
> > for each AG because those new AGs are inaccessible until we commit the
> > new agcount/dblocks, right?
> > 
> > In your design for the fs shrinker, do you anticipate being able to
> > eliminate all the eligible AGs in a single transaction?  Or do you
> > envision only tackling one AG at a time?  And can we be partially
> > successful with a shrink?  e.g. we succeed at eliminating the last AG,
> > but then the one before that isn't empty and so we bail out, but by that
> > point we did actually make the fs a little bit smaller.
> 
> Thanks for your question. I'm about to go to sleep, but let me try to
> answer your question here.
> 
> From my current experiments / understanding, I think eliminating all
> the empty AGs and shrinking the tail AG in a single transaction is
> possible; this is what I've done so far:
>  1) check that the remaining AGs (from AG nagcount to AG oagcount - 1)
>     are empty and mark them all inactive (freeze the AGs);

To add a few words: some additional auxiliary transactions might be
needed. For example, if we'd like to confirm that the bmbt holds only one
extent (rather than just doing some basic math to confirm the whole AG is
empty), we might need to move all free blocks from the AGFL to the bmbt
as well. Yet that process is independent of the main shrinking
transaction, and in principle has no visible impact on users.

I'll reply to the rest of the suggestions tomorrow. Thanks for the review
again!

Thanks,
Gao Xiang

>  2) consume an extent from AG (nagcount - 1);
>  3) decrease agcount from oagcount to nagcount.
> 
> Both 2) and 3) can be done in the same transaction, and after 1) the state
> of those empty AGs is fixed as well. So the on-disk and runtime state
> changes are all atomic.
> 
> > 
> > There's this comment at the bottom of xfs_growfs_data() that says that
> > we can return error codes if the secondary sb update fails, even if the
> > new size is already live.  This convinces me that it's always been the
> > case that callers of the growfs ioctl are supposed to re-query the fs
> > geometry afterwards to find out if the fs size changed, even if the
> > ioctl itself returns an error... which implies that partial grow/shrink
> > are a possibility.
> > 
> 
> I didn't realize that possibility, but if my understanding is correct,
> the process described above doesn't need incremental shrinking by
> design. Users can still shrink incrementally, though, by calling the
> ioctl multiple times.
> 
> If I'm wrong, kindly point it out; many thanks in advance!
> 
> Thanks,
> Gao Xiang
> 



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 18:01       ` Brian Foster
@ 2021-02-04  9:18         ` Gao Xiang
  2021-02-04 12:33           ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Gao Xiang @ 2021-02-04  9:18 UTC (permalink / raw)
  To: Brian Foster
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Wed, Feb 03, 2021 at 01:01:26PM -0500, Brian Foster wrote:
> On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:

...

> > > 
> > > >  
> > > > -	/* If there are new blocks in the old last AG, extend it. */
> > > > +	/* If there are some blocks in the last AG, resize it. */
> > > >  	if (delta) {
> > > 
> > > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > > of the function. Should we ever get to this point with delta == 0? (If
> > > not, maybe convert it to an assert just to be safe.)
> > 
> > delta is changed by xfs_resizefs_init_new_ags() (that is the original
> > growfs design, and I don't want to touch the original logic). That
> > is why `delta' reflects the last AG delta now...
> > 
> 
> Oh, I see. Hmm... that's a bit obfuscated and easy to miss. Perhaps the
> new helper should also include the extend_space() call below to do all
> of the AG updates in one place. It's not clear to me if we need to keep
> the growfs perag reservation code where it is. If so, the new helper
> could take a boolean pointer (instead of delta) that it can set to true
> if it had to extend the size of the old last AG because the perag res
> bits don't actually use the delta value. IOW, I think this hunk could
> look something like the following:
> 
> 	bool	resetagres = false;
> 
> 	if (extend)
> 		error = xfs_resizefs_init_new_ags(..., delta, &resetagres);
> 	else
> 		error = xfs_ag_shrink_space(... -delta);
> 	...
> 
> 	if (resetagres) {
> 		<do perag res fixups>
> 	}
> 	...
> 
> Hm?

I'm not quite sure I got your point, since xfs_resizefs_init_new_ags() is
not part of the transaction (and doesn't need to be). If you mean that the
current codebase needs some refactoring to turn the whole growfs operation
into a new helper, I could do that in the next version; one concern,
though, is that there are too many local variables, so if we introduce a
new helper, a new struct argument might be needed.

And I have no idea why the growfs perag reservation stays at the end of
the function. My own understanding is that the growfs perag reservation
here is somewhat racy, since it seems to have no AGI/AGF lock protection.
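Brian's suggestion quoted above, where the helper reports through a boolean whether the old last AG grew instead of rewriting `delta`, can be sketched in user space. The names (`struct toy_geom`, `toy_grow_ags()`) are hypothetical; the model ignores minimum-AG-size trimming and transactions, assumes every AG but the last is full, and only shows that the per-AG reservation fixup depends on whether the old last AG changed size, not on what is left in `delta`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_AGS 16	/* hypothetical limit for the model */

struct toy_geom {
	unsigned long long agblocks;		/* full AG size */
	unsigned int agcount;
	unsigned long long aglen[TOY_MAX_AGS];	/* current length of each AG */
};

/* Grow to nb blocks; *resetagres says whether the old last AG grew. */
static int toy_grow_ags(struct toy_geom *g, unsigned long long nb,
			bool *resetagres)
{
	unsigned long long dblocks = 0, left;
	unsigned long long nagcount = (nb + g->agblocks - 1) / g->agblocks;
	unsigned int last = g->agcount - 1;

	for (unsigned int agno = 0; agno < g->agcount; agno++)
		dblocks += g->aglen[agno];
	if (nb <= dblocks || nagcount > TOY_MAX_AGS)
		return -EINVAL;		/* checked before anything is modified */

	left = nb - dblocks;
	*resetagres = false;

	/* Fill the old last AG up to a full AG first... */
	if (g->aglen[last] < g->agblocks) {
		unsigned long long room = g->agblocks - g->aglen[last];
		unsigned long long add = left < room ? left : room;

		g->aglen[last] += add;
		left -= add;
		*resetagres = true;
	}
	/* ...then append new AGs; only the final one may be short. */
	while (left) {
		unsigned long long len = left < g->agblocks ? left : g->agblocks;

		g->aglen[g->agcount++] = len;
		left -= len;
	}
	return 0;
}
```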

Thanks,
Gao Xiang

> 
> Brian



* Re: [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs
  2021-02-03 18:01       ` Brian Foster
@ 2021-02-04  9:20         ` Gao Xiang
  0 siblings, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-04  9:20 UTC (permalink / raw)
  To: Brian Foster
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Darrick J . Wong

On Wed, Feb 03, 2021 at 01:01:40PM -0500, Brian Foster wrote:
> On Wed, Feb 03, 2021 at 11:01:32PM +0800, Gao Xiang wrote:

...

> > > > @@ -559,6 +560,10 @@ xfs_ag_shrink_space(
> > > >  	be32_add_cpu(&agf->agf_length, -len);
> > > >  
> > > >  	err2 = xfs_ag_resv_init(agibp->b_pag, *tpp);
> > > > +
> > > > +	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_SHRINKFS_AG_RESV_FAIL))
> > > > +		err2 = -ENOSPC;
> > > > +
> > > 
> > > Seems reasonable, but I feel like this could be broadened to serve as a
> > > generic perag reservation error tag. I suppose we might not be able to
> > > use it on a clean mount, but perhaps it could be reused for growfs and
> > > remount. Hm?
> > 
> > I think it could be done that way, yet currently the logic is just to
> > verify the shrink error handling case above rather than to actually
> > inject per-AG reservation errors in general... I could rename the
> > errortag for later reuse in advance (some better naming? I'm not good at
> > this...), yet real per-AG reservation error injection might be more
> > complicated than just erroring out with -ENOSPC, and it's somewhat out
> > of scope of this patchset for now...
> > 
> 
> I don't think it needs to be any more complicated than the logic you
> have here. Just bury it further down in the perag res init code,
> rename it to something like ERRTAG_AG_RESV_FAIL, and use it the exact
> same way for shrink testing. For example, maybe drop it into
> __xfs_ag_resv_init() near the xfs_mod_fdblocks() call so we can also
> take advantage of the tracepoint that triggers on -ENOSPC for
> informational purposes:
> 
> 	error = xfs_mod_fdblocks(...);
> 	if (!error && XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL))
> 		error = -ENOSPC;
> 	if (error) {
> 		...
> 	}

Ok, I didn't look into it much since it's out of scope. I'll try it in
the next version.
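The generic tag Brian describes can be modelled outside the kernel to show that, once the injection point lives in the reservation helper, every caller of that helper goes through the same error path. This is a sketch only: `toy_errtag_armed`, `toy_test_error()`, `toy_mod_fdblocks()` and `toy_ag_resv_init()` are hypothetical stand-ins for `XFS_TEST_ERROR()`, `xfs_mod_fdblocks()` and `__xfs_ag_resv_init()`. In this model the tag fires instead of the reservation, so an injected failure leaves the free-block counter untouched; injecting after a successful call, as in the quoted snippet, would also need the model to give the blocks back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical error-tag table, loosely modelled on XFS errortags. */
enum toy_errtag { TOY_ERRTAG_AG_RESV_FAIL, TOY_ERRTAG_MAX };
static bool toy_errtag_armed[TOY_ERRTAG_MAX];

static long long toy_fdblocks = 1000;	/* free data blocks */
static int toy_resv_errors;		/* stands in for the error tracepoint */

static bool toy_test_error(enum toy_errtag tag)
{
	return toy_errtag_armed[tag];
}

static int toy_mod_fdblocks(long long delta)
{
	if (toy_fdblocks + delta < 0)
		return -ENOSPC;
	toy_fdblocks += delta;
	return 0;
}

/* Reserve ask blocks for one AG; shared by every resize/mount path. */
static int toy_ag_resv_init(long long ask)
{
	int error;

	if (toy_test_error(TOY_ERRTAG_AG_RESV_FAIL))
		error = -ENOSPC;		/* injected: nothing reserved */
	else
		error = toy_mod_fdblocks(-ask);
	if (error) {
		toy_resv_errors++;		/* same path as a real ENOSPC */
		return error;
	}
	return 0;
}
```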

Thanks,
Gao Xiang

> 
> Brian
> 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > Brian
> > 
> 



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 18:12       ` Darrick J. Wong
  2021-02-03 18:14         ` Darrick J. Wong
  2021-02-03 19:02         ` Gao Xiang
@ 2021-02-04  9:40         ` Gao Xiang
  2 siblings, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-04  9:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Brian Foster, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

Hi Darrick,

On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:

...

> > > >  	nb_div = nb;
> > > >  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
> > > >  	nagcount = nb_div + (nb_mod != 0);
> > > >  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
> > > >  		nagcount--;
> > > > -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > > -		if (nb < mp->m_sb.sb_dblocks)
> > > > +		if (nagcount < 2)
> > > >  			return -EINVAL;
> > > 
> > > What's the reason for the nagcount < 2 check? IIRC we warn about this
> > > configuration at mkfs time, but allow it to proceed. Is it just that we
> > > don't want to accidentally put the fs into an agcount == 1 state that
> > > was originally formatted with >1 AGs?
> > 
> > Darrick once asked me to avoid shrinking a filesystem that has
> > only 1 AG.
> 
> It's worth mentioning why in a comment though:
> 
> 	/*
> 	 * XFS doesn't really support single-AG filesystems, so do not
> 	 * permit callers to remove the filesystem's second and last AG.
> 	 */
> 	if (shrink && new_agcount < 2)
> 		return -EHAHANOYOUDONT;
> 
> But as Brian points out, we /do/ allow adding a second AG to a single-AG
> fs.

(cont.)

Ok, thanks for this. Anyway, I will cover this case in the next version.
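A minimal sketch of one way to cover it, following Darrick's suggested comment: apply the two-AG floor only when shrinking, so growing a single-AG filesystem (the case Brian raises below) keeps working. The AG-count arithmetic mirrors the quoted hunk; `toy_new_agcount()` is hypothetical and `TOY_MIN_AG_BLOCKS` is a placeholder value, not the real `XFS_MIN_AG_BLOCKS`. The `n < oagcount` rejection mirrors the TODO in this version of the patch.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder; the real XFS_MIN_AG_BLOCKS is defined in the XFS headers. */
#define TOY_MIN_AG_BLOCKS 64ULL

/*
 * Turn a requested size *nb into a new AG count, dropping a runt last AG
 * smaller than the minimum. Growing a single-AG fs stays legal; only a
 * shrink may not leave fewer than two AGs.
 */
static int toy_new_agcount(unsigned long long dblocks, unsigned int oagcount,
			   unsigned long long agblocks, unsigned long long *nb,
			   unsigned int *nagcount)
{
	unsigned long long nb_div = *nb / agblocks;
	unsigned long long nb_mod = *nb % agblocks;
	unsigned int n = nb_div + (nb_mod != 0);

	if (nb_mod && nb_mod < TOY_MIN_AG_BLOCKS) {
		n--;
		*nb = (unsigned long long)n * agblocks;
	}
	if (*nb < dblocks) {
		/* XFS doesn't really support single-AG filesystems. */
		if (n < 2)
			return -EINVAL;
		/* Removing whole AGs is not implemented in this version. */
		if (n < oagcount)
			return -EINVAL;
	}
	*nagcount = n;
	return 0;
}
```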

> 
> > > 
> > > What about the case where we attempt to grow an agcount == 1 fs but
> > > don't enlarge enough to add the second AG? Does this change error
> > > behavior in that case?
> > 
> > Yeah, thanks for catching this! If growfs allowed the 1-AG case
> > before, I think it needs to be refined. Let me update this in the next
> > version!
> > 
> > > 
> > > > +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> > > >  	}
> > > > +
> > > >  	delta = nb - mp->m_sb.sb_dblocks;
> > > > +	extend = (delta > 0);
> > > >  	oagcount = mp->m_sb.sb_agcount;
> > > >  
> > > >  	/* allocate the new per-ag structures */
> > > > @@ -110,22 +118,34 @@ xfs_growfs_data_private(
> > > >  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
> > > >  		if (error)
> > > >  			return error;
> > > > +	} else if (nagcount < oagcount) {
> > > > +		/* TODO: shrinking the entire AGs hasn't yet completed */
> > > > +		return -EINVAL;
> > > >  	}
> > > >  
> > > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> > > > -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> > > > +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> > > > +			XFS_TRANS_RESERVE, &tp);
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> > > > -	if (error)
> > > > -		goto out_trans_cancel;
> > > > -
> > > > +	if (extend) {
> > > > +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> > > > +						  nagcount, &delta);
> > > > +		if (error)
> > > > +			goto out_trans_cancel;
> > > > +	}
> > > >  	xfs_trans_agblocks_delta(tp, id.nfree);
> > > 
> > > It looks like id isn't used until the resize call above. Is this call
> > > relevant for the shrink case?
> > 
> > I think it does nothing for the shrink-the-last-AG case either
> > (id.nfree == 0 here), but it may be used by the later patchset for
> > shrinking whole AGs. I can move it into if (extend) in the next version.
> > 
> > > 
> > > >  
> > > > -	/* If there are new blocks in the old last AG, extend it. */
> > > > +	/* If there are some blocks in the last AG, resize it. */
> > > >  	if (delta) {
> > > 
> > > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > > of the function. Should we ever get to this point with delta == 0? (If
> > > not, maybe convert it to an assert just to be safe.)
> > 
> > delta is changed by xfs_resizefs_init_new_ags() (that is the original
> > growfs design, and I don't want to touch the original logic). That
> > is why `delta' reflects the last AG delta now...
> 
> I've never liked how the meaning of "delta" changes through the
> function, and it clearly trips up reviewers.  This variable isn't the
> delta between the old dblocks and the new dblocks, it's really a
> resizefs cursor that tells us how much work we still have to do.

I see the first patch of this patchset has been merged into for-next,
so do you have any new ideas about this? (Split delta into 2 variables,
or some other approach you'd prefer? Then I could update it in the next
version as a whole...)
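One possible shape for the "split delta into 2 variables" option, sketched in user-space C: keep the total change fixed and compute the old last AG's share separately, instead of letting a helper rewrite `delta` as a cursor. `struct toy_resize_plan` and `toy_plan_resize()` are hypothetical. The closed form for the last-AG share assumes every AG except the old last one is full, and, as in this version of the patch, a shrink only touches the last AG.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_resize_plan {
	long long total_delta;	/* nb - dblocks; never rewritten */
	long long lastag_delta;	/* size change of the old last AG */
	unsigned int new_ags;	/* AGs appended by a grow */
	bool extend;
};

static void toy_plan_resize(unsigned long long dblocks,
			    unsigned long long agblocks, unsigned int oagcount,
			    unsigned int nagcount, unsigned long long nb,
			    struct toy_resize_plan *p)
{
	/* Size the fs would have if the old last AG were full. */
	unsigned long long old_cap = (unsigned long long)oagcount * agblocks;

	p->total_delta = (long long)nb - (long long)dblocks;
	p->extend = p->total_delta > 0;
	p->new_ags = nagcount > oagcount ? nagcount - oagcount : 0;
	if (p->extend)
		/* The old last AG grows at most up to a full AG. */
		p->lastag_delta = (long long)(nb < old_cap ? nb : old_cap) -
				  (long long)dblocks;
	else
		/* Shrink only trims the last AG in this version. */
		p->lastag_delta = p->total_delta;

	/* Sanity check in the spirit of the kernel's ASSERT(). */
	assert(!p->extend || p->lastag_delta <= p->total_delta);
}
```

With both values available, the sb_dblocks update can use `total_delta` while the last-AG resize and the perag reservation fixup use `lastag_delta`, so neither reader has to know that the other was computed first.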

> 
> > > 
> > > > -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > > +		if (extend) {
> > > > +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> > > > +		} else {
> > > > +			id.agno = nagcount - 1;
> > > > +			error = xfs_ag_shrink_space(mp, &tp, &id, -delta);
> > > 
> > > xfs_ag_shrink_space() looks like it only accesses id->agno. Perhaps just
> > > pass in agno for now..?
> > 
> > Either way is ok, yet in my incomplete patchset for shrinking whole
> > empty AGs, it seems more natural to pass in &id rather than agno (since
> > id.agno = nagcount - 1 will stay in some new helper,
> > e.g. xfs_shrink_ags())
> 
> @id is struct aghdr_init_data, but shrinking shouldn't initialize any AG
> headers.  Are you planning to make use of it in shrink, either now or
> later on?

I tried to use it as a global context structure for shrinking both whole
AGs and the tail AG, since I'm not sure we need to introduce another new
structure and make things more complex, but yeah, the naming is somewhat
confusing now.

> 

...

> > > 
> > > >  	/*
> > > > -	 * update in-core counters now to reflect the real numbers
> > > > -	 * (especially sb_fdblocks)
> > > > +	 * update in-core counters now to reflect the real numbers (especially
> > > > +	 * sb_fdblocks). And xfs_validate_sb_write() can pass for shrinkfs.
> > > >  	 */
> > > >  	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> > > >  		xfs_log_sb(tp);
> > > > @@ -165,7 +185,7 @@ xfs_growfs_data_private(
> > > >  	 * If we expanded the last AG, free the per-AG reservation
> > > >  	 * so we can reinitialize it with the new size.
> > > >  	 */
> > > > -	if (delta) {
> > > > +	if (extend && delta) {
> > > >  		struct xfs_perag	*pag;
> > > >  
> > > >  		pag = xfs_perag_get(mp, id.agno);
> > > 
> > > We call xfs_fs_reserve_ag_blocks() a bit further down before we exit
> > > this function. xfs_ag_shrink_space() from the previous patch is intended
> > > to deal with perag reservation changes for shrink, but it looks like the
> > > reserve call further down could potentially reset mp->m_finobt_nores to
> > > false if it previously might have been set to true.
> > 
> > Yeah, if my understanding is correct, I might need to call
> > xfs_fs_reserve_ag_blocks() only in the growfs case, as well as in the
> > mp->m_finobt_nores = true case.
> 
> I suppose it's worth trying in the finobt_nores==true case. :)
> 

I hadn't noticed this trick before; I'll look into it as well.

Also, as Brian mentioned, I'm not sure why growfs calls xfs_ag_resv_free()
on the last AG and xfs_fs_reserve_ag_blocks() after the AGF/AGI are
unlocked... I think there could be a race window if other fs allocation
operations run in parallel?

Thanks,
Gao Xiang

> --D



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-03 19:02         ` Gao Xiang
  2021-02-03 19:19           ` Gao Xiang
@ 2021-02-04 12:33           ` Brian Foster
  2021-02-04 13:58             ` Gao Xiang
  1 sibling, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-04 12:33 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Darrick J. Wong, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

On Thu, Feb 04, 2021 at 03:02:17AM +0800, Gao Xiang wrote:
> Hi Darrick,
> 
> On Wed, Feb 03, 2021 at 10:12:11AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> 
> ...
> 
> > > > 
> > > > > +		}
> > > > > +
> > > > >  		if (error)
> > > > >  			goto out_trans_cancel;
> > > > >  	}
> > > > > @@ -137,15 +157,15 @@ xfs_growfs_data_private(
> > > > >  	 */
> > > > >  	if (nagcount > oagcount)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> > > > > -	if (nb > mp->m_sb.sb_dblocks)
> > > > > +	if (nb != mp->m_sb.sb_dblocks)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > > >  				 nb - mp->m_sb.sb_dblocks);
> > > > 
> > > > Maybe use delta here?
> > > 
> > > The reason is the same as above: `delta' here was changed by
> > > xfs_resizefs_init_new_ags() and is no longer nb - mp->m_sb.sb_dblocks,
> > > so the `extend` boolean is used (rather than just delta > 0)
> > 
> > Long question:
> > 
> > The reason why we use (nb - dblocks) is because growfs is an all or
> > nothing operation -- either we succeed in writing new empty AGs and
> > inflating the (former) last AG of the fs, or we don't do anything at
> > all.  We don't allow partial growing; if we did, then delta would be
> > relevant here.  I think we get away with not needing to run transactions
> > for each AG because those new AGs are inaccessible until we commit the
> > new agcount/dblocks, right?
> > 
> > In your design for the fs shrinker, do you anticipate being able to
> > eliminate all the eligible AGs in a single transaction?  Or do you
> > envision only tackling one AG at a time?  And can we be partially
> > successful with a shrink?  e.g. we succeed at eliminating the last AG,
> > but then the one before that isn't empty and so we bail out, but by that
> > point we did actually make the fs a little bit smaller.
> 
> Thanks for your question. I'm about to go to sleep, but let me try to
> answer your question here.
> 
> From my current experiments / understanding, I think eliminating all
> the empty AGs and shrinking the tail AG in a single transaction is
> possible; this is what I've done so far:
>  1) check that the remaining AGs (from AG nagcount to AG oagcount - 1)
>     are empty and mark them all inactive (freeze the AGs);
>  2) consume an extent from AG (nagcount - 1);
>  3) decrease agcount from oagcount to nagcount.
> 
> Both 2) and 3) can be done in the same transaction, and after 1) the state
> of those empty AGs is fixed as well. So the on-disk and runtime state
> changes are all atomic.
> 
> > 
> > There's this comment at the bottom of xfs_growfs_data() that says that
> > we can return error codes if the secondary sb update fails, even if the
> > new size is already live.  This convinces me that it's always been the
> > case that callers of the growfs ioctl are supposed to re-query the fs
> > geometry afterwards to find out if the fs size changed, even if the
> > ioctl itself returns an error... which implies that partial grow/shrink
> > are a possibility.
> > 
> 
> I didn't realize that possibility, but if my understanding is correct,
> the process described above doesn't need incremental shrinking by
> design. Users can still shrink incrementally, though, by calling the
> ioctl multiple times.
> 

This was one of the things I wondered about on earlier versions of
this work: whether we wanted shrink to be deliberately incremental or
not. I suspect that somewhat applies even to this version without AG
truncation, because technically we could allocate as much as possible out
of the end of the last AG and shrink by that amount. My initial thought was
that if the implementation is going to be opportunistic (i.e., we
provide no help to actually free up targeted space), perhaps an
incremental implementation is a useful means to allow the operation to
make progress. E.g., run a shrink, observe it didn't fully complete,
shuffle around some files, repeat, etc. 

IIRC, one of the downsides of that sort of approach is any use case
where the goal is an underlying storage device resize. I suppose an
underlying device resize could also be opportunistic, but it seems more
likely to me that use case would prefer an all or nothing approach,
particularly if associated userspace tools don't really know how to
handle a partially successful fs shrink. Do we have any idea how other
tools/filesystems behave in this regard (I thought ext4 supported shrink)? FWIW,
it also seems potentially annoying to ask for a largish shrink only for
the tool to hand back something relatively tiny.

Based on your design description, it occurs to me that perhaps the ideal
outcome is an implementation that supports a fully atomic all-or-nothing
shrink (assuming this is reasonably possible), but supports an optional
incremental mode specified by the interface. IOW, if we have the ability
to perform all-or-nothing, then it _seems_ like a minor interface
enhancement to support incremental on top of that as opposed to the
other way around. Therefore, perhaps that should be the initial goal
until shown to be too complex or otherwise problematic..?
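A minimal sketch of the interface suggested above: all-or-nothing by default, with an opt-in incremental mode. TOY_SHRINK_INCREMENTAL and toy_shrink_mode() are hypothetical (no such flag exists in the XFS ioctl interface), and the filesystem is reduced to its size plus the highest in-use block; the -ENOSPC return is the toy's own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag: no such flag exists in the XFS ioctl interface. */
#define TOY_SHRINK_INCREMENTAL 0x1u

struct toy_fs3 {
	uint64_t dblocks;    /* current size in blocks */
	uint64_t last_used;  /* highest block number currently in use */
};

/*
 * Default: all-or-nothing.  With TOY_SHRINK_INCREMENTAL, shrink as far
 * toward nblocks as the free space at the end of the fs allows.
 * Returns 0 and updates fs->dblocks on (possibly partial) success.
 */
static int toy_shrink_mode(struct toy_fs3 *fs, uint64_t nblocks,
			   unsigned int flags)
{
	uint64_t min_size;

	assert(fs);
	if (nblocks >= fs->dblocks)
		return -EINVAL;              /* not a shrink */

	min_size = fs->last_used + 1;        /* smallest size keeping all data */
	if (nblocks >= min_size) {
		fs->dblocks = nblocks;       /* full request satisfied */
		return 0;
	}
	if (!(flags & TOY_SHRINK_INCREMENTAL))
		return -ENOSPC;              /* all-or-nothing: change nothing */
	if (min_size >= fs->dblocks)
		return -ENOSPC;              /* no free space at the end at all */
	fs->dblocks = min_size;              /* partial progress only */
	return 0;
}
```

Keeping the incremental behavior an explicit opt-in leaves the default safe for the storage-resize use case, where a partially shrunk filesystem could surprise userspace tools.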

Brian

> If I'm wrong, kindly point it out; many thanks in advance!
> 
> Thanks,
> Gao Xiang
> 



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-04  9:18         ` Gao Xiang
@ 2021-02-04 12:33           ` Brian Foster
  2021-02-04 16:21             ` Gao Xiang
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2021-02-04 12:33 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

On Thu, Feb 04, 2021 at 05:18:35PM +0800, Gao Xiang wrote:
> On Wed, Feb 03, 2021 at 01:01:26PM -0500, Brian Foster wrote:
> > On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> 
> ...
> 
> > > > 
> > > > >  
> > > > > -	/* If there are new blocks in the old last AG, extend it. */
> > > > > +	/* If there are some blocks in the last AG, resize it. */
> > > > >  	if (delta) {
> > > > 
> > > > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > > > of the function. Should we ever get to this point with delta == 0? (If
> > > > not, maybe convert it to an assert just to be safe.)
> > > 
> > > delta is modified by xfs_resizefs_init_new_ags() (that is the original
> > > growfs design, and I don't want to touch the original logic). That
> > > is why `delta' reflects the last AG delta now...
> > > 
> > 
> > Oh, I see. Hmm... that's a bit obfuscated and easy to miss. Perhaps the
> > new helper should also include the extend_space() call below to do all
> > of the AG updates in one place. It's not clear to me if we need to keep
> > the growfs perag reservation code where it is. If so, the new helper
> > could take a boolean pointer (instead of delta) that it can set to true
> > if it had to extend the size of the old last AG because the perag res
> > bits don't actually use the delta value. IOW, I think this hunk could
> > look something like the following:
> > 
> > 	bool	resetagres = false;
> > 
> > 	if (extend)
> > 		error = xfs_resizefs_init_new_ags(..., delta, &resetagres);
> > 	else
> > 		error = xfs_ag_shrink_space(... -delta);
> > 	...
> > 
> > 	if (resetagres) {
> > 		<do perag res fixups>
> > 	}
> > 	...
> > 
> > Hm?
> 
> I'm not quite sure I get your point, since xfs_resizefs_init_new_ags()
> is not part of the transaction (and doesn't need to be). If you mean that
> the current codebase needs some refactoring to turn the whole growfs
> operation into a new helper, I could do that in the next version, but one
> concern is that there are too many local variables; if we introduce a new
> helper, a new struct argument might be needed.
> 

That seems fine either way. I think it's just a matter of passing the
transaction to the function or not. I've appended a diff based on the
previous refactoring patch to demonstrate what I mean (compile tested
only).

> And I have no idea why the growfs perag reservation stays at the end of
> the function. My own understanding is that the growfs perag reservation
> here seems somewhat racy, since it has no AGI/AGF lock protection.
> 

Ok. It's probably best to leave it alone until we figure that out and
then address it in a separate patch, if desired.

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 6c4ab5e31054..707c9379d6c1 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -34,19 +34,20 @@
  */
 static int
 xfs_resizefs_init_new_ags(
-	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
 	struct aghdr_init_data	*id,
 	xfs_agnumber_t		oagcount,
 	xfs_agnumber_t		nagcount,
-	xfs_rfsblock_t		*delta)
+	xfs_rfsblock_t		delta)
 {
-	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
+	struct xfs_mount	*mp = tp->t_mountp;
+	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + delta;
 	int			error;
 
 	INIT_LIST_HEAD(&id->buffer_list);
 	for (id->agno = nagcount - 1;
 	     id->agno >= oagcount;
-	     id->agno--, *delta -= id->agsize) {
+	     id->agno--, delta -= id->agsize) {
 
 		if (id->agno == nagcount - 1)
 			id->agsize = nb - (id->agno *
@@ -60,7 +61,16 @@ xfs_resizefs_init_new_ags(
 			return error;
 		}
 	}
-	return xfs_buf_delwri_submit(&id->buffer_list);
+
+	error = xfs_buf_delwri_submit(&id->buffer_list);
+	if (error)
+		return error;
+
+	xfs_trans_agblocks_delta(tp, id->nfree);
+
+	if (delta)
+		error = xfs_ag_extend_space(mp, tp, id, delta);
+	return error;
 }
 
 /*
@@ -117,19 +127,10 @@ xfs_growfs_data_private(
 	if (error)
 		return error;
 
-	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
+	error = xfs_resizefs_init_new_ags(tp, &id, oagcount, nagcount, delta);
 	if (error)
 		goto out_trans_cancel;
 
-	xfs_trans_agblocks_delta(tp, id.nfree);
-
-	/* If there are new blocks in the old last AG, extend it. */
-	if (delta) {
-		error = xfs_ag_extend_space(mp, tp, &id, delta);
-		if (error)
-			goto out_trans_cancel;
-	}
-
 	/*
 	 * Update changed superblock fields transactionally. These are not
 	 * seen by the rest of the world until the transaction commit applies
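The delta bookkeeping in the helper can be checked with plain arithmetic. The sketch below is hypothetical and uses no XFS types: it walks the new AGs from the top down like the loop in xfs_resizefs_init_new_ags(), subtracting each new AG's size, so whatever remains is the growth of the old last AG. The bool out-parameter mirrors the earlier suggestion of reporting whether the old last AG was extended (and so whether its perag reservation needs resetting). It assumes growth only (nb > dblocks) and that every AG except the last has agsize blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy: returns how many new blocks land in the old last AG
 * when growing from dblocks to nb blocks with AGs of agsize blocks,
 * mirroring how the helper's loop whittles down delta.
 */
static uint64_t toy_last_ag_delta(uint64_t dblocks, uint64_t nb,
				  uint64_t agsize, bool *extended_last_ag)
{
	assert(dblocks > 0 && nb > dblocks && agsize > 0);

	uint64_t delta = nb - dblocks;
	uint64_t oagcount = (dblocks + agsize - 1) / agsize;  /* >= 1 */
	uint64_t nagcount = (nb + agsize - 1) / agsize;
	uint64_t agno;

	/* Walk the brand-new AGs from the top down, as the kernel loop does. */
	for (agno = nagcount - 1; agno >= oagcount; agno--) {
		/* The new last AG may be a runt; the others are full-sized. */
		uint64_t size = (agno == nagcount - 1) ?
				nb - agno * agsize : agsize;
		delta -= size;
	}

	/* Whatever is left grows the old last AG (e.g. a former runt AG). */
	*extended_last_ag = (delta != 0);
	return delta;
}
```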



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-04 12:33           ` Brian Foster
@ 2021-02-04 13:58             ` Gao Xiang
  0 siblings, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-04 13:58 UTC (permalink / raw)
  To: Brian Foster
  Cc: Darrick J. Wong, linux-xfs, Darrick J. Wong, Eric Sandeen,
	Dave Chinner, Christoph Hellwig

Hi Brian,

On Thu, Feb 04, 2021 at 07:33:03AM -0500, Brian Foster wrote:
> On Thu, Feb 04, 2021 at 03:02:17AM +0800, Gao Xiang wrote:

....

> > > 
> > > Long question:
> > > 
> > > The reason why we use (nb - dblocks) is because growfs is an all or
> > > nothing operation -- either we succeed in writing new empty AGs and
> > > inflating the (former) last AG of the fs, or we don't do anything at
> > > all.  We don't allow partial growing; if we did, then delta would be
> > > relevant here.  I think we get away with not needing to run transactions
> > > for each AG because those new AGs are inaccessible until we commit the
> > > new agcount/dblocks, right?
> > > 
> > > In your design for the fs shrinker, do you anticipate being able to
> > > eliminate all the eligible AGs in a single transaction?  Or do you
> > > envision only tackling one AG at a time?  And can we be partially
> > > successful with a shrink?  e.g. we succeed at eliminating the last AG,
> > > but then the one before that isn't empty and so we bail out, but by that
> > > point we did actually make the fs a little bit smaller.
> > 
> > Thanks for your question. I'm about to go to sleep, but let me try to
> > answer your question here.
> > 
> > As for my current experiment / understanding, I think eliminating all
> > the empty AGs + shrinking the tail AG in a single transaction is possible,
> > and that is what I've done so far:
> >  1) check that the remaining AGs (from AG nagcount to AG oagcount - 1)
> >     are empty and mark them all inactive (freeze the AGs);
> >  2) consume an extent from AG (nagcount - 1);
> >  3) decrease agcount from oagcount to nagcount.
> > 
> > Both 2) and 3) can be done in the same transaction, and after 1) the state
> > of such empty AGs is fixed as well. So the on-disk and runtime states
> > are both updated atomically.
> > 
> > > 
> > > There's this comment at the bottom of xfs_growfs_data() that says that
> > > we can return error codes if the secondary sb update fails, even if the
> > > new size is already live.  This convinces me that it's always been the
> > > case that callers of the growfs ioctl are supposed to re-query the fs
> > > geometry afterwards to find out if the fs size changed, even if the
> > > ioctl itself returns an error... which implies that partial grow/shrink
> > > are a possibility.
> > > 
> > 
> > I didn't realize that possibility, but if my understanding is correct,
> > the process described above makes incremental shrinking unnecessary by
> > design. It still supports incremental shrinking, though, if users call
> > the ioctl multiple times.
> > 
> 
> This was one of the things I wondered about on earlier versions of
> this work: whether we wanted shrink to be deliberately incremental or
> not. I suspect that somewhat applies even to this version without AG
> truncation, because technically we could allocate as much as possible out
> of the end of the last AG and shrink by that amount. My initial thought was
> that if the implementation is going to be opportunistic (i.e., we
> provide no help to actually free up targeted space), perhaps an
> incremental implementation is a useful means to allow the operation to
> make progress. E.g., run a shrink, observe it didn't fully complete,
> shuffle around some files, repeat, etc.
> 
> IIRC, one of the downsides of that sort of approach is any use case
> where the goal is an underlying storage device resize. I suppose an
> underlying device resize could also be opportunistic, but it seems more
> likely to me that use case would prefer an all or nothing approach,
> particularly if associated userspace tools don't really know how to
> handle a partially successful fs shrink. Do we have any idea how other
> tools/filesystems behave in this regard (I thought ext4 supported shrink)? FWIW,
> it also seems potentially annoying to ask for a largish shrink only for
> the tool to hand back something relatively tiny.
> 
> Based on your design description, it occurs to me that perhaps the ideal
> outcome is an implementation that supports a fully atomic all-or-nothing
> shrink (assuming this is reasonably possible), but supports an optional
> incremental mode specified by the interface. IOW, if we have the ability
> to perform all-or-nothing, then it _seems_ like a minor interface
> enhancement to support incremental on top of that as opposed to the
> other way around. Therefore, perhaps that should be the initial goal
> until shown to be too complex or otherwise problematic..?
> 

I can't say too much about this yet, but my current observation is that
shrinking the empty space of the tail AG [+ removing empty AGs (optional)]
in one transaction is practical (I don't see any barrier so far [1]). I'm
implementing an atomic all-or-nothing truncation, and userspace can use it
either in an all-or-nothing way (I saw Dave's spaceman work before) or in
an incremental way (using a binary search approach and multiple ioctls)...
In principle, supporting the ioctl with an extra partial-shrinking
feature is practical as well (though additional work might be needed).
Also, I'm not sure it's user-friendly, since most end users probably
want an all-or-nothing shrink result (at least for the fs truncation
step).
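The binary-search idea can be sketched using only all-or-nothing shrink calls. toy_try_shrink() is a hypothetical stand-in for one atomic shrink ioctl that succeeds only if no in-use block would be cut off. Because each success is committed immediately, the live size is always the upper bound of the search, and failed attempts change nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct toy_fs4 {
	uint64_t     dblocks;    /* live size in blocks */
	uint64_t     last_used;  /* highest in-use block (toy model) */
	unsigned int calls;      /* shrink attempts issued so far */
};

/* Hypothetical stand-in for one atomic, all-or-nothing shrink ioctl. */
static int toy_try_shrink(struct toy_fs4 *fs, uint64_t nblocks)
{
	fs->calls++;
	if (nblocks >= fs->dblocks)
		return -EINVAL;
	if (nblocks <= fs->last_used)
		return -ENOSPC;          /* would cut off in-use space */
	fs->dblocks = nblocks;           /* committed immediately */
	return 0;
}

/*
 * Get as close to 'target' as possible using only all-or-nothing calls.
 * Assuming feasibility is monotonic, sizes below 'lo' are known to fail,
 * and fs->dblocks (the live size) is the upper bound.  Returns the final
 * size.
 */
static uint64_t toy_shrink_bsearch(struct toy_fs4 *fs, uint64_t target)
{
	uint64_t lo;

	assert(fs);
	if (target >= fs->dblocks)
		return fs->dblocks;      /* nothing to shrink */
	if (toy_try_shrink(fs, target) == 0)
		return fs->dblocks;      /* whole request fit in one call */

	lo = target + 1;
	while (lo < fs->dblocks) {
		uint64_t mid = lo + (fs->dblocks - lo) / 2;

		if (toy_try_shrink(fs, mid) != 0)
			lo = mid + 1;    /* failed: nothing changed */
		/* on success fs->dblocks is now mid, narrowing the range */
	}
	return fs->dblocks;
}
```

Under the monotonicity assumption (if size x works, any larger size does too), the loop converges on the smallest achievable size, which is roughly how a "shrink to minimum" mode could be layered on top of an atomic primitive, in the spirit of the ext4 -M remark below.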

btw, as far as I know (with my limited understanding), ext4 shrinking is
an offline operation, so it's somewhat easier to implement (no need to
consider any runtime impact), and it is also an all-or-nothing
truncation. (Although it also supports -M to shrink the filesystem to
its minimum size, I think that could be implemented with multiple
all-or-nothing shrink ioctls...)

Thanks,
Gao Xiang

[1] It's somewhat outdated, but I'd like to finish this tail AG patchset
first.
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git/log/?h=xfs/shrink2

> Brian
>



* Re: [PATCH v6 6/7] xfs: support shrinking unused space in the last AG
  2021-02-04 12:33           ` Brian Foster
@ 2021-02-04 16:21             ` Gao Xiang
  0 siblings, 0 replies; 30+ messages in thread
From: Gao Xiang @ 2021-02-04 16:21 UTC (permalink / raw)
  To: Brian Foster
  Cc: linux-xfs, Darrick J. Wong, Eric Sandeen, Dave Chinner,
	Christoph Hellwig

Hi Brian,

On Thu, Feb 04, 2021 at 07:33:16AM -0500, Brian Foster wrote:
> On Thu, Feb 04, 2021 at 05:18:35PM +0800, Gao Xiang wrote:
> > On Wed, Feb 03, 2021 at 01:01:26PM -0500, Brian Foster wrote:
> > > On Wed, Feb 03, 2021 at 10:51:46PM +0800, Gao Xiang wrote:
> > 
> > ...
> > 
> > > > > 
> > > > > >  
> > > > > > -	/* If there are new blocks in the old last AG, extend it. */
> > > > > > +	/* If there are some blocks in the last AG, resize it. */
> > > > > >  	if (delta) {
> > > > > 
> > > > > This patch added a (nb == mp->m_sb.sb_dblocks) shortcut check at the top
> > > > > of the function. Should we ever get to this point with delta == 0? (If
> > > > > not, maybe convert it to an assert just to be safe.)
> > > > 
> > > > delta is modified by xfs_resizefs_init_new_ags() (that is the original
> > > > growfs design, and I don't want to touch the original logic). That
> > > > is why `delta' reflects the last AG delta now...
> > > > 
> > > 
> > > Oh, I see. Hmm... that's a bit obfuscated and easy to miss. Perhaps the
> > > new helper should also include the extend_space() call below to do all
> > > of the AG updates in one place. It's not clear to me if we need to keep
> > > the growfs perag reservation code where it is. If so, the new helper
> > > could take a boolean pointer (instead of delta) that it can set to true
> > > if it had to extend the size of the old last AG because the perag res
> > > bits don't actually use the delta value. IOW, I think this hunk could
> > > look something like the following:
> > > 
> > > 	bool	resetagres = false;
> > > 
> > > 	if (extend)
> > > 		error = xfs_resizefs_init_new_ags(..., delta, &resetagres);
> > > 	else
> > > 		error = xfs_ag_shrink_space(... -delta);
> > > 	...
> > > 
> > > 	if (resetagres) {
> > > 		<do perag res fixups>
> > > 	}
> > > 	...
> > > 
> > > Hm?
> > 
> > I'm not quite sure I get your point, since xfs_resizefs_init_new_ags()
> > is not part of the transaction (and doesn't need to be). If you mean that
> > the current codebase needs some refactoring to turn the whole growfs
> > operation into a new helper, I could do that in the next version, but one
> > concern is that there are too many local variables; if we introduce a new
> > helper, a new struct argument might be needed.
> > 
> 
> That seems fine either way. I think it's just a matter of passing the
> transaction to the function or not. I've appended a diff based on the
> previous refactoring patch to demonstrate what I mean (compile tested
> only).

(I forgot to reply to this email...)

Ok, will update in the next version.

> 
> > And I have no idea why the growfs perag reservation stays at the end of
> > the function. My own understanding is that the growfs perag reservation
> > here seems somewhat racy, since it has no AGI/AGF lock protection.
> > 
> 
> Ok. It's probably best to leave it alone until we figure that out and
> then address it in a separate patch, if desired.

Okay.

Thanks,
Gao Xiang

> 
> Brian



end of thread, other threads:[~2021-02-04 16:23 UTC | newest]

Thread overview: 30+ messages
2021-01-26 12:56 [PATCH v6 0/7] xfs: support shrinking free space in the last AG Gao Xiang
2021-01-26 12:56 ` [PATCH v6 1/7] xfs: rename `new' to `delta' in xfs_growfs_data_private() Gao Xiang
2021-02-02 19:37   ` Brian Foster
2021-01-26 12:56 ` [PATCH v6 2/7] xfs: get rid of xfs_growfs_{data,log}_t Gao Xiang
2021-02-02 19:37   ` Brian Foster
2021-01-26 12:56 ` [PATCH v6 3/7] xfs: update lazy sb counters immediately for resizefs Gao Xiang
2021-02-02 19:38   ` Brian Foster
2021-02-03  0:45     ` Gao Xiang
2021-01-26 12:56 ` [PATCH v6 4/7] xfs: hoist out xfs_resizefs_init_new_ags() Gao Xiang
2021-02-02 19:38   ` Brian Foster
2021-01-26 12:56 ` [PATCH v6 5/7] xfs: introduce xfs_ag_shrink_space() Gao Xiang
2021-01-26 12:56 ` [PATCH v6 6/7] xfs: support shrinking unused space in the last AG Gao Xiang
2021-02-03 14:23   ` Brian Foster
2021-02-03 14:51     ` Gao Xiang
2021-02-03 18:01       ` Brian Foster
2021-02-04  9:18         ` Gao Xiang
2021-02-04 12:33           ` Brian Foster
2021-02-04 16:21             ` Gao Xiang
2021-02-03 18:12       ` Darrick J. Wong
2021-02-03 18:14         ` Darrick J. Wong
2021-02-03 19:02         ` Gao Xiang
2021-02-03 19:19           ` Gao Xiang
2021-02-04 12:33           ` Brian Foster
2021-02-04 13:58             ` Gao Xiang
2021-02-04  9:40         ` Gao Xiang
2021-01-26 12:56 ` [PATCH v6 7/7] xfs: add error injection for per-AG resv failure when shrinkfs Gao Xiang
2021-02-03 14:23   ` Brian Foster
2021-02-03 15:01     ` Gao Xiang
2021-02-03 18:01       ` Brian Foster
2021-02-04  9:20         ` Gao Xiang
