Linux-XFS Archive on lore.kernel.org
* [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out
@ 2021-01-23 18:51 Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
                   ` (10 more replies)
  0 siblings, 11 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:51 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david

Hi all,

Historically, when users ran out of space or quota when trying to write
to the filesystem, XFS didn't try very hard to reclaim space that it
might have speculatively allocated for the purpose of speeding up
front-end filesystem operations (appending writes, cow staging).  The
upcoming deferred inactivation series will greatly increase the amount
of allocated space that isn't actively being used to store user data.

Therefore, try to reduce the circumstances where we return EDQUOT or
ENOSPC to userspace by teaching the write paths to clear space and
retry the operation once before giving up.
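The once-only retry convention described above can be sketched in plain
userspace C; do_write(), try_reclaim_space(), and the counters below are
hypothetical stand-ins for the real write path and blockgc scans, not kernel
code:

```c
#include <errno.h>
#include <stdbool.h>

static int fake_free_blocks;    /* space the "scan" can recover */
static int reclaim_budget = 1;  /* pretend gc can free one block, once */

static int do_write(void)
{
	if (fake_free_blocks <= 0)
		return -ENOSPC;
	fake_free_blocks--;
	return 0;
}

static bool try_reclaim_space(void)
{
	if (reclaim_budget <= 0)
		return false;
	reclaim_budget--;
	fake_free_blocks++;
	return true;
}

/*
 * Mirror of the series' convention: attempt the operation, and on
 * ENOSPC run one garbage-collection pass and retry exactly once
 * before reporting the error to userspace.
 */
static int write_with_one_retry(void)
{
	bool cleared_space = false;
	int ret;

retry:
	ret = do_write();
	if (ret == -ENOSPC && !cleared_space) {
		cleared_space = true;
		if (try_reclaim_space())
			goto retry;
	}
	return ret;
}
```

The cleared_space flag is what bounds the loop: a second failure falls
through and the error reaches the caller.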

v2: clean up and rebase against 5.11.
v3: restructure the retry loops per dchinner suggestion
v4: simplify the calling convention of xfs_trans_reserve_quota_nblks

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=reclaim-space-harder-5.12
---
 fs/xfs/libxfs/xfs_attr.c |    8 +-
 fs/xfs/libxfs/xfs_bmap.c |    8 +-
 fs/xfs/xfs_bmap_util.c   |   17 +++-
 fs/xfs/xfs_file.c        |   21 ++---
 fs/xfs/xfs_icache.c      |  185 ++++++++++++++++++++++++++++++----------------
 fs/xfs/xfs_icache.h      |    7 +-
 fs/xfs/xfs_inode.c       |   17 +++-
 fs/xfs/xfs_ioctl.c       |   14 +++
 fs/xfs/xfs_iomap.c       |   19 ++++-
 fs/xfs/xfs_iops.c        |   13 ++-
 fs/xfs/xfs_qm.c          |   34 ++++++--
 fs/xfs/xfs_quota.h       |   41 ++++++----
 fs/xfs/xfs_reflink.c     |   16 +++-
 fs/xfs/xfs_symlink.c     |    8 +-
 fs/xfs/xfs_trace.c       |    1 +
 fs/xfs/xfs_trace.h       |   40 ++++++++++
 fs/xfs/xfs_trans.c       |   20 +++++
 fs/xfs/xfs_trans_dquot.c |  109 +++++++++++++++++++++++----
 18 files changed, 430 insertions(+), 148 deletions(-)


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-25 18:13   ` Brian Foster
  2021-01-23 18:52 ` [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

The functions to run an eof/cowblocks scan to try to reduce quota usage
are kind of a mess -- the logic repeatedly initializes an eofb structure
and there are logic bugs in the code that result in the cowblocks scan
never actually happening.

Replace all three functions with a single function that fills out an
eofb if we're low on quota and runs both eof and cowblocks scans.
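The union-filter shape the new helper takes can be modeled outside the
kernel as follows; the struct, flag names, and build_and_scan() are
simplified stand-ins, not the actual xfs_eofblocks API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the eofb union filter built by the new helper. */
#define EOF_FLAGS_UNION  (1u << 0)
#define EOF_FLAGS_SYNC   (1u << 1)
#define EOF_FLAGS_UID    (1u << 2)
#define EOF_FLAGS_GID    (1u << 3)

struct eofb_filter {
	uint32_t flags;
	uint32_t uid;
	uint32_t gid;
};

/*
 * Build one filter covering every quota that is low on space, then run
 * both scans once.  Returning false (no scan ran) tells the write path
 * that retrying the write would be pointless.
 */
static bool build_and_scan(bool uid_low, bool gid_low, uint32_t uid,
			   uint32_t gid, struct eofb_filter *eofb)
{
	bool do_work = false;

	eofb->flags = EOF_FLAGS_UNION | EOF_FLAGS_SYNC;

	if (uid_low) {
		eofb->uid = uid;
		eofb->flags |= EOF_FLAGS_UID;
		do_work = true;
	}
	if (gid_low) {
		eofb->gid = gid;
		eofb->flags |= EOF_FLAGS_GID;
		do_work = true;
	}
	if (!do_work)
		return false;

	/* ...here the real code runs the eofblocks and cowblocks scans... */
	return true;
}
```

Because the filter uses union matching, one pass covers every low quota,
which is how the rework guarantees the cowblocks scan actually runs.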

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c   |   15 ++++++---------
 fs/xfs/xfs_icache.c |   46 ++++++++++++++++------------------------------
 fs/xfs/xfs_icache.h |    4 ++--
 3 files changed, 24 insertions(+), 41 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5d4a66c72c78..69879237533b 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -713,7 +713,7 @@ xfs_file_buffered_write(
 	struct inode		*inode = mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret;
-	int			enospc = 0;
+	bool			cleared_space = false;
 	int			iolock;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
@@ -745,19 +745,16 @@ xfs_file_buffered_write(
 	 * also behaves as a filter to prevent too many eofblocks scans from
 	 * running at the same time.
 	 */
-	if (ret == -EDQUOT && !enospc) {
+	if (ret == -EDQUOT && !cleared_space) {
 		xfs_iunlock(ip, iolock);
-		enospc = xfs_inode_free_quota_eofblocks(ip);
-		if (enospc)
-			goto write_retry;
-		enospc = xfs_inode_free_quota_cowblocks(ip);
-		if (enospc)
+		cleared_space = xfs_inode_free_quota_blocks(ip);
+		if (cleared_space)
 			goto write_retry;
 		iolock = 0;
-	} else if (ret == -ENOSPC && !enospc) {
+	} else if (ret == -ENOSPC && !cleared_space) {
 		struct xfs_eofblocks eofb = {0};
 
-		enospc = 1;
+		cleared_space = true;
 		xfs_flush_inodes(ip->i_mount);
 
 		xfs_iunlock(ip, iolock);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index deb99300d171..c71eb15e3835 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1397,33 +1397,31 @@ xfs_icache_free_eofblocks(
 }
 
 /*
- * Run eofblocks scans on the quotas applicable to the inode. For inodes with
- * multiple quotas, we don't know exactly which quota caused an allocation
+ * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
+ * with multiple quotas, we don't know exactly which quota caused an allocation
  * failure. We make a best effort by including each quota under low free space
  * conditions (less than 1% free space) in the scan.
  */
-static int
-__xfs_inode_free_quota_eofblocks(
-	struct xfs_inode	*ip,
-	int			(*execute)(struct xfs_mount *mp,
-					   struct xfs_eofblocks	*eofb))
+bool
+xfs_inode_free_quota_blocks(
+	struct xfs_inode	*ip)
 {
-	int scan = 0;
-	struct xfs_eofblocks eofb = {0};
-	struct xfs_dquot *dq;
+	struct xfs_eofblocks	eofb = {0};
+	struct xfs_dquot	*dq;
+	bool			do_work = false;
 
 	/*
 	 * Run a sync scan to increase effectiveness and use the union filter to
 	 * cover all applicable quotas in a single scan.
 	 */
-	eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC;
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
 
 	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
 		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
 		if (dq && xfs_dquot_lowsp(dq)) {
 			eofb.eof_uid = VFS_I(ip)->i_uid;
 			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
-			scan = 1;
+			do_work = true;
 		}
 	}
 
@@ -1432,21 +1430,16 @@ __xfs_inode_free_quota_eofblocks(
 		if (dq && xfs_dquot_lowsp(dq)) {
 			eofb.eof_gid = VFS_I(ip)->i_gid;
 			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
-			scan = 1;
+			do_work = true;
 		}
 	}
 
-	if (scan)
-		execute(ip->i_mount, &eofb);
+	if (!do_work)
+		return false;
 
-	return scan;
-}
-
-int
-xfs_inode_free_quota_eofblocks(
-	struct xfs_inode *ip)
-{
-	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_eofblocks);
+	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+	return true;
 }
 
 static inline unsigned long
@@ -1646,13 +1639,6 @@ xfs_icache_free_cowblocks(
 			XFS_ICI_COWBLOCKS_TAG);
 }
 
-int
-xfs_inode_free_quota_cowblocks(
-	struct xfs_inode *ip)
-{
-	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_cowblocks);
-}
-
 void
 xfs_inode_set_cowblocks_tag(
 	xfs_inode_t	*ip)
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 3a4c8b382cd0..3f7ddbca8638 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -54,17 +54,17 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
+bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
+
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
-int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
 void xfs_eofblocks_worker(struct work_struct *);
 void xfs_queue_eofblocks(struct xfs_mount *);
 
 void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
-int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
 void xfs_cowblocks_worker(struct work_struct *);
 void xfs_queue_cowblocks(struct xfs_mount *);
 



* [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-25 18:14   ` Brian Foster
  2021-01-23 18:52 ` [PATCH 03/11] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

Don't stall the cowblocks scan on a locked inode if we possibly can.
We'd much rather the background scanner keep moving.
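The non-blocking lock scheme can be modeled in userspace with
pthread_mutex_trylock(); free_cowblocks() and the two mutexes below are
stand-ins for the real inode IO/mmap locks, not the kernel implementation:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t iolock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mmaplock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Model of the patch's trylock scheme: never block on a contended inode.
 * A sync (waiting) scan returns -EAGAIN so the walk revisits the inode;
 * a background scan just skips it.  Note the unwind order: when the
 * second trylock fails we must still drop the first lock.
 */
static int free_cowblocks(bool wait)
{
	int ret = 0;

	if (pthread_mutex_trylock(&iolock) != 0)
		return wait ? -EAGAIN : 0;
	if (pthread_mutex_trylock(&mmaplock) != 0) {
		if (wait)
			ret = -EAGAIN;
		goto out_iolock;
	}

	/* ...cancel the CoW reservations here... */

	pthread_mutex_unlock(&mmaplock);
out_iolock:
	pthread_mutex_unlock(&iolock);
	return ret;
}
```

Distinguishing the sync and background cases is the point of the patch:
only a waiting caller needs the -EAGAIN signal to come back later.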

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index c71eb15e3835..89f9e692fde7 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
 	void			*args)
 {
 	struct xfs_eofblocks	*eofb = args;
+	bool			wait;
 	int			ret = 0;
 
+	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
+
 	if (!xfs_prep_free_cowblocks(ip))
 		return 0;
 
 	if (!xfs_inode_matches_eofb(ip, eofb))
 		return 0;
 
-	/* Free the CoW blocks */
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	/*
+	 * If the caller is waiting, return -EAGAIN to keep the background
+	 * scanner moving and revisit the inode in a subsequent pass.
+	 */
+	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+		if (wait)
+			return -EAGAIN;
+		return 0;
+	}
+	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
+		if (wait)
+			ret = -EAGAIN;
+		goto out_iolock;
+	}
 
 	/*
 	 * Check again, nobody else should be able to dirty blocks or change
@@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
 		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
 
 	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+out_iolock:
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
 	return ret;



* [PATCH 03/11] xfs: xfs_inode_free_quota_blocks should scan project quota
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-25 18:14   ` Brian Foster
  2021-01-23 18:52 ` [PATCH 04/11] xfs: move and rename xfs_inode_free_quota_blocks to avoid conflicts Darrick J. Wong
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

Buffered writers who have run out of quota reservation call
xfs_inode_free_quota_blocks to try to free any space reservations that
might reduce the quota usage.  Unfortunately, the buffered write path
treats "out of project quota" the same as "out of overall space" so this
function has never supported scanning for space that might ease an "out
of project quota" condition.

We're about to start using this function for cases where we actually
/can/ tell if we're out of project quota, so add in this functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_icache.c |    9 +++++++++
 1 file changed, 9 insertions(+)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 89f9e692fde7..10c1a0dee17d 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1434,6 +1434,15 @@ xfs_inode_free_quota_blocks(
 		}
 	}
 
+	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_prid = ip->i_d.di_projid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
+			do_work = true;
+		}
+	}
+
 	if (!do_work)
 		return false;
 



* [PATCH 04/11] xfs: move and rename xfs_inode_free_quota_blocks to avoid conflicts
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 03/11] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-25 18:14   ` Brian Foster
  2021-01-23 18:52 ` [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota Darrick J. Wong
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

Move this function further down in the file so that later cleanups won't
have to declare static functions.  Change the name because we're about
to rework all the code that performs garbage collection of speculatively
allocated file blocks.  No functional changes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c   |    2 -
 fs/xfs/xfs_icache.c |  110 ++++++++++++++++++++++++++-------------------------
 fs/xfs/xfs_icache.h |    2 -
 3 files changed, 57 insertions(+), 57 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 69879237533b..d69e5abcc1b4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -747,7 +747,7 @@ xfs_file_buffered_write(
 	 */
 	if (ret == -EDQUOT && !cleared_space) {
 		xfs_iunlock(ip, iolock);
-		cleared_space = xfs_inode_free_quota_blocks(ip);
+		cleared_space = xfs_blockgc_free_quota(ip);
 		if (cleared_space)
 			goto write_retry;
 		iolock = 0;
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 10c1a0dee17d..aba901d5637b 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1396,61 +1396,6 @@ xfs_icache_free_eofblocks(
 			XFS_ICI_EOFBLOCKS_TAG);
 }
 
-/*
- * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
- * with multiple quotas, we don't know exactly which quota caused an allocation
- * failure. We make a best effort by including each quota under low free space
- * conditions (less than 1% free space) in the scan.
- */
-bool
-xfs_inode_free_quota_blocks(
-	struct xfs_inode	*ip)
-{
-	struct xfs_eofblocks	eofb = {0};
-	struct xfs_dquot	*dq;
-	bool			do_work = false;
-
-	/*
-	 * Run a sync scan to increase effectiveness and use the union filter to
-	 * cover all applicable quotas in a single scan.
-	 */
-	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
-
-	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_uid = VFS_I(ip)->i_uid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
-			do_work = true;
-		}
-	}
-
-	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_GROUP);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_gid = VFS_I(ip)->i_gid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
-			do_work = true;
-		}
-	}
-
-	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_prid = ip->i_d.di_projid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
-			do_work = true;
-		}
-	}
-
-	if (!do_work)
-		return false;
-
-	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
-	return true;
-}
-
 static inline unsigned long
 xfs_iflag_for_tag(
 	int		tag)
@@ -1699,3 +1644,58 @@ xfs_start_block_reaping(
 	xfs_queue_eofblocks(mp);
 	xfs_queue_cowblocks(mp);
 }
+
+/*
+ * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
+ * with multiple quotas, we don't know exactly which quota caused an allocation
+ * failure. We make a best effort by including each quota under low free space
+ * conditions (less than 1% free space) in the scan.
+ */
+bool
+xfs_blockgc_free_quota(
+	struct xfs_inode	*ip)
+{
+	struct xfs_eofblocks	eofb = {0};
+	struct xfs_dquot	*dq;
+	bool			do_work = false;
+
+	/*
+	 * Run a sync scan to increase effectiveness and use the union filter to
+	 * cover all applicable quotas in a single scan.
+	 */
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
+
+	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_uid = VFS_I(ip)->i_uid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
+			do_work = true;
+		}
+	}
+
+	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQTYPE_GROUP);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_gid = VFS_I(ip)->i_gid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
+			do_work = true;
+		}
+	}
+
+	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_prid = ip->i_d.di_projid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
+			do_work = true;
+		}
+	}
+
+	if (!do_work)
+		return false;
+
+	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+	return true;
+}
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 3f7ddbca8638..21b726a05b0d 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -54,7 +54,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
-bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
+bool xfs_blockgc_free_quota(struct xfs_inode *ip);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);



* [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (3 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 04/11] xfs: move and rename xfs_inode_free_quota_blocks to avoid conflicts Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-24  9:34   ` Christoph Hellwig
                     ` (2 more replies)
  2021-01-23 18:52 ` [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks Darrick J. Wong
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

Change the signature of xfs_blockgc_free_quota in preparation for the
next few patches.  Callers can now pass EOF_FLAGS into the function to
control scan parameters; and the function will now pass back any
corruption errors seen while scanning, though for our retry loops we'll
just try again unconditionally.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c   |    7 +++----
 fs/xfs/xfs_icache.c |   20 ++++++++++++--------
 fs/xfs/xfs_icache.h |    2 +-
 3 files changed, 16 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index d69e5abcc1b4..57086838a676 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -747,10 +747,9 @@ xfs_file_buffered_write(
 	 */
 	if (ret == -EDQUOT && !cleared_space) {
 		xfs_iunlock(ip, iolock);
-		cleared_space = xfs_blockgc_free_quota(ip);
-		if (cleared_space)
-			goto write_retry;
-		iolock = 0;
+		xfs_blockgc_free_quota(ip, XFS_EOF_FLAGS_SYNC);
+		cleared_space = true;
+		goto write_retry;
 	} else if (ret == -ENOSPC && !cleared_space) {
 		struct xfs_eofblocks eofb = {0};
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index aba901d5637b..68b6f72593dc 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1651,19 +1651,21 @@ xfs_start_block_reaping(
  * failure. We make a best effort by including each quota under low free space
  * conditions (less than 1% free space) in the scan.
  */
-bool
+int
 xfs_blockgc_free_quota(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	unsigned int		eof_flags)
 {
 	struct xfs_eofblocks	eofb = {0};
 	struct xfs_dquot	*dq;
 	bool			do_work = false;
+	int			error;
 
 	/*
-	 * Run a sync scan to increase effectiveness and use the union filter to
+	 * Run a scan to increase effectiveness and use the union filter to
 	 * cover all applicable quotas in a single scan.
 	 */
-	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION | eof_flags;
 
 	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
 		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
@@ -1693,9 +1695,11 @@ xfs_blockgc_free_quota(
 	}
 
 	if (!do_work)
-		return false;
+		return 0;
 
-	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
-	return true;
+	error = xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	if (error)
+		return error;
+
+	return xfs_icache_free_cowblocks(ip->i_mount, &eofb);
 }
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 21b726a05b0d..d64ea8f5c589 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -54,7 +54,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
-bool xfs_blockgc_free_quota(struct xfs_inode *ip);
+int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);



* [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (4 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-24  9:39   ` Christoph Hellwig
  2021-01-26  4:53   ` [PATCH v4.1 " Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 07/11] xfs: flush eof/cowblocks if we can't reserve quota for inode creation Darrick J. Wong
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

If a fs modification (data write, reflink, xattr set, fallocate, etc.)
is unable to reserve enough quota to handle the modification, try
clearing whatever space the filesystem might have been hanging onto in
the hopes of speeding up the filesystem.  The flushing behavior will
become particularly important when we add deferred inode inactivation
because that will increase the amount of space that isn't actively tied
to user data.
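The reserve/flush/retry contract that the callers below adopt can be
modeled as follows; reserve_quota(), blockgc_scan(), and
do_modification() are hypothetical simplifications of the kernel
functions, with the transaction handling reduced to comments:

```c
#include <errno.h>
#include <stdbool.h>

static int quota_avail;

/* Hypothetical stand-in: a gc scan that frees one block of quota. */
static void blockgc_scan(void)
{
	quota_avail++;
}

/*
 * Model of the new reservation contract: on the first failure the helper
 * runs a gc scan and sets *retry so the caller can jump back to
 * transaction allocation; a repeat failure returns -EDQUOT for real.
 */
static int reserve_quota(int nblks, bool *retry)
{
	if (quota_avail >= nblks) {
		quota_avail -= nblks;
		if (retry)
			*retry = false;
		return 0;
	}
	if (retry && !*retry) {
		blockgc_scan();
		*retry = true;
		return 0;	/* caller sees *retry and loops */
	}
	return -EDQUOT;
}

static int do_modification(int nblks)
{
	bool quota_retry = false;
	int error;

retry:
	/* ...allocate and set up the transaction here... */
	error = reserve_quota(nblks, &quota_retry);
	if (error)
		return error;	/* ...after cancelling the transaction... */
	if (quota_retry)
		goto retry;
	return 0;
}
```

Because *retry is only set once, the loop is bounded: the second
reservation attempt either succeeds or surfaces -EDQUOT.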

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c |    8 +++++-
 fs/xfs/libxfs/xfs_bmap.c |    8 +++++-
 fs/xfs/xfs_bmap_util.c   |   17 ++++++++++----
 fs/xfs/xfs_iomap.c       |   19 ++++++++++++---
 fs/xfs/xfs_quota.h       |   20 ++++++++++------
 fs/xfs/xfs_reflink.c     |   16 ++++++++++---
 fs/xfs/xfs_trans_dquot.c |   57 ++++++++++++++++++++++++++++++++++++----------
 7 files changed, 108 insertions(+), 37 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index be51e7068dcd..af835ea0ca80 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -395,6 +395,7 @@ xfs_attr_set(
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
 	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+	bool			quota_retry = false;
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
@@ -458,6 +459,7 @@ xfs_attr_set(
 	 * Root fork attributes can use reserved data blocks for this
 	 * operation if necessary
 	 */
+retry:
 	error = xfs_trans_alloc(mp, &tres, total, 0,
 			rsvd ? XFS_TRANS_RESERVE : 0, &args->trans);
 	if (error)
@@ -478,10 +480,12 @@ xfs_attr_set(
 
 		if (rsvd)
 			quota_flags |= XFS_QMOPT_FORCE_RES;
-		error = xfs_trans_reserve_quota_nblks(args->trans, dp,
-				args->total, 0, quota_flags);
+		error = xfs_trans_reserve_quota_nblks(&args->trans, dp,
+				args->total, 0, quota_flags, &quota_retry);
 		if (error)
 			goto out_trans_cancel;
+		if (quota_retry)
+			goto retry;
 
 		error = xfs_has_attr(args);
 		if (error == -EEXIST && (args->attr_flags & XATTR_CREATE))
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 908b7d49da60..0247763dfac3 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1070,6 +1070,7 @@ xfs_bmap_add_attrfork(
 	int			blks;		/* space reservation */
 	int			version = 1;	/* superblock attr version */
 	int			logflags;	/* logging flags */
+	bool			quota_retry = false;
 	int			error;		/* error return value */
 
 	ASSERT(XFS_IFORK_Q(ip) == 0);
@@ -1079,17 +1080,20 @@ xfs_bmap_add_attrfork(
 
 	blks = XFS_ADDAFORK_SPACE_RES(mp);
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_addafork, blks, 0,
 			rsvd ? XFS_TRANS_RESERVE : 0, &tp);
 	if (error)
 		return error;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	error = xfs_trans_reserve_quota_nblks(tp, ip, blks, 0, rsvd ?
+	error = xfs_trans_reserve_quota_nblks(&tp, ip, blks, 0, rsvd ?
 			XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES :
-			XFS_QMOPT_RES_REGBLKS);
+			XFS_QMOPT_RES_REGBLKS, &quota_retry);
 	if (error)
 		goto trans_cancel;
+	if (quota_retry)
+		goto retry;
 	if (XFS_IFORK_Q(ip))
 		goto trans_cancel;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 792809debaaa..6eaf92bf8fc6 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -761,6 +761,7 @@ xfs_alloc_file_space(
 	 */
 	while (allocatesize_fsb && !error) {
 		xfs_fileoff_t	s, e;
+		bool		quota_retry = false;
 
 		/*
 		 * Determine space reservations for data/realtime.
@@ -803,6 +804,7 @@ xfs_alloc_file_space(
 		/*
 		 * Allocate and setup the transaction.
 		 */
+retry:
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
 				resrtextents, 0, &tp);
 
@@ -817,10 +819,12 @@ xfs_alloc_file_space(
 			break;
 		}
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
-						      0, quota_flag);
+		error = xfs_trans_reserve_quota_nblks(&tp, ip, qblocks, 0,
+				quota_flag, &quota_retry);
 		if (error)
 			goto error1;
+		if (quota_retry)
+			goto retry;
 
 		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
@@ -858,7 +862,6 @@ xfs_alloc_file_space(
 
 error0:	/* unlock inode, unreserve quota blocks, cancel trans */
 	xfs_trans_unreserve_quota_nblks(tp, ip, (long)qblocks, 0, quota_flag);
-
 error1:	/* Just cancel transaction */
 	xfs_trans_cancel(tp);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -875,8 +878,10 @@ xfs_unmap_extent(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+	bool			quota_retry = false;
 	int			error;
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	if (error) {
 		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
@@ -884,10 +889,12 @@ xfs_unmap_extent(
 	}
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-			XFS_QMOPT_RES_REGBLKS);
+	error = xfs_trans_reserve_quota_nblks(&tp, ip, resblks, 0,
+			XFS_QMOPT_RES_REGBLKS, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry)
+		goto retry;
 
 	xfs_trans_ijoin(tp, ip, 0);
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 514e6ae010e0..294d819c30c6 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -27,7 +27,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
-
+#include "xfs_icache.h"
 
 #define XFS_ALLOC_ALIGN(mp, off) \
 	(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -197,6 +197,7 @@ xfs_iomap_write_direct(
 	int			quota_flag;
 	uint			qblocks, resblks;
 	unsigned int		resrtextents = 0;
+	bool			quota_retry = false;
 	int			error;
 	int			bmapi_flags = XFS_BMAPI_PREALLOC;
 	uint			tflags = 0;
@@ -239,6 +240,7 @@ xfs_iomap_write_direct(
 			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
 		}
 	}
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, resrtextents,
 			tflags, &tp);
 	if (error)
@@ -246,9 +248,12 @@ xfs_iomap_write_direct(
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
-	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0, quota_flag);
+	error = xfs_trans_reserve_quota_nblks(&tp, ip, qblocks, 0, quota_flag,
+			&quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry)
+		goto retry;
 
 	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 			XFS_IEXT_ADD_NOSPLIT_CNT);
@@ -544,6 +549,8 @@ xfs_iomap_write_unwritten(
 		return error;
 
 	do {
+		bool	quota_retry = false;
+
 		/*
 		 * Set up a transaction to convert the range of extents
 		 * from unwritten to real. Do allocations in a loop until
@@ -553,6 +560,7 @@ xfs_iomap_write_unwritten(
 		 * here as we might be asked to write out the same inode that we
 		 * complete here and might deadlock on the iolock.
 		 */
+retry:
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
 				XFS_TRANS_RESERVE, &tp);
 		if (error)
@@ -561,10 +569,13 @@ xfs_iomap_write_unwritten(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, 0);
 
-		error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES);
+		error = xfs_trans_reserve_quota_nblks(&tp, ip, resblks, 0,
+				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES,
+				&quota_retry);
 		if (error)
 			goto error_on_bmapi_transaction;
+		if (quota_retry)
+			goto retry;
 
 		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 				XFS_IEXT_WRITE_UNWRITTEN_CNT);
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 16a2e7adf4da..1c083b5267d9 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -81,8 +81,9 @@ extern void xfs_trans_mod_dquot_byino(struct xfs_trans *, struct xfs_inode *,
 		uint, int64_t);
 extern void xfs_trans_apply_dquot_deltas(struct xfs_trans *);
 extern void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *);
-extern int xfs_trans_reserve_quota_nblks(struct xfs_trans *,
-		struct xfs_inode *, int64_t, long, uint);
+int xfs_trans_reserve_quota_nblks(struct xfs_trans **tpp, struct xfs_inode *ip,
+		int64_t nblocks, long ninos, unsigned int flags,
+		bool *retry);
 extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
 		struct xfs_mount *, struct xfs_dquot *,
 		struct xfs_dquot *, struct xfs_dquot *, int64_t, long, uint);
@@ -114,8 +115,11 @@ extern void xfs_qm_unmount_quotas(struct xfs_mount *);
 static inline int
 xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t nblks, bool isrt)
 {
-	return xfs_trans_reserve_quota_nblks(NULL, ip, nblks, 0,
-			isrt ? XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS);
+	struct xfs_trans	*tp = NULL;
+
+	return xfs_trans_reserve_quota_nblks(&tp, ip, nblks, 0,
+			isrt ? XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS,
+			NULL);
 }
 #else
 static inline int
@@ -133,8 +137,9 @@ xfs_qm_vop_dqalloc(struct xfs_inode *ip, kuid_t kuid, kgid_t kgid,
 #define xfs_trans_mod_dquot_byino(tp, ip, fields, delta)
 #define xfs_trans_apply_dquot_deltas(tp)
 #define xfs_trans_unreserve_and_mod_dquots(tp)
-static inline int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp,
-		struct xfs_inode *ip, int64_t nblks, long ninos, uint flags)
+static inline int xfs_trans_reserve_quota_nblks(struct xfs_trans **tpp,
+		struct xfs_inode *ip, int64_t nblks, long ninos,
+		unsigned int flags, bool *retry)
 {
 	return 0;
 }
@@ -179,7 +184,8 @@ static inline int
 xfs_trans_unreserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip,
 		int64_t nblks, long ninos, unsigned int flags)
 {
-	return xfs_trans_reserve_quota_nblks(tp, ip, -nblks, -ninos, flags);
+	return xfs_trans_reserve_quota_nblks(&tp, ip, -nblks, -ninos, flags,
+			NULL);
 }
 
 static inline int
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0da1a603b7d8..0afd74f35ab7 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -355,6 +355,7 @@ xfs_reflink_allocate_cow(
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
 	struct xfs_trans	*tp;
 	int			nimaps, error = 0;
+	bool			quota_retry = false;
 	bool			found;
 	xfs_filblks_t		resaligned;
 	xfs_extlen_t		resblks = 0;
@@ -376,6 +377,7 @@ xfs_reflink_allocate_cow(
 	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
 
 	xfs_iunlock(ip, *lockmode);
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	*lockmode = XFS_ILOCK_EXCL;
 	xfs_ilock(ip, *lockmode);
@@ -398,10 +400,12 @@ xfs_reflink_allocate_cow(
 		goto convert;
 	}
 
-	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-			XFS_QMOPT_RES_REGBLKS);
+	error = xfs_trans_reserve_quota_nblks(&tp, ip, resblks, 0,
+			XFS_QMOPT_RES_REGBLKS, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry)
+		goto retry;
 
 	xfs_trans_ijoin(tp, ip, 0);
 
@@ -1006,10 +1010,12 @@ xfs_reflink_remap_extent(
 	unsigned int		resblks;
 	bool			smap_real;
 	bool			dmap_written = xfs_bmap_is_written_extent(dmap);
+	bool			quota_retry = false;
 	int			iext_delta = 0;
 	int			nimaps;
 	int			error;
 
+retry:
 	/* Start a rolling transaction to switch the mappings */
 	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
@@ -1094,10 +1100,12 @@ xfs_reflink_remap_extent(
 	if (!smap_real && dmap_written)
 		qres += dmap->br_blockcount;
 	if (qres > 0) {
-		error = xfs_trans_reserve_quota_nblks(tp, ip, qres, 0,
-				XFS_QMOPT_RES_REGBLKS);
+		error = xfs_trans_reserve_quota_nblks(&tp, ip, qres, 0,
+				XFS_QMOPT_RES_REGBLKS, &quota_retry);
 		if (error)
 			goto out_cancel;
+		if (quota_retry)
+			goto retry;
 	}
 
 	if (smap_real)
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 3315498a6fa1..adc7331ff182 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -16,6 +16,7 @@
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_trace.h"
+#include "xfs_icache.h"
 
 STATIC void	xfs_trans_alloc_dqinfo(xfs_trans_t *);
 
@@ -770,21 +771,38 @@ xfs_trans_reserve_quota_bydquots(
 	return error;
 }
 
-
 /*
- * Lock the dquot and change the reservation if we can.
- * This doesn't change the actual usage, just the reservation.
- * The inode sent in is locked.
+ * Lock the dquot and change the reservation if we can.  This doesn't change
+ * the actual usage, just the reservation.  The caller must hold ILOCK_EXCL on
+ * the inode.  If @retry is not a NULL pointer, the caller must ensure that
+ * *retry is set to false before the first time this function is called.
+ *
+ * If the quota reservation fails because we hit a quota limit (and retry is
+ * not a NULL pointer, and *retry is false), this function will try to invoke
+ * the speculative preallocation gc scanner to reduce quota usage.  In order to
+ * do that, we cancel the transaction, NULL out tpp, drop the ILOCK, and set
+ * *retry to true.
+ *
+ * This function ends one of two ways:
+ *
+ *  1) To signal the caller to try again, *retry is set to true; *tpp is
+ *     cancelled and set to NULL; the inode is unlocked; and the return value
+ *     is zero.
+ *
+ *  2) Otherwise, *tpp is still set; the inode is still locked; and the return
+ *     value is zero or the usual negative error code.
  */
 int
 xfs_trans_reserve_quota_nblks(
-	struct xfs_trans	*tp,
+	struct xfs_trans	**tpp,
 	struct xfs_inode	*ip,
 	int64_t			nblks,
 	long			ninos,
-	uint			flags)
+	unsigned int		flags,
+	bool			*retry)
 {
 	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
 
 	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
 		return 0;
@@ -795,13 +813,26 @@ xfs_trans_reserve_quota_nblks(
 	ASSERT((flags & ~(XFS_QMOPT_FORCE_RES)) == XFS_TRANS_DQ_RES_RTBLKS ||
 	       (flags & ~(XFS_QMOPT_FORCE_RES)) == XFS_TRANS_DQ_RES_BLKS);
 
-	/*
-	 * Reserve nblks against these dquots, with trans as the mediator.
-	 */
-	return xfs_trans_reserve_quota_bydquots(tp, mp,
-						ip->i_udquot, ip->i_gdquot,
-						ip->i_pdquot,
-						nblks, ninos, flags);
+	/* Reserve nblks against these dquots, with trans as the mediator. */
+	error = xfs_trans_reserve_quota_bydquots(*tpp, mp, ip->i_udquot,
+			ip->i_gdquot, ip->i_pdquot, nblks, ninos, flags);
+	if (retry == NULL)
+		return error;
+	/* We only allow one retry for EDQUOT/ENOSPC. */
+	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
+		*retry = false;
+		return error;
+	}
+
+	/* Release resources, prepare for scan. */
+	xfs_trans_cancel(*tpp);
+	*tpp = NULL;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/* Try to free some quota for this file's dquots. */
+	*retry = true;
+	xfs_blockgc_free_quota(ip, 0);
+	return 0;
 }
 
 /* Change the quota reservations for an inode creation activity. */



* [PATCH 07/11] xfs: flush eof/cowblocks if we can't reserve quota for inode creation
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (5 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-26  4:55   ` [PATCH v4.1 " Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 08/11] xfs: flush eof/cowblocks if we can't reserve quota for chown Darrick J. Wong
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

If an inode creation is unable to reserve enough quota to handle the
modification, try clearing whatever speculatively preallocated space the
filesystem might have been hanging onto, in the hopes of freeing enough
space for the operation to proceed.  This flushing behavior will become
particularly important when we add deferred inode inactivation, because
that will increase the amount of allocated space that isn't actively
tied to user data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c      |   78 ++++++++++++++++++++++++++++------------------
 fs/xfs/xfs_icache.h      |    2 +
 fs/xfs/xfs_inode.c       |   17 ++++++++--
 fs/xfs/xfs_quota.h       |   13 ++++----
 fs/xfs/xfs_symlink.c     |    8 ++++-
 fs/xfs/xfs_trans_dquot.c |   52 ++++++++++++++++++++++++++++---
 6 files changed, 124 insertions(+), 46 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 68b6f72593dc..7f999f9dd80a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1646,60 +1646,78 @@ xfs_start_block_reaping(
 }
 
 /*
- * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
- * with multiple quotas, we don't know exactly which quota caused an allocation
- * failure. We make a best effort by including each quota under low free space
- * conditions (less than 1% free space) in the scan.
+ * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
+ * quota caused an allocation failure, so we make a best effort by including
+ * each quota under low free space conditions (less than 1% free space) in the
+ * scan.
  */
 int
-xfs_blockgc_free_quota(
-	struct xfs_inode	*ip,
+xfs_blockgc_free_dquots(
+	struct xfs_dquot	*udqp,
+	struct xfs_dquot	*gdqp,
+	struct xfs_dquot	*pdqp,
 	unsigned int		eof_flags)
 {
 	struct xfs_eofblocks	eofb = {0};
-	struct xfs_dquot	*dq;
+	struct xfs_mount	*mp = NULL;
 	bool			do_work = false;
 	int			error;
 
+	if (!udqp && !gdqp && !pdqp)
+		return 0;
+	if (udqp)
+		mp = udqp->q_mount;
+	if (!mp && gdqp)
+		mp = gdqp->q_mount;
+	if (!mp && pdqp)
+		mp = pdqp->q_mount;
+
 	/*
 	 * Run a scan to increase effectiveness and use the union filter to
 	 * cover all applicable quotas in a single scan.
 	 */
 	eofb.eof_flags = XFS_EOF_FLAGS_UNION | eof_flags;
 
-	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_uid = VFS_I(ip)->i_uid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
-			do_work = true;
-		}
+	if (XFS_IS_UQUOTA_ENFORCED(mp) && udqp && xfs_dquot_lowsp(udqp)) {
+		eofb.eof_uid = make_kuid(mp->m_super->s_user_ns, udqp->q_id);
+		eofb.eof_flags |= XFS_EOF_FLAGS_UID;
+		do_work = true;
 	}
 
-	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_GROUP);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_gid = VFS_I(ip)->i_gid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
-			do_work = true;
-		}
+	if (XFS_IS_GQUOTA_ENFORCED(mp) && gdqp && xfs_dquot_lowsp(gdqp)) {
+		eofb.eof_gid = make_kgid(mp->m_super->s_user_ns, gdqp->q_id);
+		eofb.eof_flags |= XFS_EOF_FLAGS_GID;
+		do_work = true;
 	}
 
-	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_prid = ip->i_d.di_projid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
-			do_work = true;
-		}
+	if (XFS_IS_PQUOTA_ENFORCED(mp) && pdqp && xfs_dquot_lowsp(pdqp)) {
+		eofb.eof_prid = pdqp->q_id;
+		eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
+		do_work = true;
 	}
 
 	if (!do_work)
 		return 0;
 
-	error = xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	error = xfs_icache_free_eofblocks(mp, &eofb);
 	if (error)
 		return error;
 
-	return xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+	return xfs_icache_free_cowblocks(mp, &eofb);
+}
+
+/*
+ * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
+ * with multiple quotas, we don't know exactly which quota caused an allocation
+ * failure. We make a best effort by including each quota under low free space
+ * conditions (less than 1% free space) in the scan.
+ */
+int
+xfs_blockgc_free_quota(
+	struct xfs_inode	*ip,
+	unsigned int		eof_flags)
+{
+	return xfs_blockgc_free_dquots(xfs_inode_dquot(ip, XFS_DQTYPE_USER),
+			xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
+			xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), eof_flags);
 }
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index d64ea8f5c589..5f520de637f6 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -54,6 +54,8 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
+int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp, unsigned int eof_flags);
 int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e909da05cd28..a3072c3f5028 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -990,6 +990,7 @@ xfs_create(
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_trans_res	*tres;
+	bool			quota_retry = false;
 	uint			resblks;
 
 	trace_xfs_create(dp, name);
@@ -1022,6 +1023,7 @@ xfs_create(
 	 * the case we'll drop the one we have and get a more
 	 * appropriate transaction later.
 	 */
+retry:
 	error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
 	if (error == -ENOSPC) {
 		/* flush outstanding delalloc blocks and retry */
@@ -1037,10 +1039,12 @@ xfs_create(
 	/*
 	 * Reserve disk quota and the inode.
 	 */
-	error = xfs_trans_reserve_quota_icreate(tp, dp, udqp, gdqp, pdqp,
-			resblks);
+	error = xfs_trans_reserve_quota_icreate(&tp, dp, &unlock_dp_on_error,
+			udqp, gdqp, pdqp, resblks, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry)
+		goto retry;
 
 	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
 			XFS_IEXT_DIR_MANIP_CNT(mp));
@@ -1146,6 +1150,8 @@ xfs_create_tmpfile(
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_trans_res	*tres;
+	bool			locked = false;
+	bool			quota_retry = false;
 	uint			resblks;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
@@ -1165,14 +1171,17 @@ xfs_create_tmpfile(
 	resblks = XFS_IALLOC_SPACE_RES(mp);
 	tres = &M_RES(mp)->tr_create_tmpfile;
 
+retry:
 	error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
 	if (error)
 		goto out_release_inode;
 
-	error = xfs_trans_reserve_quota_icreate(tp, dp, udqp, gdqp, pdqp,
-			resblks);
+	error = xfs_trans_reserve_quota_icreate(&tp, dp, &locked, udqp, gdqp,
+			pdqp, resblks, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry)
+		goto retry;
 
 	error = xfs_dir_ialloc(&tp, dp, mode, 0, 0, prid, &ip);
 	if (error)
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 1c083b5267d9..c4d02252e36f 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -87,9 +87,10 @@ int xfs_trans_reserve_quota_nblks(struct xfs_trans **tpp, struct xfs_inode *ip,
 extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
 		struct xfs_mount *, struct xfs_dquot *,
 		struct xfs_dquot *, struct xfs_dquot *, int64_t, long, uint);
-int xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
-		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
-		struct xfs_dquot *pdqp, int64_t nblks);
+int xfs_trans_reserve_quota_icreate(struct xfs_trans **tpp,
+		struct xfs_inode *dp, bool *dp_locked, struct xfs_dquot *udqp,
+		struct xfs_dquot *gdqp, struct xfs_dquot *pdqp, int64_t nblks,
+		bool *retry);
 
 extern int xfs_qm_vop_dqalloc(struct xfs_inode *, kuid_t, kgid_t,
 		prid_t, uint, struct xfs_dquot **, struct xfs_dquot **,
@@ -158,9 +159,9 @@ xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t nblks, bool isrt)
 }
 
 static inline int
-xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
-		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
-		struct xfs_dquot *pdqp, int64_t nblks)
+xfs_trans_reserve_quota_icreate(struct xfs_trans **tpp, struct xfs_inode *dp,
+		bool *dp_locked, struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp, int64_t nblks, bool *retry)
 {
 	return 0;
 }
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index f8bfa51bdeef..20c150ad699f 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -159,6 +159,7 @@ xfs_symlink(
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
+	bool			quota_retry = false;
 	uint			resblks;
 
 	*ipp = NULL;
@@ -197,6 +198,7 @@ xfs_symlink(
 		fs_blocks = xfs_symlink_blocks(mp, pathlen);
 	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_symlink, resblks, 0, 0, &tp);
 	if (error)
 		goto out_release_inode;
@@ -215,10 +217,12 @@ xfs_symlink(
 	/*
 	 * Reserve disk quota : blocks and inode.
 	 */
-	error = xfs_trans_reserve_quota_icreate(tp, dp, udqp, gdqp, pdqp,
-			resblks);
+	error = xfs_trans_reserve_quota_icreate(&tp, dp, &unlock_dp_on_error,
+			udqp, gdqp, pdqp, resblks, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry)
+		goto retry;
 
 	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
 			XFS_IEXT_DIR_MANIP_CNT(mp));
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index adc7331ff182..340c066f8ef1 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -835,25 +835,69 @@ xfs_trans_reserve_quota_nblks(
 	return 0;
 }
 
-/* Change the quota reservations for an inode creation activity. */
+/*
+ * Change the quota reservations for an inode creation activity.  This doesn't
+ * change the actual usage, just the reservation.  The caller may hold
+ * ILOCK_EXCL on the inode.  If @retry is not a NULL pointer, the caller must
+ * ensure that *retry is set to false before the first time this function is
+ * called.
+ *
+ * If the quota reservation fails because we hit a quota limit (and retry is
+ * not a NULL pointer, and *retry is false), this function will try to invoke
+ * the speculative preallocation gc scanner to reduce quota usage.  In order to
+ * do that, we cancel the transaction, NULL out tpp, drop the ILOCK and clear
+ * *dp_locked if the caller reported holding the lock, and set *retry to true.
+ *
+ * This function ends one of two ways:
+ *
+ *  1) To signal the caller to try again, *retry is set to true; *tpp is
+ *     cancelled and set to NULL; if *dp_locked is true, the inode is unlocked
+ *     and *dp_locked is set to false; and the return value is zero.
+ *
+ *  2) Otherwise, *tpp is still set; the inode is still locked; and the return
+ *     value is zero or the usual negative error code.
+ */
 int
 xfs_trans_reserve_quota_icreate(
-	struct xfs_trans	*tp,
+	struct xfs_trans	**tpp,
 	struct xfs_inode	*dp,
+	bool			*dp_locked,
 	struct xfs_dquot	*udqp,
 	struct xfs_dquot	*gdqp,
 	struct xfs_dquot	*pdqp,
-	int64_t			nblks)
+	int64_t			nblks,
+	bool			*retry)
 {
 	struct xfs_mount	*mp = dp->i_mount;
+	int			error;
 
 	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
 		return 0;
 
 	ASSERT(!xfs_is_quota_inode(&mp->m_sb, dp->i_ino));
 
-	return xfs_trans_reserve_quota_bydquots(tp, dp->i_mount, udqp, gdqp,
+	error = xfs_trans_reserve_quota_bydquots(*tpp, dp->i_mount, udqp, gdqp,
 			pdqp, nblks, 1, XFS_QMOPT_RES_REGBLKS);
+	if (retry == NULL)
+		return error;
+	/* We only allow one retry for EDQUOT/ENOSPC. */
+	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
+		*retry = false;
+		return error;
+	}
+
+	/* Release resources, prepare for scan. */
+	xfs_trans_cancel(*tpp);
+	*tpp = NULL;
+	if (*dp_locked) {
+		xfs_iunlock(dp, XFS_ILOCK_EXCL);
+		*dp_locked = false;
+	}
+
+	/* Try to free some quota for the supplied dquots. */
+	*retry = true;
+	xfs_blockgc_free_dquots(udqp, gdqp, pdqp, 0);
+	return 0;
 }
 
 /*



* [PATCH 08/11] xfs: flush eof/cowblocks if we can't reserve quota for chown
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (6 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 07/11] xfs: flush eof/cowblocks if we can't reserve quota for inode creation Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-26  4:55   ` [PATCH v4.1 " Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 09/11] xfs: add a tracepoint for blockgc scans Darrick J. Wong
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

If a file's user, group, or project ID change is unable to reserve
enough quota to handle the modification, try clearing whatever
speculatively preallocated space the filesystem might have been hanging
onto, in the hopes of freeing enough space for the operation to proceed.
This flushing behavior will become particularly important when we add
deferred inode inactivation, because that will increase the amount of
allocated space that isn't actively tied to user data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_ioctl.c |   12 ++++++++++--
 fs/xfs/xfs_iops.c  |   13 ++++++++++---
 fs/xfs/xfs_qm.c    |   34 ++++++++++++++++++++++++++--------
 fs/xfs/xfs_quota.h |    8 ++++----
 4 files changed, 50 insertions(+), 17 deletions(-)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3fbd98f61ea5..952eca338807 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1436,6 +1436,7 @@ xfs_ioctl_setattr(
 	struct xfs_trans	*tp;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
+	bool			quota_retry = false;
 	int			code;
 
 	trace_xfs_ioctl_setattr(ip);
@@ -1462,6 +1463,7 @@ xfs_ioctl_setattr(
 
 	xfs_ioctl_setattr_prepare_dax(ip, fa);
 
+retry:
 	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
@@ -1470,10 +1472,16 @@ xfs_ioctl_setattr(
 
 	if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_PQUOTA_ON(mp) &&
 	    ip->i_d.di_projid != fa->fsx_projid) {
-		code = xfs_qm_vop_chown_reserve(tp, ip, NULL, NULL, pdqp,
-				capable(CAP_FOWNER) ?  XFS_QMOPT_FORCE_RES : 0);
+		unsigned int	flags = 0;
+
+		if (capable(CAP_FOWNER))
+			flags |= XFS_QMOPT_FORCE_RES;
+		code = xfs_qm_vop_chown_reserve(&tp, ip, NULL, NULL, pdqp,
+				flags, &quota_retry);
 		if (code)	/* out of quota */
 			goto error_trans_cancel;
+		if (quota_retry)
+			goto retry;
 	}
 
 	xfs_fill_fsxattr(ip, false, &old_fa);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f1e21b6cfa48..d43c2b008be8 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -660,6 +660,7 @@ xfs_setattr_nonsize(
 	kgid_t			gid = GLOBAL_ROOT_GID, igid = GLOBAL_ROOT_GID;
 	struct xfs_dquot	*udqp = NULL, *gdqp = NULL;
 	struct xfs_dquot	*olddquot1 = NULL, *olddquot2 = NULL;
+	bool			quota_retry = false;
 
 	ASSERT((mask & ATTR_SIZE) == 0);
 
@@ -700,6 +701,7 @@ xfs_setattr_nonsize(
 			return error;
 	}
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
 	if (error)
 		goto out_dqrele;
@@ -729,12 +731,17 @@ xfs_setattr_nonsize(
 		if (XFS_IS_QUOTA_RUNNING(mp) &&
 		    ((XFS_IS_UQUOTA_ON(mp) && !uid_eq(iuid, uid)) ||
 		     (XFS_IS_GQUOTA_ON(mp) && !gid_eq(igid, gid)))) {
+			unsigned int	flags = 0;
+
+			if (capable(CAP_FOWNER))
+				flags |= XFS_QMOPT_FORCE_RES;
 			ASSERT(tp);
-			error = xfs_qm_vop_chown_reserve(tp, ip, udqp, gdqp,
-						NULL, capable(CAP_FOWNER) ?
-						XFS_QMOPT_FORCE_RES : 0);
+			error = xfs_qm_vop_chown_reserve(&tp, ip, udqp, gdqp,
+					NULL, flags, &quota_retry);
 			if (error)	/* out of quota */
 				goto out_cancel;
+			if (quota_retry)
+				goto retry;
 		}
 
 		/*
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index c134eb4aeaa8..3b481f69a913 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1795,24 +1795,26 @@ xfs_qm_vop_chown(
 }
 
 /*
- * Quota reservations for setattr(AT_UID|AT_GID|AT_PROJID).
+ * Quota reservations for setattr(AT_UID|AT_GID|AT_PROJID).  This function has
+ * the same return behavior as xfs_trans_reserve_quota_nblks.
  */
 int
 xfs_qm_vop_chown_reserve(
-	struct xfs_trans	*tp,
+	struct xfs_trans	**tpp,
 	struct xfs_inode	*ip,
 	struct xfs_dquot	*udqp,
 	struct xfs_dquot	*gdqp,
 	struct xfs_dquot	*pdqp,
-	uint			flags)
+	unsigned int		flags,
+	bool			*retry)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	uint64_t		delblks;
 	unsigned int		blkflags;
-	struct xfs_dquot	*udq_unres = NULL;
+	struct xfs_dquot	*udq_unres = NULL; /* old dquots */
 	struct xfs_dquot	*gdq_unres = NULL;
 	struct xfs_dquot	*pdq_unres = NULL;
-	struct xfs_dquot	*udq_delblks = NULL;
+	struct xfs_dquot	*udq_delblks = NULL; /* new dquots */
 	struct xfs_dquot	*gdq_delblks = NULL;
 	struct xfs_dquot	*pdq_delblks = NULL;
 	int			error;
@@ -1856,11 +1858,11 @@ xfs_qm_vop_chown_reserve(
 		}
 	}
 
-	error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount,
+	error = xfs_trans_reserve_quota_bydquots(*tpp, ip->i_mount,
 				udq_delblks, gdq_delblks, pdq_delblks,
 				ip->i_d.di_nblocks, 1, flags | blkflags);
 	if (error)
-		return error;
+		goto err;
 
 	/*
 	 * Do the delayed blks reservations/unreservations now. Since, these
@@ -1878,12 +1880,28 @@ xfs_qm_vop_chown_reserve(
 			    udq_delblks, gdq_delblks, pdq_delblks,
 			    (xfs_qcnt_t)delblks, 0, flags | blkflags);
 		if (error)
-			return error;
+			goto err;
 		xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
 				udq_unres, gdq_unres, pdq_unres,
 				-((xfs_qcnt_t)delblks), 0, blkflags);
 	}
 
+	return 0;
+err:
+	/* We only allow one retry for EDQUOT/ENOSPC. */
+	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
+		*retry = false;
+		return error;
+	}
+
+	/* Release resources, prepare for scan. */
+	xfs_trans_cancel(*tpp);
+	*tpp = NULL;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/* Try to free some quota in the new dquots. */
+	*retry = true;
+	xfs_blockgc_free_dquots(udq_delblks, gdq_delblks, pdq_delblks, 0);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index c4d02252e36f..ce65f6ac57a9 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -100,9 +100,9 @@ extern void xfs_qm_vop_create_dqattach(struct xfs_trans *, struct xfs_inode *,
 extern int xfs_qm_vop_rename_dqattach(struct xfs_inode **);
 extern struct xfs_dquot *xfs_qm_vop_chown(struct xfs_trans *,
 		struct xfs_inode *, struct xfs_dquot **, struct xfs_dquot *);
-extern int xfs_qm_vop_chown_reserve(struct xfs_trans *, struct xfs_inode *,
-		struct xfs_dquot *, struct xfs_dquot *,
-		struct xfs_dquot *, uint);
+int xfs_qm_vop_chown_reserve(struct xfs_trans **tpp, struct xfs_inode *ip,
+		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp, unsigned int flags, bool *retry);
 extern int xfs_qm_dqattach(struct xfs_inode *);
 extern int xfs_qm_dqattach_locked(struct xfs_inode *ip, bool doalloc);
 extern void xfs_qm_dqdetach(struct xfs_inode *);
@@ -169,7 +169,7 @@ xfs_trans_reserve_quota_icreate(struct xfs_trans **tpp, struct xfs_inode *dp,
 #define xfs_qm_vop_create_dqattach(tp, ip, u, g, p)
 #define xfs_qm_vop_rename_dqattach(it)					(0)
 #define xfs_qm_vop_chown(tp, ip, old, new)				(NULL)
-#define xfs_qm_vop_chown_reserve(tp, ip, u, g, p, fl)			(0)
+#define xfs_qm_vop_chown_reserve(tpp, ip, u, g, p, fl, retry)		(0)
 #define xfs_qm_dqattach(ip)						(0)
 #define xfs_qm_dqattach_locked(ip, fl)					(0)
 #define xfs_qm_dqdetach(ip)



* [PATCH 09/11] xfs: add a tracepoint for blockgc scans
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (7 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 08/11] xfs: flush eof/cowblocks if we can't reserve quota for chown Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-25 18:45   ` Brian Foster
  2021-01-26  4:56   ` [PATCH v4.1 " Darrick J. Wong
  2021-01-23 18:52 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
  2021-01-23 18:53 ` [PATCH 11/11] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
  10 siblings, 2 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

Add some tracepoints so that we can observe when the speculative
preallocation garbage collector runs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_ioctl.c |    2 ++
 fs/xfs/xfs_trace.c |    1 +
 fs/xfs/xfs_trace.h |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 952eca338807..da407934364c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -2356,6 +2356,8 @@ xfs_file_ioctl(
 		if (error)
 			return error;
 
+		trace_xfs_ioc_free_eofblocks(mp, &keofb, _RET_IP_);
+
 		sb_start_write(mp->m_super);
 		error = xfs_icache_free_eofblocks(mp, &keofb);
 		sb_end_write(mp->m_super);
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index 120398a37c2a..9b8d703dc9fd 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -29,6 +29,7 @@
 #include "xfs_filestream.h"
 #include "xfs_fsmap.h"
 #include "xfs_btree_staging.h"
+#include "xfs_icache.h"
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 407c3a5208ab..4cbf446bae9a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -37,6 +37,7 @@ struct xfs_trans_res;
 struct xfs_inobt_rec_incore;
 union xfs_btree_ptr;
 struct xfs_dqtrx;
+struct xfs_eofblocks;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -3888,6 +3889,44 @@ DEFINE_EVENT(xfs_timestamp_range_class, name, \
 DEFINE_TIMESTAMP_RANGE_EVENT(xfs_inode_timestamp_range);
 DEFINE_TIMESTAMP_RANGE_EVENT(xfs_quota_expiry_range);
 
+DECLARE_EVENT_CLASS(xfs_eofblocks_class,
+	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, eofb, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(__u32, flags)
+		__field(uint32_t, uid)
+		__field(uint32_t, gid)
+		__field(prid_t, prid)
+		__field(__u64, min_file_size)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->flags = eofb->eof_flags;
+		__entry->uid = from_kuid(mp->m_super->s_user_ns, eofb->eof_uid);
+		__entry->gid = from_kgid(mp->m_super->s_user_ns, eofb->eof_gid);
+		__entry->prid = eofb->eof_prid;
+		__entry->min_file_size = eofb->eof_min_file_size;
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d flags 0x%x uid %u gid %u prid %u minsize %llu caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->flags,
+		  __entry->uid,
+		  __entry->gid,
+		  __entry->prid,
+		  __entry->min_file_size,
+		  (char *)__entry->caller_ip)
+);
+#define DEFINE_EOFBLOCKS_EVENT(name)	\
+DEFINE_EVENT(xfs_eofblocks_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, eofb, caller_ip))
+DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (8 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 09/11] xfs: add a tracepoint for blockgc scans Darrick J. Wong
@ 2021-01-23 18:52 ` Darrick J. Wong
  2021-01-24  9:41   ` Christoph Hellwig
  2021-01-25 18:46   ` Brian Foster
  2021-01-23 18:53 ` [PATCH 11/11] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
  10 siblings, 2 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:52 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

In anticipation of more restructuring of the eof/cowblocks gc code,
refactor the calls to those two functions into a single internal helper,
then present a new standard interface to purge speculative block
preallocations, and start shifting higher level code over to it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c   |    3 +--
 fs/xfs/xfs_icache.c |   39 +++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_icache.h |    1 +
 fs/xfs/xfs_trace.h  |    1 +
 4 files changed, 36 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 57086838a676..a766ad4477c5 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -758,8 +758,7 @@ xfs_file_buffered_write(
 
 		xfs_iunlock(ip, iolock);
 		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
-		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-		xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+		xfs_blockgc_free_space(ip->i_mount, &eofb);
 		goto write_retry;
 	}
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 7f999f9dd80a..0d228a5e879f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1645,6 +1645,38 @@ xfs_start_block_reaping(
 	xfs_queue_cowblocks(mp);
 }
 
+/* Scan all incore inodes for block preallocations that we can remove. */
+static inline int
+xfs_blockgc_scan(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	int			error;
+
+	error = xfs_icache_free_eofblocks(mp, eofb);
+	if (error)
+		return error;
+
+	error = xfs_icache_free_cowblocks(mp, eofb);
+	if (error)
+		return error;
+
+	return 0;
+}
+
+/*
+ * Try to free space in the filesystem by purging eofblocks and cowblocks.
+ */
+int
+xfs_blockgc_free_space(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
+
+	return xfs_blockgc_scan(mp, eofb);
+}
+
 /*
  * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
  * quota caused an allocation failure, so we make a best effort by including
@@ -1661,7 +1693,6 @@ xfs_blockgc_free_dquots(
 	struct xfs_eofblocks	eofb = {0};
 	struct xfs_mount	*mp = NULL;
 	bool			do_work = false;
-	int			error;
 
 	if (!udqp && !gdqp && !pdqp)
 		return 0;
@@ -1699,11 +1730,7 @@ xfs_blockgc_free_dquots(
 	if (!do_work)
 		return 0;
 
-	error = xfs_icache_free_eofblocks(mp, &eofb);
-	if (error)
-		return error;
-
-	return xfs_icache_free_cowblocks(mp, &eofb);
+	return xfs_blockgc_free_space(mp, &eofb);
 }
 
 /*
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 5f520de637f6..583c132ae0fb 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -57,6 +57,7 @@ void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
 		struct xfs_dquot *pdqp, unsigned int eof_flags);
 int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
+int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_eofblocks *eofb);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4cbf446bae9a..c3fd344aaf5b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3926,6 +3926,7 @@ DEFINE_EVENT(xfs_eofblocks_class, name,	\
 		 unsigned long caller_ip), \
 	TP_ARGS(mp, eofb, caller_ip))
 DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
+DEFINE_EOFBLOCKS_EVENT(xfs_blockgc_free_space);
 
 #endif /* _TRACE_XFS_H */
 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (9 preceding siblings ...)
  2021-01-23 18:52 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
@ 2021-01-23 18:53 ` Darrick J. Wong
  2021-01-24  9:48   ` Christoph Hellwig
  2021-01-26  4:59   ` [PATCH v4.1 " Darrick J. Wong
  10 siblings, 2 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-23 18:53 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

If a fs modification (creation, file write, reflink, etc.) is unable to
reserve enough space to handle the modification, try clearing whatever
space the filesystem might have been hanging onto in the hopes of
speeding up the filesystem.  The flushing behavior will become
particularly important when we add deferred inode inactivation because
that will increase the amount of space that isn't actively tied to user
data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)


diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index e72730f85af1..2b92a4084bb8 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -20,6 +20,8 @@
 #include "xfs_trace.h"
 #include "xfs_error.h"
 #include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
 
 kmem_zone_t	*xfs_trans_zone;
 
@@ -256,8 +258,10 @@ xfs_trans_alloc(
 	struct xfs_trans	**tpp)
 {
 	struct xfs_trans	*tp;
+	unsigned int		tries = 1;
 	int			error;
 
+retry:
 	/*
 	 * Allocate the handle before we do our freeze accounting and setting up
 	 * GFP_NOFS allocation context so that we avoid lockdep false positives
@@ -285,6 +289,22 @@ xfs_trans_alloc(
 	tp->t_firstblock = NULLFSBLOCK;
 
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
+	if (error == -ENOSPC && tries > 0) {
+		xfs_trans_cancel(tp);
+
+		/*
+		 * We weren't able to reserve enough space for the transaction.
+		 * Flush the other speculative space allocations to free space.
+		 * Do not perform a synchronous scan because callers can hold
+		 * other locks.
+		 */
+		error = xfs_blockgc_free_space(mp, NULL);
+		if (error)
+			return error;
+
+		tries--;
+		goto retry;
+	}
 	if (error) {
 		xfs_trans_cancel(tp);
 		return error;


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota
  2021-01-23 18:52 ` [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota Darrick J. Wong
@ 2021-01-24  9:34   ` Christoph Hellwig
  2021-01-25 18:15   ` Brian Foster
  2021-01-26  4:52   ` [PATCH v4.1 " Darrick J. Wong
  2 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2021-01-24  9:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-23 18:52 ` [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks Darrick J. Wong
@ 2021-01-24  9:39   ` Christoph Hellwig
  2021-01-25 18:16     ` Brian Foster
  2021-01-26  4:53   ` [PATCH v4.1 " Darrick J. Wong
  1 sibling, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2021-01-24  9:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

> +	/* We only allow one retry for EDQUOT/ENOSPC. */
> +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> +		*retry = false;
> +		return error;
> +	}

> +	/* Release resources, prepare for scan. */
> +	xfs_trans_cancel(*tpp);
> +	*tpp = NULL;
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +	/* Try to free some quota for this file's dquots. */
> +	*retry = true;
> +	xfs_blockgc_free_quota(ip, 0);
> +	return 0;

I still have grave reservations about this calling convention.  And if
you just remove the unlock and the call to xfs_blockgc_free_quota here
we don't require a whole lot of boilerplate code in the callers while
making the code possible to reason about for a mere human.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites
  2021-01-23 18:52 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
@ 2021-01-24  9:41   ` Christoph Hellwig
  2021-01-25 18:46   ` Brian Foster
  1 sibling, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2021-01-24  9:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:55AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> In anticipation of more restructuring of the eof/cowblocks gc code,
> refactor calling of those two functions into a single internal helper
> function, then present a new standard interface to purge speculative
> block preallocations and start shifting higher level code to use that.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-23 18:53 ` [PATCH 11/11] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
@ 2021-01-24  9:48   ` Christoph Hellwig
  2021-01-25 18:46     ` Brian Foster
  2021-01-25 20:02     ` Darrick J. Wong
  2021-01-26  4:59   ` [PATCH v4.1 " Darrick J. Wong
  1 sibling, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2021-01-24  9:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

> +retry:
>  	/*
>  	 * Allocate the handle before we do our freeze accounting and setting up
>  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
> @@ -285,6 +289,22 @@ xfs_trans_alloc(
>  	tp->t_firstblock = NULLFSBLOCK;
>  
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> +	if (error == -ENOSPC && tries > 0) {
> +		xfs_trans_cancel(tp);
> +
> +		/*
> +		 * We weren't able to reserve enough space for the transaction.
> +		 * Flush the other speculative space allocations to free space.
> +		 * Do not perform a synchronous scan because callers can hold
> +		 * other locks.
> +		 */
> +		error = xfs_blockgc_free_space(mp, NULL);
> +		if (error)
> +			return error;
> +
> +		tries--;
> +		goto retry;
> +	}
>  	if (error) {
>  		xfs_trans_cancel(tp);
>  		return error;

Why do we need to restart the whole function?  A failing
xfs_trans_reserve should restore tp to its initial state, and keeping
the SB_FREEZE_FS counter increased also doesn't look harmful as far as
I can tell.  So why not:

	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
	if (error == -ENOSPC) {
		/*
		 * We weren't able to reserve enough space for the transaction.
		 * Flush the other speculative space allocations to free space.
		 * Do not perform a synchronous scan because callers can hold
		 * other locks.
		 */
		error = xfs_blockgc_free_space(mp, NULL);
		if (error)
			return error;
		error = xfs_trans_reserve(tp, resp, blocks, rtextents);
	}
 	if (error) {
  		xfs_trans_cancel(tp);
  		return error;

?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions
  2021-01-23 18:52 ` [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
@ 2021-01-25 18:13   ` Brian Foster
  2021-01-25 19:33     ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:05AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The functions to run an eof/cowblocks scan to try to reduce quota usage
> are kind of a mess -- the logic repeatedly initializes an eofb structure
> and there are logic bugs in the code that result in the cowblocks scan
> never actually happening.
> 
> Replace all three functions with a single function that fills out an
> eofb if we're low on quota and runs both eof and cowblocks scans.
> 

It would be nice to be a bit more explicit about the scanning bug(s)
being fixed here. It looks like a couple potential issues are the first
scan clearing the low free space state on the associated quotas, and
also only falling back to the cowblocks scan if the eofblocks scan
doesn't do anything. If that's the gist of this patch, I'd suggest to
change the patch subject as well since "refactor messy functions"
doesn't really convey that we're fixing some logic issues. Perhaps
something like "xfs: trigger all block scans on low quota space" would
be more accurate?

Otherwise for the code changes:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c   |   15 ++++++---------
>  fs/xfs/xfs_icache.c |   46 ++++++++++++++++------------------------------
>  fs/xfs/xfs_icache.h |    4 ++--
>  3 files changed, 24 insertions(+), 41 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5d4a66c72c78..69879237533b 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -713,7 +713,7 @@ xfs_file_buffered_write(
>  	struct inode		*inode = mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			ret;
> -	int			enospc = 0;
> +	bool			cleared_space = false;
>  	int			iolock;
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT)
> @@ -745,19 +745,16 @@ xfs_file_buffered_write(
>  	 * also behaves as a filter to prevent too many eofblocks scans from
>  	 * running at the same time.
>  	 */
> -	if (ret == -EDQUOT && !enospc) {
> +	if (ret == -EDQUOT && !cleared_space) {
>  		xfs_iunlock(ip, iolock);
> -		enospc = xfs_inode_free_quota_eofblocks(ip);
> -		if (enospc)
> -			goto write_retry;
> -		enospc = xfs_inode_free_quota_cowblocks(ip);
> -		if (enospc)
> +		cleared_space = xfs_inode_free_quota_blocks(ip);
> +		if (cleared_space)
>  			goto write_retry;
>  		iolock = 0;
> -	} else if (ret == -ENOSPC && !enospc) {
> +	} else if (ret == -ENOSPC && !cleared_space) {
>  		struct xfs_eofblocks eofb = {0};
>  
> -		enospc = 1;
> +		cleared_space = true;
>  		xfs_flush_inodes(ip->i_mount);
>  
>  		xfs_iunlock(ip, iolock);
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index deb99300d171..c71eb15e3835 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1397,33 +1397,31 @@ xfs_icache_free_eofblocks(
>  }
>  
>  /*
> - * Run eofblocks scans on the quotas applicable to the inode. For inodes with
> - * multiple quotas, we don't know exactly which quota caused an allocation
> + * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
> + * with multiple quotas, we don't know exactly which quota caused an allocation
>   * failure. We make a best effort by including each quota under low free space
>   * conditions (less than 1% free space) in the scan.
>   */
> -static int
> -__xfs_inode_free_quota_eofblocks(
> -	struct xfs_inode	*ip,
> -	int			(*execute)(struct xfs_mount *mp,
> -					   struct xfs_eofblocks	*eofb))
> +bool
> +xfs_inode_free_quota_blocks(
> +	struct xfs_inode	*ip)
>  {
> -	int scan = 0;
> -	struct xfs_eofblocks eofb = {0};
> -	struct xfs_dquot *dq;
> +	struct xfs_eofblocks	eofb = {0};
> +	struct xfs_dquot	*dq;
> +	bool			do_work = false;
>  
>  	/*
>  	 * Run a sync scan to increase effectiveness and use the union filter to
>  	 * cover all applicable quotas in a single scan.
>  	 */
> -	eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC;
> +	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
>  
>  	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
>  		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
>  		if (dq && xfs_dquot_lowsp(dq)) {
>  			eofb.eof_uid = VFS_I(ip)->i_uid;
>  			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
> -			scan = 1;
> +			do_work = true;
>  		}
>  	}
>  
> @@ -1432,21 +1430,16 @@ __xfs_inode_free_quota_eofblocks(
>  		if (dq && xfs_dquot_lowsp(dq)) {
>  			eofb.eof_gid = VFS_I(ip)->i_gid;
>  			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
> -			scan = 1;
> +			do_work = true;
>  		}
>  	}
>  
> -	if (scan)
> -		execute(ip->i_mount, &eofb);
> +	if (!do_work)
> +		return false;
>  
> -	return scan;
> -}
> -
> -int
> -xfs_inode_free_quota_eofblocks(
> -	struct xfs_inode *ip)
> -{
> -	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_eofblocks);
> +	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
> +	return true;
>  }
>  
>  static inline unsigned long
> @@ -1646,13 +1639,6 @@ xfs_icache_free_cowblocks(
>  			XFS_ICI_COWBLOCKS_TAG);
>  }
>  
> -int
> -xfs_inode_free_quota_cowblocks(
> -	struct xfs_inode *ip)
> -{
> -	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_cowblocks);
> -}
> -
>  void
>  xfs_inode_set_cowblocks_tag(
>  	xfs_inode_t	*ip)
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 3a4c8b382cd0..3f7ddbca8638 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -54,17 +54,17 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
>  
>  void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
>  
> +bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
> +
>  void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
>  void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
>  int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
> -int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
>  void xfs_eofblocks_worker(struct work_struct *);
>  void xfs_queue_eofblocks(struct xfs_mount *);
>  
>  void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip);
>  void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
>  int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
> -int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
>  void xfs_cowblocks_worker(struct work_struct *);
>  void xfs_queue_cowblocks(struct xfs_mount *);
>  
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-23 18:52 ` [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
@ 2021-01-25 18:14   ` Brian Foster
  2021-01-25 19:54     ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:10AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Don't stall the cowblocks scan on a locked inode if we possibly can.
> We'd much rather the background scanner keep moving.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index c71eb15e3835..89f9e692fde7 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
>  	void			*args)
>  {
>  	struct xfs_eofblocks	*eofb = args;
> +	bool			wait;
>  	int			ret = 0;
>  
> +	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
> +
>  	if (!xfs_prep_free_cowblocks(ip))
>  		return 0;
>  
>  	if (!xfs_inode_matches_eofb(ip, eofb))
>  		return 0;
>  
> -	/* Free the CoW blocks */
> -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> +	/*
> +	 * If the caller is waiting, return -EAGAIN to keep the background
> +	 * scanner moving and revisit the inode in a subsequent pass.
> +	 */
> +	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> +		if (wait)
> +			return -EAGAIN;
> +		return 0;
> +	}
> +	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
> +		if (wait)
> +			ret = -EAGAIN;
> +		goto out_iolock;
> +	}

Hmm.. I'd be a little concerned over this allowing a scan to repeat
indefinitely with a competing workload because a restart doesn't carry
over any state from the previous scan. I suppose the
xfs_prep_free_cowblocks() checks make that slightly less likely on a
given file, but I more wonder about a scenario with a large set of
inodes in a particular AG with a sufficient amount of concurrent
activity. All it takes is one trylock failure per scan to have to start
the whole thing over again... hm?

Brian

>  
>  	/*
>  	 * Check again, nobody else should be able to dirty blocks or change
> @@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
>  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
>  
>  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> +out_iolock:
>  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
>  
>  	return ret;
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/11] xfs: xfs_inode_free_quota_blocks should scan project quota
  2021-01-23 18:52 ` [PATCH 03/11] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
@ 2021-01-25 18:14   ` Brian Foster
  0 siblings, 0 replies; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:16AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Buffered writers who have run out of quota reservation call
> xfs_inode_free_quota_blocks to try to free any space reservations that
> might reduce the quota usage.  Unfortunately, the buffered write path
> treats "out of project quota" the same as "out of overall space" so this
> function has never supported scanning for space that might ease an "out
> of project quota" condition.
> 
> We're about to start using this function for cases where we actually
> /can/ tell if we're out of project quota, so add in this functionality.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_icache.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 89f9e692fde7..10c1a0dee17d 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1434,6 +1434,15 @@ xfs_inode_free_quota_blocks(
>  		}
>  	}
>  
> +	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_prid = ip->i_d.di_projid;
> +			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
> +			do_work = true;
> +		}
> +	}
> +
>  	if (!do_work)
>  		return false;
>  
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/11] xfs: move and rename xfs_inode_free_quota_blocks to avoid conflicts
  2021-01-23 18:52 ` [PATCH 04/11] xfs: move and rename xfs_inode_free_quota_blocks to avoid conflicts Darrick J. Wong
@ 2021-01-25 18:14   ` Brian Foster
  0 siblings, 0 replies; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:21AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Move this function further down in the file so that later cleanups won't
> have to declare static functions.  Change the name because we're about
> to rework all the code that performs garbage collection of speculatively
> allocated file blocks.  No functional changes.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_file.c   |    2 -
>  fs/xfs/xfs_icache.c |  110 ++++++++++++++++++++++++++-------------------------
>  fs/xfs/xfs_icache.h |    2 -
>  3 files changed, 57 insertions(+), 57 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 69879237533b..d69e5abcc1b4 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -747,7 +747,7 @@ xfs_file_buffered_write(
>  	 */
>  	if (ret == -EDQUOT && !cleared_space) {
>  		xfs_iunlock(ip, iolock);
> -		cleared_space = xfs_inode_free_quota_blocks(ip);
> +		cleared_space = xfs_blockgc_free_quota(ip);
>  		if (cleared_space)
>  			goto write_retry;
>  		iolock = 0;
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 10c1a0dee17d..aba901d5637b 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1396,61 +1396,6 @@ xfs_icache_free_eofblocks(
>  			XFS_ICI_EOFBLOCKS_TAG);
>  }
>  
> -/*
> - * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
> - * with multiple quotas, we don't know exactly which quota caused an allocation
> - * failure. We make a best effort by including each quota under low free space
> - * conditions (less than 1% free space) in the scan.
> - */
> -bool
> -xfs_inode_free_quota_blocks(
> -	struct xfs_inode	*ip)
> -{
> -	struct xfs_eofblocks	eofb = {0};
> -	struct xfs_dquot	*dq;
> -	bool			do_work = false;
> -
> -	/*
> -	 * Run a sync scan to increase effectiveness and use the union filter to
> -	 * cover all applicable quotas in a single scan.
> -	 */
> -	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
> -
> -	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
> -		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
> -		if (dq && xfs_dquot_lowsp(dq)) {
> -			eofb.eof_uid = VFS_I(ip)->i_uid;
> -			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
> -			do_work = true;
> -		}
> -	}
> -
> -	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
> -		dq = xfs_inode_dquot(ip, XFS_DQTYPE_GROUP);
> -		if (dq && xfs_dquot_lowsp(dq)) {
> -			eofb.eof_gid = VFS_I(ip)->i_gid;
> -			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
> -			do_work = true;
> -		}
> -	}
> -
> -	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
> -		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
> -		if (dq && xfs_dquot_lowsp(dq)) {
> -			eofb.eof_prid = ip->i_d.di_projid;
> -			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
> -			do_work = true;
> -		}
> -	}
> -
> -	if (!do_work)
> -		return false;
> -
> -	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> -	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
> -	return true;
> -}
> -
>  static inline unsigned long
>  xfs_iflag_for_tag(
>  	int		tag)
> @@ -1699,3 +1644,58 @@ xfs_start_block_reaping(
>  	xfs_queue_eofblocks(mp);
>  	xfs_queue_cowblocks(mp);
>  }
> +
> +/*
> + * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
> + * with multiple quotas, we don't know exactly which quota caused an allocation
> + * failure. We make a best effort by including each quota under low free space
> + * conditions (less than 1% free space) in the scan.
> + */
> +bool
> +xfs_blockgc_free_quota(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_eofblocks	eofb = {0};
> +	struct xfs_dquot	*dq;
> +	bool			do_work = false;
> +
> +	/*
> +	 * Run a sync scan to increase effectiveness and use the union filter to
> +	 * cover all applicable quotas in a single scan.
> +	 */
> +	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
> +
> +	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_uid = VFS_I(ip)->i_uid;
> +			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
> +			do_work = true;
> +		}
> +	}
> +
> +	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQTYPE_GROUP);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_gid = VFS_I(ip)->i_gid;
> +			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
> +			do_work = true;
> +		}
> +	}
> +
> +	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_prid = ip->i_d.di_projid;
> +			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
> +			do_work = true;
> +		}
> +	}
> +
> +	if (!do_work)
> +		return false;
> +
> +	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
> +	return true;
> +}
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 3f7ddbca8638..21b726a05b0d 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -54,7 +54,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
>  
>  void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
>  
> -bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
> +bool xfs_blockgc_free_quota(struct xfs_inode *ip);
>  
>  void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
>  void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota
  2021-01-23 18:52 ` [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota Darrick J. Wong
  2021-01-24  9:34   ` Christoph Hellwig
@ 2021-01-25 18:15   ` Brian Foster
  2021-01-26  4:52   ` [PATCH v4.1 " Darrick J. Wong
  2 siblings, 0 replies; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:27AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Change the signature of xfs_blockgc_free_quota in preparation for the
> next few patches.  Callers can now pass EOF_FLAGS into the function to
> control scan parameters; and the function will now pass back any
> corruption errors seen while scanning, though for our retry loops we'll
> just try again unconditionally.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_file.c   |    7 +++----
>  fs/xfs/xfs_icache.c |   20 ++++++++++++--------
>  fs/xfs/xfs_icache.h |    2 +-
>  3 files changed, 16 insertions(+), 13 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index aba901d5637b..68b6f72593dc 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1651,19 +1651,21 @@ xfs_start_block_reaping(
>   * failure. We make a best effort by including each quota under low free space
>   * conditions (less than 1% free space) in the scan.
>   */
> -bool
> +int
>  xfs_blockgc_free_quota(
> -	struct xfs_inode	*ip)
> +	struct xfs_inode	*ip,
> +	unsigned int		eof_flags)
>  {
>  	struct xfs_eofblocks	eofb = {0};
>  	struct xfs_dquot	*dq;
>  	bool			do_work = false;
> +	int			error;
>  
>  	/*
> -	 * Run a sync scan to increase effectiveness and use the union filter to
> +	 * Run a scan to increase effectiveness and use the union filter to

The original comment referred to the increased effectiveness of a sync
scan. It doesn't make a whole lot of sense without that qualification
IMO (even though the scan is still sync). We could move that bit of
comment to the caller where the flag is now set, but it's probably fine
to just remove that text also. With that tweak:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	 * cover all applicable quotas in a single scan.
>  	 */
> -	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
> +	eofb.eof_flags = XFS_EOF_FLAGS_UNION | eof_flags;
>  
>  	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
>  		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
> @@ -1693,9 +1695,11 @@ xfs_blockgc_free_quota(
>  	}
>  
>  	if (!do_work)
> -		return false;
> +		return 0;
>  
> -	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> -	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
> -	return true;
> +	error = xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +	if (error)
> +		return error;
> +
> +	return xfs_icache_free_cowblocks(ip->i_mount, &eofb);
>  }
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 21b726a05b0d..d64ea8f5c589 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -54,7 +54,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
>  
>  void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
>  
> -bool xfs_blockgc_free_quota(struct xfs_inode *ip);
> +int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
>  
>  void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
>  void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-24  9:39   ` Christoph Hellwig
@ 2021-01-25 18:16     ` Brian Foster
  2021-01-25 18:57       ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs, david

On Sun, Jan 24, 2021 at 09:39:53AM +0000, Christoph Hellwig wrote:
> > +	/* We only allow one retry for EDQUOT/ENOSPC. */
> > +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> > +		*retry = false;
> > +		return error;
> > +	}
> 
> > +	/* Release resources, prepare for scan. */
> > +	xfs_trans_cancel(*tpp);
> > +	*tpp = NULL;
> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +
> > +	/* Try to free some quota for this file's dquots. */
> > +	*retry = true;
> > +	xfs_blockgc_free_quota(ip, 0);
> > +	return 0;
> 
> I till have grave reservations about this calling conventions.  And if
> you just remove the unlock and th call to xfs_blockgc_free_quota here
> we don't equire a whole lot of boilerplate code in the callers while
> making the code possible to reason about for a mere human.
> 

I agree that the retry pattern is rather odd. I'm curious, is there a
specific reason this scanning task has to execute outside of transaction
context in the first place? Assuming it does because the underlying work
may involve more transactions or whatnot, I'm wondering if this logic
could be buried further down in the transaction allocation path.

For example, if we passed the quota reservation and inode down into a
new variant of xfs_trans_alloc(), it could acquire the ilock and attempt
the quota reservation as a final step (to avoid adding an extra
unconditional ilock cycle). If quota res fails, iunlock and release the
log res internally and perform the scan. From there, perhaps we could
retry the quota reservation immediately without logres or the ilock by
saving references to the dquots, and then only reacquire logres/ilock on
success..? Just thinking out loud so that might require further
thought...
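Stripped of all XFS internals, that flow might look something like this userspace sketch (the reservation and scan are stubbed out; every name below is made up for illustration, not from the patchset):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int quota_attempts;	/* how many reservation attempts were made */
static bool scanned;		/* whether the blockgc scan has run */

/* Stub: fail the first quota reservation, succeed once a scan has run. */
static int quota_reserve(void)
{
	quota_attempts++;
	return scanned ? 0 : -EDQUOT;
}

/* Stub: stands in for the cow/eofblocks scan freeing quota. */
static void blockgc_scan(void)
{
	scanned = true;
}

/*
 * Sketch of the proposed xfs_trans_alloc() variant: reserve log space
 * and take the ILOCK (both elided here), attempt the quota reservation
 * as the final step, and on failure drop the lock and log reservation,
 * run the scan, then retry the reservation exactly once.
 */
static int trans_alloc_with_quota(void)
{
	int error = quota_reserve();

	if (error == -EDQUOT) {
		/* iunlock + release logres would happen here */
		blockgc_scan();
		error = quota_reserve();	/* single retry */
		/* reacquire ILOCK/logres on success */
	}
	return error;
}
```

The point of the single retry is the same as in the quoted patch: one scan per failed reservation, no unbounded loops.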

Brian



* Re: [PATCH 09/11] xfs: add a tracepoint for blockgc scans
  2021-01-23 18:52 ` [PATCH 09/11] xfs: add a tracepoint for blockgc scans Darrick J. Wong
@ 2021-01-25 18:45   ` Brian Foster
  2021-01-26  4:56   ` [PATCH v4.1 " Darrick J. Wong
  1 sibling, 0 replies; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:49AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add some tracepoints so that we can observe when the speculative
> preallocation garbage collector runs.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_ioctl.c |    2 ++
>  fs/xfs/xfs_trace.c |    1 +
>  fs/xfs/xfs_trace.h |   39 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 42 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 952eca338807..da407934364c 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -2356,6 +2356,8 @@ xfs_file_ioctl(
>  		if (error)
>  			return error;
>  
> +		trace_xfs_ioc_free_eofblocks(mp, &keofb, _RET_IP_);
> +
>  		sb_start_write(mp->m_super);
>  		error = xfs_icache_free_eofblocks(mp, &keofb);
>  		sb_end_write(mp->m_super);
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index 120398a37c2a..9b8d703dc9fd 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -29,6 +29,7 @@
>  #include "xfs_filestream.h"
>  #include "xfs_fsmap.h"
>  #include "xfs_btree_staging.h"
> +#include "xfs_icache.h"
>  
>  /*
>   * We include this last to have the helpers above available for the trace
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 407c3a5208ab..4cbf446bae9a 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -37,6 +37,7 @@ struct xfs_trans_res;
>  struct xfs_inobt_rec_incore;
>  union xfs_btree_ptr;
>  struct xfs_dqtrx;
> +struct xfs_eofblocks;
>  
>  #define XFS_ATTR_FILTER_FLAGS \
>  	{ XFS_ATTR_ROOT,	"ROOT" }, \
> @@ -3888,6 +3889,44 @@ DEFINE_EVENT(xfs_timestamp_range_class, name, \
>  DEFINE_TIMESTAMP_RANGE_EVENT(xfs_inode_timestamp_range);
>  DEFINE_TIMESTAMP_RANGE_EVENT(xfs_quota_expiry_range);
>  
> +DECLARE_EVENT_CLASS(xfs_eofblocks_class,
> +	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb,
> +		 unsigned long caller_ip),
> +	TP_ARGS(mp, eofb, caller_ip),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(__u32, flags)
> +		__field(uint32_t, uid)
> +		__field(uint32_t, gid)
> +		__field(prid_t, prid)
> +		__field(__u64, min_file_size)
> +		__field(unsigned long, caller_ip)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->flags = eofb->eof_flags;
> +		__entry->uid = from_kuid(mp->m_super->s_user_ns, eofb->eof_uid);
> +		__entry->gid = from_kgid(mp->m_super->s_user_ns, eofb->eof_gid);
> +		__entry->prid = eofb->eof_prid;
> +		__entry->min_file_size = eofb->eof_min_file_size;
> +		__entry->caller_ip = caller_ip;
> +	),
> +	TP_printk("dev %d:%d flags 0x%x uid %u gid %u prid %u minsize %llu caller %pS",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->flags,
> +		  __entry->uid,
> +		  __entry->gid,
> +		  __entry->prid,
> +		  __entry->min_file_size,
> +		  (char *)__entry->caller_ip)
> +);
> +#define DEFINE_EOFBLOCKS_EVENT(name)	\
> +DEFINE_EVENT(xfs_eofblocks_class, name,	\
> +	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb, \
> +		 unsigned long caller_ip), \
> +	TP_ARGS(mp, eofb, caller_ip))
> +DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 



* Re: [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites
  2021-01-23 18:52 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
  2021-01-24  9:41   ` Christoph Hellwig
@ 2021-01-25 18:46   ` Brian Foster
  2021-01-26  2:33     ` Darrick J. Wong
  1 sibling, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

On Sat, Jan 23, 2021 at 10:52:55AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> In anticipation of more restructuring of the eof/cowblocks gc code,
> refactor calling of those two functions into a single internal helper
> function, then present a new standard interface to purge speculative
> block preallocations and start shifting higher level code to use that.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_file.c   |    3 +--
>  fs/xfs/xfs_icache.c |   39 +++++++++++++++++++++++++++++++++------
>  fs/xfs/xfs_icache.h |    1 +
>  fs/xfs/xfs_trace.h  |    1 +
>  4 files changed, 36 insertions(+), 8 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 7f999f9dd80a..0d228a5e879f 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1645,6 +1645,38 @@ xfs_start_block_reaping(
>  	xfs_queue_cowblocks(mp);
>  }
>  
> +/* Scan all incore inodes for block preallocations that we can remove. */
> +static inline int
> +xfs_blockgc_scan(
> +	struct xfs_mount	*mp,
> +	struct xfs_eofblocks	*eofb)
> +{
> +	int			error;
> +
> +	error = xfs_icache_free_eofblocks(mp, eofb);
> +	if (error)
> +		return error;
> +
> +	error = xfs_icache_free_cowblocks(mp, eofb);
> +	if (error)
> +		return error;
> +
> +	return 0;
> +}
> +
> +/*
> + * Try to free space in the filesystem by purging eofblocks and cowblocks.
> + */
> +int
> +xfs_blockgc_free_space(
> +	struct xfs_mount	*mp,
> +	struct xfs_eofblocks	*eofb)
> +{
> +	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
> +
> +	return xfs_blockgc_scan(mp, eofb);
> +}
> +

What's the need for two helpers instead of just
xfs_blockgc_free_space()? Otherwise seems fine.

Brian

>  /*
>   * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
>   * quota caused an allocation failure, so we make a best effort by including
> @@ -1661,7 +1693,6 @@ xfs_blockgc_free_dquots(
>  	struct xfs_eofblocks	eofb = {0};
>  	struct xfs_mount	*mp = NULL;
>  	bool			do_work = false;
> -	int			error;
>  
>  	if (!udqp && !gdqp && !pdqp)
>  		return 0;
> @@ -1699,11 +1730,7 @@ xfs_blockgc_free_dquots(
>  	if (!do_work)
>  		return 0;
>  
> -	error = xfs_icache_free_eofblocks(mp, &eofb);
> -	if (error)
> -		return error;
> -
> -	return xfs_icache_free_cowblocks(mp, &eofb);
> +	return xfs_blockgc_free_space(mp, &eofb);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 5f520de637f6..583c132ae0fb 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -57,6 +57,7 @@ void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
>  int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
>  		struct xfs_dquot *pdqp, unsigned int eof_flags);
>  int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
> +int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_eofblocks *eofb);
>  
>  void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
>  void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 4cbf446bae9a..c3fd344aaf5b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3926,6 +3926,7 @@ DEFINE_EVENT(xfs_eofblocks_class, name,	\
>  		 unsigned long caller_ip), \
>  	TP_ARGS(mp, eofb, caller_ip))
>  DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
> +DEFINE_EOFBLOCKS_EVENT(xfs_blockgc_free_space);
>  
>  #endif /* _TRACE_XFS_H */
>  
> 



* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-24  9:48   ` Christoph Hellwig
@ 2021-01-25 18:46     ` Brian Foster
  2021-01-25 20:02     ` Darrick J. Wong
  1 sibling, 0 replies; 52+ messages in thread
From: Brian Foster @ 2021-01-25 18:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs, david

On Sun, Jan 24, 2021 at 09:48:16AM +0000, Christoph Hellwig wrote:
> > +retry:
> >  	/*
> >  	 * Allocate the handle before we do our freeze accounting and setting up
> >  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
> > @@ -285,6 +289,22 @@ xfs_trans_alloc(
> >  	tp->t_firstblock = NULLFSBLOCK;
> >  
> >  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > +	if (error == -ENOSPC && tries > 0) {
> > +		xfs_trans_cancel(tp);
> > +
> > +		/*
> > +		 * We weren't able to reserve enough space for the transaction.
> > +		 * Flush the other speculative space allocations to free space.
> > +		 * Do not perform a synchronous scan because callers can hold
> > +		 * other locks.
> > +		 */
> > +		error = xfs_blockgc_free_space(mp, NULL);
> > +		if (error)
> > +			return error;
> > +
> > +		tries--;
> > +		goto retry;
> > +	}
> >  	if (error) {
> >  		xfs_trans_cancel(tp);
> >  		return error;
> 
> Why do we need to restart the whole function?  A failing
> xfs_trans_reserve should restore tp to its initial state, and keeping
> the SB_FREEZE_FS counter increased also doesn't look harmful as far as
> I can tell.  So why not:
> 
> 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> 	if (error == -ENOSPC) {
> 		/*
> 		 * We weren't able to reserve enough space for the transaction.
> 		 * Flush the other speculative space allocations to free space.
> 		 * Do not perform a synchronous scan because callers can hold
> 		 * other locks.
> 		 */
> 		error = xfs_blockgc_free_space(mp, NULL);
> 		if (error)
> 			return error;
> 		error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> 	}
>  	if (error) {
>   		xfs_trans_cancel(tp);
>   		return error;
> 
> ?
> 

That looks cleaner to me, but similar to the earlier quota res patch I'm
wondering if this should be pushed down into xfs_trans_reserve() (or
lifted into a new xfs_trans_reserve_blks() helper called from there)
such that it can handle the various scan/retry scenarios in one place.
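For the ENOSPC side, an xfs_trans_reserve_blks()-style helper that owns the scan-and-retry policy in one place might look roughly like this (userspace sketch; the reservation and scan are stubbed, and all names are hypothetical):

```c
#include <assert.h>
#include <errno.h>

static int free_space = 5;	/* blocks currently available (stub) */
static int reclaimable = 10;	/* blocks a blockgc scan could recover (stub) */

/* Stub block reservation: fail with ENOSPC if not enough space. */
static int reserve(int blocks)
{
	if (blocks > free_space)
		return -ENOSPC;
	free_space -= blocks;
	return 0;
}

/* Stub for xfs_blockgc_free_space(): return speculative preallocations. */
static void blockgc_free_space(void)
{
	free_space += reclaimable;
	reclaimable = 0;
}

/*
 * One central place for the retry policy, instead of open-coding it in
 * every caller of the transaction allocation path: try the reservation,
 * flush speculative preallocations on ENOSPC, then retry exactly once.
 */
static int reserve_blks(int blocks)
{
	int error = reserve(blocks);

	if (error == -ENOSPC) {
		blockgc_free_space();
		error = reserve(blocks);	/* single retry */
	}
	return error;
}
```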

Brian



* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-25 18:16     ` Brian Foster
@ 2021-01-25 18:57       ` Darrick J. Wong
  2021-01-26 13:26         ` Brian Foster
  0 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-25 18:57 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, david

On Mon, Jan 25, 2021 at 01:16:23PM -0500, Brian Foster wrote:
> On Sun, Jan 24, 2021 at 09:39:53AM +0000, Christoph Hellwig wrote:
> > > +	/* We only allow one retry for EDQUOT/ENOSPC. */
> > > +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> > > +		*retry = false;
> > > +		return error;
> > > +	}
> > 
> > > +	/* Release resources, prepare for scan. */
> > > +	xfs_trans_cancel(*tpp);
> > > +	*tpp = NULL;
> > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +
> > > +	/* Try to free some quota for this file's dquots. */
> > > +	*retry = true;
> > > +	xfs_blockgc_free_quota(ip, 0);
> > > +	return 0;
> > 
> > I still have grave reservations about this calling convention.  And if
> > you just remove the unlock and the call to xfs_blockgc_free_quota here
> > we don't require a whole lot of boilerplate code in the callers while
> > making the code possible to reason about for a mere human.
> > 
> 
> I agree that the retry pattern is rather odd. I'm curious, is there a
> specific reason this scanning task has to execute outside of transaction
> context in the first place?

Dave didn't like the open-coded retry and told me to shrink the call
sites to:

	error = xfs_trans_reserve_quota(...);
	if (error)
		goto out_trans_cancel;
	if (quota_retry)
		goto retry;

So here we are, slowly putting things almost all the way back to where
they were originally.  Now I have a little utility function:

/*
 * Cancel a transaction and try to clear some space so that we can
 * reserve some quota.  The caller must hold the ILOCK; when this
 * function returns, the transaction will be cancelled and the ILOCK
 * will have been released.
 */
int
xfs_trans_cancel_qretry(
	struct xfs_trans	*tp,
	struct xfs_inode	*ip)
{
	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));

	xfs_trans_cancel(tp);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);

	return xfs_blockgc_free_quota(ip, 0);
}

Which I guess reduces the amount of call-site boilerplate from four lines
to two, only now I've spent half of last week on this.

> Assuming it does because the underlying work
> may involve more transactions or whatnot, I'm wondering if this logic
> could be buried further down in the transaction allocation path.
> 
> For example, if we passed the quota reservation and inode down into a
> new variant of xfs_trans_alloc(), it could acquire the ilock and attempt
> the quota reservation as a final step (to avoid adding an extra
> unconditional ilock cycle). If quota res fails, iunlock and release the
> log res internally and perform the scan. From there, perhaps we could
> retry the quota reservation immediately without logres or the ilock by
> saving references to the dquots, and then only reacquire logres/ilock on
> success..? Just thinking out loud so that might require further
> thought...

Yes, that's certainly possible, and probably a good design goal to have
a xfs_trans_alloc_quota(tres, ip, whichfork, nblks, &tp) that one could
call to reserve a transaction, lock the inode, and reserve the
appropriate amounts of quota to handle mapping nblks into an inode fork.

However, there are complications that don't make this a trivial switch:

1. Reflink and (new) swapext don't actually know how many blocks they
need to reserve until after they've grabbed the two ILOCKs, which means
that the wrapper is of no use here.

2. For the remaining quota reservation callsites, you have to deal with
the bmap code that computes qblocks for reservation against the realtime
device.  This is opening a huge can of worms because:

3. Realtime and quota together are not supported, which means that none of that
code ever gets properly QA'd.  It would be totally stupid to rework most
of the quota reservation callsites and still leave that logic bomb.
This gigantic piece of technical debt needs to be paid off, either by
fixing the functionality and getting it under test, or by dropping rt
quota support completely and officially.

My guess is that fixing rt quota is probably going to take 10-15
patches, and doing more small cleanups to convert the callsites will be
another 10 or so.

4. We're already past -rc5, and what started as two cleanup patchsets of
13 is now four patchsets of 27 patches, and I /really/ would just like
to get these patches merged without expanding the scope of work even
further.

--D

> Brian
> 


* Re: [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions
  2021-01-25 18:13   ` Brian Foster
@ 2021-01-25 19:33     ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-25 19:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Mon, Jan 25, 2021 at 01:13:31PM -0500, Brian Foster wrote:
> On Sat, Jan 23, 2021 at 10:52:05AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The functions to run an eof/cowblocks scan to try to reduce quota usage
> > are kind of a mess -- the logic repeatedly initializes an eofb structure
> > and there are logic bugs in the code that result in the cowblocks scan
> > never actually happening.
> > 
> > Replace all three functions with a single function that fills out an
> > eofb if we're low on quota and runs both eof and cowblocks scans.
> > 
> 
> It would be nice to be a bit more explicit about the scanning bug(s)
> being fixed here. It looks like a couple potential issues are the first
> scan clearing the low free space state on the associated quotas, and
> also only falling back to the cowblocks scan if the eofblocks scan
> doesn't do anything. If that's the gist of this patch, I'd suggest to
> change the patch subject as well since "refactor messy functions"
> doesn't really convey that we're fixing some logic issues. Perhaps
> something like "xfs: trigger all block scans on low quota space" would
> be more accurate?

Yes, that sounds good.  I'll change the commit message to:

xfs: trigger all block gc scans when low on quota space

The functions to run an eof/cowblocks scan to try to reduce quota usage
are kind of a mess -- the logic repeatedly initializes an eofb structure
and there are logic bugs in the code that result in the cowblocks scan
never actually happening.

Replace all three functions with a single function that fills out an
eofb and runs both eof and cowblocks scans.

--D

> Otherwise for the code changes:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_file.c   |   15 ++++++---------
> >  fs/xfs/xfs_icache.c |   46 ++++++++++++++++------------------------------
> >  fs/xfs/xfs_icache.h |    4 ++--
> >  3 files changed, 24 insertions(+), 41 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 5d4a66c72c78..69879237533b 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -713,7 +713,7 @@ xfs_file_buffered_write(
> >  	struct inode		*inode = mapping->host;
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	ssize_t			ret;
> > -	int			enospc = 0;
> > +	bool			cleared_space = false;
> >  	int			iolock;
> >  
> >  	if (iocb->ki_flags & IOCB_NOWAIT)
> > @@ -745,19 +745,16 @@ xfs_file_buffered_write(
> >  	 * also behaves as a filter to prevent too many eofblocks scans from
> >  	 * running at the same time.
> >  	 */
> > -	if (ret == -EDQUOT && !enospc) {
> > +	if (ret == -EDQUOT && !cleared_space) {
> >  		xfs_iunlock(ip, iolock);
> > -		enospc = xfs_inode_free_quota_eofblocks(ip);
> > -		if (enospc)
> > -			goto write_retry;
> > -		enospc = xfs_inode_free_quota_cowblocks(ip);
> > -		if (enospc)
> > +		cleared_space = xfs_inode_free_quota_blocks(ip);
> > +		if (cleared_space)
> >  			goto write_retry;
> >  		iolock = 0;
> > -	} else if (ret == -ENOSPC && !enospc) {
> > +	} else if (ret == -ENOSPC && !cleared_space) {
> >  		struct xfs_eofblocks eofb = {0};
> >  
> > -		enospc = 1;
> > +		cleared_space = true;
> >  		xfs_flush_inodes(ip->i_mount);
> >  
> >  		xfs_iunlock(ip, iolock);
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index deb99300d171..c71eb15e3835 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -1397,33 +1397,31 @@ xfs_icache_free_eofblocks(
> >  }
> >  
> >  /*
> > - * Run eofblocks scans on the quotas applicable to the inode. For inodes with
> > - * multiple quotas, we don't know exactly which quota caused an allocation
> > + * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
> > + * with multiple quotas, we don't know exactly which quota caused an allocation
> >   * failure. We make a best effort by including each quota under low free space
> >   * conditions (less than 1% free space) in the scan.
> >   */
> > -static int
> > -__xfs_inode_free_quota_eofblocks(
> > -	struct xfs_inode	*ip,
> > -	int			(*execute)(struct xfs_mount *mp,
> > -					   struct xfs_eofblocks	*eofb))
> > +bool
> > +xfs_inode_free_quota_blocks(
> > +	struct xfs_inode	*ip)
> >  {
> > -	int scan = 0;
> > -	struct xfs_eofblocks eofb = {0};
> > -	struct xfs_dquot *dq;
> > +	struct xfs_eofblocks	eofb = {0};
> > +	struct xfs_dquot	*dq;
> > +	bool			do_work = false;
> >  
> >  	/*
> >  	 * Run a sync scan to increase effectiveness and use the union filter to
> >  	 * cover all applicable quotas in a single scan.
> >  	 */
> > -	eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC;
> > +	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
> >  
> >  	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
> >  		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
> >  		if (dq && xfs_dquot_lowsp(dq)) {
> >  			eofb.eof_uid = VFS_I(ip)->i_uid;
> >  			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
> > -			scan = 1;
> > +			do_work = true;
> >  		}
> >  	}
> >  
> > @@ -1432,21 +1430,16 @@ __xfs_inode_free_quota_eofblocks(
> >  		if (dq && xfs_dquot_lowsp(dq)) {
> >  			eofb.eof_gid = VFS_I(ip)->i_gid;
> >  			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
> > -			scan = 1;
> > +			do_work = true;
> >  		}
> >  	}
> >  
> > -	if (scan)
> > -		execute(ip->i_mount, &eofb);
> > +	if (!do_work)
> > +		return false;
> >  
> > -	return scan;
> > -}
> > -
> > -int
> > -xfs_inode_free_quota_eofblocks(
> > -	struct xfs_inode *ip)
> > -{
> > -	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_eofblocks);
> > +	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> > +	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
> > +	return true;
> >  }
> >  
> >  static inline unsigned long
> > @@ -1646,13 +1639,6 @@ xfs_icache_free_cowblocks(
> >  			XFS_ICI_COWBLOCKS_TAG);
> >  }
> >  
> > -int
> > -xfs_inode_free_quota_cowblocks(
> > -	struct xfs_inode *ip)
> > -{
> > -	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_cowblocks);
> > -}
> > -
> >  void
> >  xfs_inode_set_cowblocks_tag(
> >  	xfs_inode_t	*ip)
> > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > index 3a4c8b382cd0..3f7ddbca8638 100644
> > --- a/fs/xfs/xfs_icache.h
> > +++ b/fs/xfs/xfs_icache.h
> > @@ -54,17 +54,17 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
> >  
> >  void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> >  
> > +bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
> > +
> >  void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> >  void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
> >  int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
> > -int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
> >  void xfs_eofblocks_worker(struct work_struct *);
> >  void xfs_queue_eofblocks(struct xfs_mount *);
> >  
> >  void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip);
> >  void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
> >  int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
> > -int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
> >  void xfs_cowblocks_worker(struct work_struct *);
> >  void xfs_queue_cowblocks(struct xfs_mount *);
> >  
> > 
> 


* Re: [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-25 18:14   ` Brian Foster
@ 2021-01-25 19:54     ` Darrick J. Wong
  2021-01-26 13:14       ` Brian Foster
  0 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-25 19:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Mon, Jan 25, 2021 at 01:14:06PM -0500, Brian Foster wrote:
> On Sat, Jan 23, 2021 at 10:52:10AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > We'd much rather the background scanner keep moving.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index c71eb15e3835..89f9e692fde7 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
> >  	void			*args)
> >  {
> >  	struct xfs_eofblocks	*eofb = args;
> > +	bool			wait;
> >  	int			ret = 0;
> >  
> > +	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
> > +
> >  	if (!xfs_prep_free_cowblocks(ip))
> >  		return 0;
> >  
> >  	if (!xfs_inode_matches_eofb(ip, eofb))
> >  		return 0;
> >  
> > -	/* Free the CoW blocks */
> > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > +	/*
> > +	 * If the caller is waiting, return -EAGAIN to keep the background
> > +	 * scanner moving and revisit the inode in a subsequent pass.
> > +	 */
> > +	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > +		if (wait)
> > +			return -EAGAIN;
> > +		return 0;
> > +	}
> > +	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
> > +		if (wait)
> > +			ret = -EAGAIN;
> > +		goto out_iolock;
> > +	}
> 
> Hmm.. I'd be a little concerned over this allowing a scan to repeat
> indefinitely with a competing workload because a restart doesn't carry
> over any state from the previous scan. I suppose the
> xfs_prep_free_cowblocks() checks make that slightly less likely on a
> given file, but I more wonder about a scenario with a large set of
> inodes in a particular AG with a sufficient amount of concurrent
> activity. All it takes is one trylock failure per scan to have to start
> the whole thing over again... hm?

I'm not quite sure what to do here -- xfs_inode_free_eofblocks already
has the ability to return EAGAIN, which (I think) means that it's
already possible for the low-quota scan to stall indefinitely if the
scan can't lock the inode.

I think we already had a stall-limiting factor here in that all the
other threads in the system that hit EDQUOT will drop their IOLOCKs to
scan the fs, which means that while they loop around the scanner they
can only be releasing quota and driving us towards having fewer inodes
with the same dquots and either blockgc tag set.

--D

> Brian
> 
> >  
> >  	/*
> >  	 * Check again, nobody else should be able to dirty blocks or change
> > @@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
> >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> >  
> >  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > +out_iolock:
> >  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> >  
> >  	return ret;
> > 
> 


* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-24  9:48   ` Christoph Hellwig
  2021-01-25 18:46     ` Brian Foster
@ 2021-01-25 20:02     ` Darrick J. Wong
  2021-01-25 21:06       ` Brian Foster
  1 sibling, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-25 20:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, david

On Sun, Jan 24, 2021 at 09:48:16AM +0000, Christoph Hellwig wrote:
> > +retry:
> >  	/*
> >  	 * Allocate the handle before we do our freeze accounting and setting up
> >  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
> > @@ -285,6 +289,22 @@ xfs_trans_alloc(
> >  	tp->t_firstblock = NULLFSBLOCK;
> >  
> >  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > +	if (error == -ENOSPC && tries > 0) {
> > +		xfs_trans_cancel(tp);
> > +
> > +		/*
> > +		 * We weren't able to reserve enough space for the transaction.
> > +		 * Flush the other speculative space allocations to free space.
> > +		 * Do not perform a synchronous scan because callers can hold
> > +		 * other locks.
> > +		 */
> > +		error = xfs_blockgc_free_space(mp, NULL);
> > +		if (error)
> > +			return error;
> > +
> > +		tries--;
> > +		goto retry;
> > +	}
> >  	if (error) {
> >  		xfs_trans_cancel(tp);
> >  		return error;
> 
> Why do we need to restart the whole function?  A failing
> xfs_trans_reserve should restore tp to its initial state, and keeping
> the SB_FREEZE_FS counter increased also doesn't look harmful as far as
> I can tell.  So why not:
> 
> 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> 	if (error == -ENOSPC) {
> 		/*
> 		 * We weren't able to reserve enough space for the transaction.
> 		 * Flush the other speculative space allocations to free space.
> 		 * Do not perform a synchronous scan because callers can hold
> 		 * other locks.
> 		 */
> 		error = xfs_blockgc_free_space(mp, NULL);

xfs_blockgc_free_space runs the blockgc scan directly, which means that
it creates transactions to free blocks.  Since we can't have nested
transactions, we have to drop tp here.

--D

> 		if (error)
> 			return error;
> 		error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> 	}
>  	if (error) {
>   		xfs_trans_cancel(tp);
>   		return error;
> 
> ?


* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-25 20:02     ` Darrick J. Wong
@ 2021-01-25 21:06       ` Brian Foster
  2021-01-26  0:29         ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-25 21:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, david

On Mon, Jan 25, 2021 at 12:02:16PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 24, 2021 at 09:48:16AM +0000, Christoph Hellwig wrote:
> > > +retry:
> > >  	/*
> > >  	 * Allocate the handle before we do our freeze accounting and setting up
> > >  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
> > > @@ -285,6 +289,22 @@ xfs_trans_alloc(
> > >  	tp->t_firstblock = NULLFSBLOCK;
> > >  
> > >  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > > +	if (error == -ENOSPC && tries > 0) {
> > > +		xfs_trans_cancel(tp);
> > > +
> > > +		/*
> > > +		 * We weren't able to reserve enough space for the transaction.
> > > +		 * Flush the other speculative space allocations to free space.
> > > +		 * Do not perform a synchronous scan because callers can hold
> > > +		 * other locks.
> > > +		 */
> > > +		error = xfs_blockgc_free_space(mp, NULL);
> > > +		if (error)
> > > +			return error;
> > > +
> > > +		tries--;
> > > +		goto retry;
> > > +	}
> > >  	if (error) {
> > >  		xfs_trans_cancel(tp);
> > >  		return error;
> > 
> > Why do we need to restart the whole function?  A failing
> > xfs_trans_reserve should restore tp to its initial state, and keeping
> > the SB_FREEZE_FS counter increased also doesn't look harmful as far as
> > I can tell.  So why not:
> > 
> > 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > 	if (error == -ENOSPC) {
> > 		/*
> > 		 * We weren't able to reserve enough space for the transaction.
> > 		 * Flush the other speculative space allocations to free space.
> > 		 * Do not perform a synchronous scan because callers can hold
> > 		 * other locks.
> > 		 */
> > 		error = xfs_blockgc_free_space(mp, NULL);
> 
> xfs_blockgc_free_space runs the blockgc scan directly, which means that
> it creates transactions to free blocks.  Since we can't have nested
> transactions, we have to drop tp here.
> 

Technically, I don't think it's a problem to hold a transaction memory
allocation (and superblock write access?) while diving into the scanning
mechanism. BTW, this also looks like a landmine passing a NULL eofb into
the xfs_blockgc_free_space() tracepoint.
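Since a NULL eofb is apparently meant as "scan everything", one way to defuse that would be guarding the fields in the fast-assign; reduced to plain C (struct and field names mirror the quoted tracepoint, but this is only a sketch of the guard, not the actual TP_fast_assign macro body):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the fields the tracepoint records. */
struct xfs_eofblocks {
	uint32_t eof_flags;
	uint64_t eof_min_file_size;
};

struct trace_entry {
	uint32_t flags;
	uint64_t min_file_size;
};

/*
 * Guarded fast-assign: treat a NULL eofb (meaning "scan everything")
 * as all-zero fields instead of dereferencing it.
 */
static void eofblocks_fast_assign(struct trace_entry *e,
				  const struct xfs_eofblocks *eofb)
{
	e->flags = eofb ? eofb->eof_flags : 0;
	e->min_file_size = eofb ? eofb->eof_min_file_size : 0;
}
```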

Brian

> --D
> 
> > 		if (error)
> > 			return error;
> > 		error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > 	}
> >  	if (error) {
> >   		xfs_trans_cancel(tp);
> >   		return error;
> > 
> > ?
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-25 21:06       ` Brian Foster
@ 2021-01-26  0:29         ` Darrick J. Wong
  2021-01-27 16:57           ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  0:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, david

On Mon, Jan 25, 2021 at 04:06:28PM -0500, Brian Foster wrote:
> On Mon, Jan 25, 2021 at 12:02:16PM -0800, Darrick J. Wong wrote:
> > On Sun, Jan 24, 2021 at 09:48:16AM +0000, Christoph Hellwig wrote:
> > > > +retry:
> > > >  	/*
> > > >  	 * Allocate the handle before we do our freeze accounting and setting up
> > > >  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
> > > > @@ -285,6 +289,22 @@ xfs_trans_alloc(
> > > >  	tp->t_firstblock = NULLFSBLOCK;
> > > >  
> > > >  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > > > +	if (error == -ENOSPC && tries > 0) {
> > > > +		xfs_trans_cancel(tp);
> > > > +
> > > > +		/*
> > > > +		 * We weren't able to reserve enough space for the transaction.
> > > > +		 * Flush the other speculative space allocations to free space.
> > > > +		 * Do not perform a synchronous scan because callers can hold
> > > > +		 * other locks.
> > > > +		 */
> > > > +		error = xfs_blockgc_free_space(mp, NULL);
> > > > +		if (error)
> > > > +			return error;
> > > > +
> > > > +		tries--;
> > > > +		goto retry;
> > > > +	}
> > > >  	if (error) {
> > > >  		xfs_trans_cancel(tp);
> > > >  		return error;
> > > 
> > > Why do we need to restart the whole function?  A failing
> > > xfs_trans_reserve should restore tp to its initial state, and keeping
> > > the SB_FREEZE_FS counter increased also doesn't look harmful as far as

I'm curious about your motivation for letting transactions nest here.
Seeing as the ENOSPC return should be infrequent, are you simply not
wanting to cycle the memory allocators and the FREEZE_FS counters?

Hm.  I guess at this point the only resources we hold are the FREEZE_FS
counter and *tp itself.  The transaction doesn't have any log space
grants or block reservation associated with it, and I guess we're not in
PF_MEMALLOC_NOFS mode either.  So I guess this is ok, except...

> > > I can tell.  So why not:
> > > 
> > > 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > > 	if (error == -ENOSPC) {
> > > 		/*
> > > 		 * We weren't able to reserve enough space for the transaction.
> > > 		 * Flush the other speculative space allocations to free space.
> > > 		 * Do not perform a synchronous scan because callers can hold
> > > 		 * other locks.
> > > 		 */
> > > 		error = xfs_blockgc_free_space(mp, NULL);
> > 
> > xfs_blockgc_free_space runs the blockgc scan directly, which means that
> > it creates transactions to free blocks.  Since we can't have nested
> > transactions, we have to drop tp here.
> > 
> 
> Technically, I don't think it's a problem to hold a transaction memory
> allocation (and superblock write access?) while diving into the scanning
> mechanism.

...except that doing so will collide with what we've been telling Yafang
(as part of his series to detect nested transactions) as far as when is
the appropriate time to set current->journal_info/PF_MEMALLOC_NOFS.
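
To make the control flow concrete, the one-shot retry can be modeled in
userspace like this (illustrative only -- the fake filesystem and all
names here are invented; real blockgc runs its own transactions, which
is why tp has to be cancelled before the scan):

```c
#include <assert.h>
#include <errno.h>

/* Toy filesystem: some free space plus reclaimable preallocations. */
struct fake_fs {
	int free_blocks;
	int speculative_blocks;
};

static int reserve(struct fake_fs *fs, int blocks)
{
	if (fs->free_blocks < blocks)
		return -ENOSPC;
	fs->free_blocks -= blocks;
	return 0;
}

/* Stand-in for xfs_blockgc_free_space(): reclaim speculative space. */
static int blockgc_free_space(struct fake_fs *fs)
{
	fs->free_blocks += fs->speculative_blocks;
	fs->speculative_blocks = 0;
	return 0;
}

/* Stand-in for xfs_trans_alloc(): retry the reservation exactly once. */
static int trans_alloc(struct fake_fs *fs, int blocks)
{
	unsigned int tries = 1;
	int error;

retry:
	/* (allocate tp, bump SB_FREEZE_FS, set NOFS context here) */
	error = reserve(fs, blocks);
	if (error == -ENOSPC && tries > 0) {
		/* cancel tp first: the scan runs its own transactions */
		error = blockgc_free_space(fs);
		if (error)
			return error;
		tries--;
		goto retry;
	}
	return error;
}
```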

> BTW, this also looks like a landmine passing a NULL eofb into
> the xfs_blockgc_free_space() tracepoint.

Errk, will fix that.

--D

> Brian
> 
> > --D
> > 
> > > 		if (error)
> > > 			return error;
> > > 		error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > > 	}
> > >  	if (error) {
> > >   		xfs_trans_cancel(tp);
> > >   		return error;
> > > 
> > > ?
> > 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites
  2021-01-25 18:46   ` Brian Foster
@ 2021-01-26  2:33     ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  2:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, hch, david

On Mon, Jan 25, 2021 at 01:46:01PM -0500, Brian Foster wrote:
> On Sat, Jan 23, 2021 at 10:52:55AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > In anticipation of more restructuring of the eof/cowblocks gc code,
> > refactor calling of those two functions into a single internal helper
> > function, then present a new standard interface to purge speculative
> > block preallocations and start shifting higher level code to use that.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_file.c   |    3 +--
> >  fs/xfs/xfs_icache.c |   39 +++++++++++++++++++++++++++++++++------
> >  fs/xfs/xfs_icache.h |    1 +
> >  fs/xfs/xfs_trace.h  |    1 +
> >  4 files changed, 36 insertions(+), 8 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 7f999f9dd80a..0d228a5e879f 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -1645,6 +1645,38 @@ xfs_start_block_reaping(
> >  	xfs_queue_cowblocks(mp);
> >  }
> >  
> > +/* Scan all incore inodes for block preallocations that we can remove. */
> > +static inline int
> > +xfs_blockgc_scan(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_eofblocks	*eofb)
> > +{
> > +	int			error;
> > +
> > +	error = xfs_icache_free_eofblocks(mp, eofb);
> > +	if (error)
> > +		return error;
> > +
> > +	error = xfs_icache_free_cowblocks(mp, eofb);
> > +	if (error)
> > +		return error;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Try to free space in the filesystem by purging eofblocks and cowblocks.
> > + */
> > +int
> > +xfs_blockgc_free_space(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_eofblocks	*eofb)
> > +{
> > +	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
> > +
> > +	return xfs_blockgc_scan(mp, eofb);
> > +}
> > +
> 
> What's the need for two helpers instead of just
> xfs_blockgc_free_space()? Otherwise seems fine.

The whole mess of helpers is combined in interesting ways in the "xfs:
consolidate posteof and cowblocks cleanup" patchset that follows this
one.  The xfs_iwalk_ag loops under xfs_icache_free_{eof,cow}blocks get
hoisted to xfs_blockgc_free_space so we only do the iteration once.

Hm, I guess an additional optimization would be to combine them in the
final product as a patch 10/9.
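
As a toy model of that hoisting (everything here is invented for
illustration), a single walk applies both gc actions so each inode is
visited once instead of twice:

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for an incore inode with reclaimable space. */
struct toy_inode {
	int eofblocks;
	int cowblocks;
	int visits;		/* times the walk touched this inode */
};

/*
 * One pass over the "cache" doing both eofblocks and cowblocks gc,
 * instead of two separate xfs_iwalk_ag-style iterations.
 */
static int blockgc_free_space(struct toy_inode *inodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		inodes[i].visits++;
		inodes[i].eofblocks = 0;	/* eofblocks gc */
		inodes[i].cowblocks = 0;	/* cowblocks gc */
	}
	return 0;
}
```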

--D

> Brian
> 
> >  /*
> >   * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
> >   * quota caused an allocation failure, so we make a best effort by including
> > @@ -1661,7 +1693,6 @@ xfs_blockgc_free_dquots(
> >  	struct xfs_eofblocks	eofb = {0};
> >  	struct xfs_mount	*mp = NULL;
> >  	bool			do_work = false;
> > -	int			error;
> >  
> >  	if (!udqp && !gdqp && !pdqp)
> >  		return 0;
> > @@ -1699,11 +1730,7 @@ xfs_blockgc_free_dquots(
> >  	if (!do_work)
> >  		return 0;
> >  
> > -	error = xfs_icache_free_eofblocks(mp, &eofb);
> > -	if (error)
> > -		return error;
> > -
> > -	return xfs_icache_free_cowblocks(mp, &eofb);
> > +	return xfs_blockgc_free_space(mp, &eofb);
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > index 5f520de637f6..583c132ae0fb 100644
> > --- a/fs/xfs/xfs_icache.h
> > +++ b/fs/xfs/xfs_icache.h
> > @@ -57,6 +57,7 @@ void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
> >  int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
> >  		struct xfs_dquot *pdqp, unsigned int eof_flags);
> >  int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
> > +int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_eofblocks *eofb);
> >  
> >  void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> >  void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 4cbf446bae9a..c3fd344aaf5b 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3926,6 +3926,7 @@ DEFINE_EVENT(xfs_eofblocks_class, name,	\
> >  		 unsigned long caller_ip), \
> >  	TP_ARGS(mp, eofb, caller_ip))
> >  DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
> > +DEFINE_EOFBLOCKS_EVENT(xfs_blockgc_free_space);
> >  
> >  #endif /* _TRACE_XFS_H */
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v4.1 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota
  2021-01-23 18:52 ` [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota Darrick J. Wong
  2021-01-24  9:34   ` Christoph Hellwig
  2021-01-25 18:15   ` Brian Foster
@ 2021-01-26  4:52   ` Darrick J. Wong
  2021-01-27 16:59     ` Christoph Hellwig
  2 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  4:52 UTC (permalink / raw)
  To: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

Change the signature of xfs_blockgc_free_quota in preparation for the
next few patches.  Callers can now pass EOF_FLAGS into the function to
control scan parameters; and the function will now pass back any
corruption errors seen while scanning, though for our retry loops we'll
just try again unconditionally.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
v4.1: fix some of the comments to make more sense
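
The flag plumbing amounts to the callee always adding the union filter
while callers choose sync vs. async per call site; as a sketch with
invented flag values mirroring the XFS_EOF_FLAGS_* constants:

```c
#include <assert.h>

/* Invented flag values standing in for XFS_EOF_FLAGS_*. */
#define EOF_FLAGS_SYNC	(1U << 0)
#define EOF_FLAGS_UNION	(1U << 1)

/* The callee always ORs in UNION; callers pick SYNC where it helps. */
static unsigned int scan_flags(unsigned int eof_flags)
{
	return EOF_FLAGS_UNION | eof_flags;
}
```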
---
 fs/xfs/xfs_file.c   |   10 +++++-----
 fs/xfs/xfs_icache.c |   22 +++++++++++++---------
 fs/xfs/xfs_icache.h |    2 +-
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index d69e5abcc1b4..3be0b1d81325 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -743,14 +743,14 @@ xfs_file_buffered_write(
 	 * metadata space. This reduces the chances that the eofblocks scan
 	 * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
 	 * also behaves as a filter to prevent too many eofblocks scans from
-	 * running at the same time.
+	 * running at the same time.  Use a synchronous scan to increase the
+	 * effectiveness of the scan.
 	 */
 	if (ret == -EDQUOT && !cleared_space) {
 		xfs_iunlock(ip, iolock);
-		cleared_space = xfs_blockgc_free_quota(ip);
-		if (cleared_space)
-			goto write_retry;
-		iolock = 0;
+		xfs_blockgc_free_quota(ip, XFS_EOF_FLAGS_SYNC);
+		cleared_space = true;
+		goto write_retry;
 	} else if (ret == -ENOSPC && !cleared_space) {
 		struct xfs_eofblocks eofb = {0};
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index aba901d5637b..7323a1a240bd 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1651,19 +1651,21 @@ xfs_start_block_reaping(
  * failure. We make a best effort by including each quota under low free space
  * conditions (less than 1% free space) in the scan.
  */
-bool
+int
 xfs_blockgc_free_quota(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	unsigned int		eof_flags)
 {
 	struct xfs_eofblocks	eofb = {0};
 	struct xfs_dquot	*dq;
 	bool			do_work = false;
+	int			error;
 
 	/*
-	 * Run a sync scan to increase effectiveness and use the union filter to
-	 * cover all applicable quotas in a single scan.
+	 * Run a scan to free blocks using the union filter to cover all
+	 * applicable quotas in a single scan.
 	 */
-	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION | eof_flags;
 
 	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
 		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
@@ -1693,9 +1695,11 @@ xfs_blockgc_free_quota(
 	}
 
 	if (!do_work)
-		return false;
+		return 0;
 
-	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
-	return true;
+	error = xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	if (error)
+		return error;
+
+	return xfs_icache_free_cowblocks(ip->i_mount, &eofb);
 }
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 21b726a05b0d..d64ea8f5c589 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -54,7 +54,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
-bool xfs_blockgc_free_quota(struct xfs_inode *ip);
+int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v4.1 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-23 18:52 ` [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks Darrick J. Wong
  2021-01-24  9:39   ` Christoph Hellwig
@ 2021-01-26  4:53   ` Darrick J. Wong
  1 sibling, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  4:53 UTC (permalink / raw)
  To: linux-xfs, hch, david, Brian Foster

From: Darrick J. Wong <djwong@kernel.org>

If a fs modification (data write, reflink, xattr set, fallocate, etc.)
is unable to reserve enough quota to handle the modification, try
clearing whatever space the filesystem might have been hanging onto in
the hopes of speeding up the filesystem.  The flushing behavior will
become particularly important when we add deferred inode inactivation
because that will increase the amount of space that isn't actively tied
to user data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v4.1: move all the retry code to a helper so that we don't have the
unconventional return conventions
---
 fs/xfs/libxfs/xfs_attr.c |    9 +++++++-
 fs/xfs/libxfs/xfs_bmap.c |    8 ++++++-
 fs/xfs/xfs_bmap_util.c   |   19 +++++++++++++---
 fs/xfs/xfs_iomap.c       |   21 +++++++++++++++---
 fs/xfs/xfs_quota.h       |   22 ++++++++++++++----
 fs/xfs/xfs_reflink.c     |   16 ++++++++++++-
 fs/xfs/xfs_trans_dquot.c |   55 ++++++++++++++++++++++++++++++++++++----------
 7 files changed, 122 insertions(+), 28 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index be51e7068dcd..57054460d07b 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -395,6 +395,7 @@ xfs_attr_set(
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
 	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+	bool			quota_retry = false;
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
@@ -458,6 +459,7 @@ xfs_attr_set(
 	 * Root fork attributes can use reserved data blocks for this
 	 * operation if necessary
 	 */
+retry:
 	error = xfs_trans_alloc(mp, &tres, total, 0,
 			rsvd ? XFS_TRANS_RESERVE : 0, &args->trans);
 	if (error)
@@ -479,9 +481,14 @@ xfs_attr_set(
 		if (rsvd)
 			quota_flags |= XFS_QMOPT_FORCE_RES;
 		error = xfs_trans_reserve_quota_nblks(args->trans, dp,
-				args->total, 0, quota_flags);
+				args->total, 0, quota_flags, &quota_retry);
 		if (error)
 			goto out_trans_cancel;
+		if (quota_retry) {
+			xfs_trans_cancel_qretry(args->trans, dp);
+			args->trans = NULL;
+			goto retry;
+		}
 
 		error = xfs_has_attr(args);
 		if (error == -EEXIST && (args->attr_flags & XATTR_CREATE))
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 908b7d49da60..65b53ba4d0b4 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1070,6 +1070,7 @@ xfs_bmap_add_attrfork(
 	int			blks;		/* space reservation */
 	int			version = 1;	/* superblock attr version */
 	int			logflags;	/* logging flags */
+	bool			quota_retry = false;
 	int			error;		/* error return value */
 
 	ASSERT(XFS_IFORK_Q(ip) == 0);
@@ -1079,6 +1080,7 @@ xfs_bmap_add_attrfork(
 
 	blks = XFS_ADDAFORK_SPACE_RES(mp);
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_addafork, blks, 0,
 			rsvd ? XFS_TRANS_RESERVE : 0, &tp);
 	if (error)
@@ -1087,9 +1089,13 @@ xfs_bmap_add_attrfork(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	error = xfs_trans_reserve_quota_nblks(tp, ip, blks, 0, rsvd ?
 			XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES :
-			XFS_QMOPT_RES_REGBLKS);
+			XFS_QMOPT_RES_REGBLKS, &quota_retry);
 	if (error)
 		goto trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry(tp, ip);
+		goto retry;
+	}
 	if (XFS_IFORK_Q(ip))
 		goto trans_cancel;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 792809debaaa..9cfb097e632f 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -761,6 +761,7 @@ xfs_alloc_file_space(
 	 */
 	while (allocatesize_fsb && !error) {
 		xfs_fileoff_t	s, e;
+		bool		quota_retry = false;
 
 		/*
 		 * Determine space reservations for data/realtime.
@@ -803,6 +804,7 @@ xfs_alloc_file_space(
 		/*
 		 * Allocate and setup the transaction.
 		 */
+retry:
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
 				resrtextents, 0, &tp);
 
@@ -817,10 +819,14 @@ xfs_alloc_file_space(
 			break;
 		}
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
-						      0, quota_flag);
+		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0,
+				quota_flag, &quota_retry);
 		if (error)
 			goto error1;
+		if (quota_retry) {
+			xfs_trans_cancel_qretry(tp, ip);
+			goto retry;
+		}
 
 		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
@@ -858,7 +864,6 @@ xfs_alloc_file_space(
 
 error0:	/* unlock inode, unreserve quota blocks, cancel trans */
 	xfs_trans_unreserve_quota_nblks(tp, ip, (long)qblocks, 0, quota_flag);
-
 error1:	/* Just cancel transaction */
 	xfs_trans_cancel(tp);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -875,8 +880,10 @@ xfs_unmap_extent(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	uint			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+	bool			quota_retry = false;
 	int			error;
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	if (error) {
 		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
@@ -885,9 +892,13 @@ xfs_unmap_extent(
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-			XFS_QMOPT_RES_REGBLKS);
+			XFS_QMOPT_RES_REGBLKS, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry(tp, ip);
+		goto retry;
+	}
 
 	xfs_trans_ijoin(tp, ip, 0);
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 514e6ae010e0..bb972ad09ccd 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -27,7 +27,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
-
+#include "xfs_icache.h"
 
 #define XFS_ALLOC_ALIGN(mp, off) \
 	(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -197,6 +197,7 @@ xfs_iomap_write_direct(
 	int			quota_flag;
 	uint			qblocks, resblks;
 	unsigned int		resrtextents = 0;
+	bool			quota_retry = false;
 	int			error;
 	int			bmapi_flags = XFS_BMAPI_PREALLOC;
 	uint			tflags = 0;
@@ -239,6 +240,7 @@ xfs_iomap_write_direct(
 			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
 		}
 	}
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, resrtextents,
 			tflags, &tp);
 	if (error)
@@ -246,9 +248,14 @@ xfs_iomap_write_direct(
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
-	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0, quota_flag);
+	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0, quota_flag,
+			&quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry(tp, ip);
+		goto retry;
+	}
 
 	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 			XFS_IEXT_ADD_NOSPLIT_CNT);
@@ -544,6 +551,8 @@ xfs_iomap_write_unwritten(
 		return error;
 
 	do {
+		bool	quota_retry = false;
+
 		/*
 		 * Set up a transaction to convert the range of extents
 		 * from unwritten to real. Do allocations in a loop until
@@ -553,6 +562,7 @@ xfs_iomap_write_unwritten(
 		 * here as we might be asked to write out the same inode that we
 		 * complete here and might deadlock on the iolock.
 		 */
+retry:
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
 				XFS_TRANS_RESERVE, &tp);
 		if (error)
@@ -562,9 +572,14 @@ xfs_iomap_write_unwritten(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES);
+				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES,
+				&quota_retry);
 		if (error)
 			goto error_on_bmapi_transaction;
+		if (quota_retry) {
+			xfs_trans_cancel_qretry(tp, ip);
+			goto retry;
+		}
 
 		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 				XFS_IEXT_WRITE_UNWRITTEN_CNT);
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 4cafc1c78879..321c093459a1 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -81,14 +81,16 @@ extern void xfs_trans_mod_dquot_byino(struct xfs_trans *, struct xfs_inode *,
 		uint, int64_t);
 extern void xfs_trans_apply_dquot_deltas(struct xfs_trans *);
 extern void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *);
-extern int xfs_trans_reserve_quota_nblks(struct xfs_trans *,
-		struct xfs_inode *, int64_t, long, uint);
+int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip,
+		int64_t nblocks, long ninos, unsigned int flags,
+		bool *retry);
 extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
 		struct xfs_mount *, struct xfs_dquot *,
 		struct xfs_dquot *, struct xfs_dquot *, int64_t, long, uint);
 int xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
 		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
 		struct xfs_dquot *pdqp, int64_t nblks);
+int xfs_trans_cancel_qretry(struct xfs_trans *tp, struct xfs_inode *ip);
 
 extern int xfs_qm_vop_dqalloc(struct xfs_inode *, kuid_t, kgid_t,
 		prid_t, uint, struct xfs_dquot **, struct xfs_dquot **,
@@ -115,7 +117,8 @@ static inline int
 xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t nblks, bool isrt)
 {
 	return xfs_trans_reserve_quota_nblks(NULL, ip, nblks, 0,
-			isrt ? XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS);
+			isrt ? XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS,
+			NULL);
 }
 #else
 static inline int
@@ -134,7 +137,8 @@ xfs_qm_vop_dqalloc(struct xfs_inode *ip, kuid_t kuid, kgid_t kgid,
 #define xfs_trans_apply_dquot_deltas(tp)
 #define xfs_trans_unreserve_and_mod_dquots(tp)
 static inline int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp,
-		struct xfs_inode *ip, int64_t nblks, long ninos, uint flags)
+		struct xfs_inode *ip, int64_t nblks, long ninos,
+		unsigned int flags, bool *retry)
 {
 	return 0;
 }
@@ -160,6 +164,13 @@ xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
 	return 0;
 }
 
+static inline int
+xfs_trans_cancel_qretry(struct xfs_trans *tp, struct xfs_inode *ip)
+{
+	ASSERT(0);
+	return 0;
+}
+
 #define xfs_qm_vop_create_dqattach(tp, ip, u, g, p)
 #define xfs_qm_vop_rename_dqattach(it)					(0)
 #define xfs_qm_vop_chown(tp, ip, old, new)				(NULL)
@@ -179,7 +190,8 @@ static inline int
 xfs_trans_unreserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip,
 		int64_t nblks, long ninos, unsigned int flags)
 {
-	return xfs_trans_reserve_quota_nblks(tp, ip, -nblks, -ninos, flags);
+	return xfs_trans_reserve_quota_nblks(tp, ip, -nblks, -ninos, flags,
+			NULL);
 }
 
 static inline int
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0da1a603b7d8..c7712e7ad6ca 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -355,6 +355,7 @@ xfs_reflink_allocate_cow(
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
 	struct xfs_trans	*tp;
 	int			nimaps, error = 0;
+	bool			quota_retry = false;
 	bool			found;
 	xfs_filblks_t		resaligned;
 	xfs_extlen_t		resblks = 0;
@@ -376,6 +377,7 @@ xfs_reflink_allocate_cow(
 	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
 
 	xfs_iunlock(ip, *lockmode);
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	*lockmode = XFS_ILOCK_EXCL;
 	xfs_ilock(ip, *lockmode);
@@ -399,9 +401,13 @@ xfs_reflink_allocate_cow(
 	}
 
 	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-			XFS_QMOPT_RES_REGBLKS);
+			XFS_QMOPT_RES_REGBLKS, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry(tp, ip);
+		goto retry;
+	}
 
 	xfs_trans_ijoin(tp, ip, 0);
 
@@ -1006,10 +1012,12 @@ xfs_reflink_remap_extent(
 	unsigned int		resblks;
 	bool			smap_real;
 	bool			dmap_written = xfs_bmap_is_written_extent(dmap);
+	bool			quota_retry = false;
 	int			iext_delta = 0;
 	int			nimaps;
 	int			error;
 
+retry:
 	/* Start a rolling transaction to switch the mappings */
 	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
@@ -1095,9 +1103,13 @@ xfs_reflink_remap_extent(
 		qres += dmap->br_blockcount;
 	if (qres > 0) {
 		error = xfs_trans_reserve_quota_nblks(tp, ip, qres, 0,
-				XFS_QMOPT_RES_REGBLKS);
+				XFS_QMOPT_RES_REGBLKS, &quota_retry);
 		if (error)
 			goto out_cancel;
+		if (quota_retry) {
+			xfs_trans_cancel_qretry(tp, ip);
+			goto retry;
+		}
 	}
 
 	if (smap_real)
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 3315498a6fa1..4b679b9f2da7 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -16,6 +16,7 @@
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_trace.h"
+#include "xfs_icache.h"
 
 STATIC void	xfs_trans_alloc_dqinfo(xfs_trans_t *);
 
@@ -770,11 +771,15 @@ xfs_trans_reserve_quota_bydquots(
 	return error;
 }
 
-
 /*
- * Lock the dquot and change the reservation if we can.
- * This doesn't change the actual usage, just the reservation.
- * The inode sent in is locked.
+ * Lock the dquot and change the reservation if we can.  This doesn't change
+ * the actual usage, just the reservation.  The caller must hold ILOCK_EXCL on
+ * the inode.  If @retry is not a NULL pointer, the caller must ensure that
+ * *retry is set to false before the first time this function is called.
+ *
+ * If the quota reservation fails because we hit a quota limit (and retry is
+ * not a NULL pointer, and *retry is false), this function will set *retry to
+ * true and return zero.
  */
 int
 xfs_trans_reserve_quota_nblks(
@@ -782,9 +787,11 @@ xfs_trans_reserve_quota_nblks(
 	struct xfs_inode	*ip,
 	int64_t			nblks,
 	long			ninos,
-	uint			flags)
+	unsigned int		flags,
+	bool			*retry)
 {
 	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
 
 	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
 		return 0;
@@ -795,13 +802,37 @@ xfs_trans_reserve_quota_nblks(
 	ASSERT((flags & ~(XFS_QMOPT_FORCE_RES)) == XFS_TRANS_DQ_RES_RTBLKS ||
 	       (flags & ~(XFS_QMOPT_FORCE_RES)) == XFS_TRANS_DQ_RES_BLKS);
 
-	/*
-	 * Reserve nblks against these dquots, with trans as the mediator.
-	 */
-	return xfs_trans_reserve_quota_bydquots(tp, mp,
-						ip->i_udquot, ip->i_gdquot,
-						ip->i_pdquot,
-						nblks, ninos, flags);
+	/* Reserve nblks against these dquots, with trans as the mediator. */
+	error = xfs_trans_reserve_quota_bydquots(tp, mp, ip->i_udquot,
+			ip->i_gdquot, ip->i_pdquot, nblks, ninos, flags);
+	if (retry == NULL)
+		return error;
+	/* We only allow one retry for EDQUOT/ENOSPC. */
+	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
+		*retry = false;
+		return error;
+	}
+
+	*retry = true;
+	return 0;
+}
+
+/*
+ * Cancel a transaction and try to clear some space so that we can reserve some
+ * quota.  The caller must hold the ILOCK; when this function returns, the
+ * transaction will be cancelled and the ILOCK will have been released.
+ */
+int
+xfs_trans_cancel_qretry(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip)
+{
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	return xfs_blockgc_free_quota(ip, 0);
 }
 
 /* Change the quota reservations for an inode creation activity. */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v4.1 07/11] xfs: flush eof/cowblocks if we can't reserve quota for inode creation
  2021-01-23 18:52 ` [PATCH 07/11] xfs: flush eof/cowblocks if we can't reserve quota for inode creation Darrick J. Wong
@ 2021-01-26  4:55   ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  4:55 UTC (permalink / raw)
  To: linux-xfs, hch, david, Brian Foster

From: Darrick J. Wong <djwong@kernel.org>

If an inode creation is unable to reserve enough quota to handle the
modification, try clearing whatever space the filesystem might have been
hanging onto in the hopes of speeding up the filesystem.  The flushing
behavior will become particularly important when we add deferred inode
inactivation because that will increase the amount of space that isn't
actively tied to user data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v4.1: move the retry code to a separate helper to fix the weird return
conventions
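
Since the dquot-based helper no longer has an inode to hang the mount
off of, it picks mp from whichever dquot is present; as a sketch with
invented toy types (all three dquots share one mount in XFS):

```c
#include <assert.h>
#include <stddef.h>

/* Invented toy types standing in for xfs_mount / xfs_dquot. */
struct toy_mount { int id; };
struct toy_dquot { struct toy_mount *q_mount; };

/* Mount selection as in xfs_blockgc_free_dquots(). */
static struct toy_mount *pick_mount(struct toy_dquot *udqp,
		struct toy_dquot *gdqp, struct toy_dquot *pdqp)
{
	struct toy_mount *mp = NULL;

	if (!udqp && !gdqp && !pdqp)
		return NULL;	/* caller returns 0: nothing to scan */
	if (udqp)
		mp = udqp->q_mount;
	if (!mp && gdqp)
		mp = gdqp->q_mount;
	if (!mp && pdqp)
		mp = pdqp->q_mount;
	return mp;
}
```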
---
 fs/xfs/xfs_icache.c      |   78 ++++++++++++++++++++++++++++------------------
 fs/xfs/xfs_icache.h      |    2 +
 fs/xfs/xfs_inode.c       |   17 +++++++++-
 fs/xfs/xfs_quota.h       |   23 +++++++++++---
 fs/xfs/xfs_symlink.c     |    9 +++++
 fs/xfs/xfs_trans_dquot.c |   48 +++++++++++++++++++++++++++-
 6 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 7323a1a240bd..ae7888f0e074 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1646,60 +1646,78 @@ xfs_start_block_reaping(
 }
 
 /*
- * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
- * with multiple quotas, we don't know exactly which quota caused an allocation
- * failure. We make a best effort by including each quota under low free space
- * conditions (less than 1% free space) in the scan.
+ * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
+ * quota caused an allocation failure, so we make a best effort by including
+ * each quota under low free space conditions (less than 1% free space) in the
+ * scan.
  */
 int
-xfs_blockgc_free_quota(
-	struct xfs_inode	*ip,
+xfs_blockgc_free_dquots(
+	struct xfs_dquot	*udqp,
+	struct xfs_dquot	*gdqp,
+	struct xfs_dquot	*pdqp,
 	unsigned int		eof_flags)
 {
 	struct xfs_eofblocks	eofb = {0};
-	struct xfs_dquot	*dq;
+	struct xfs_mount	*mp = NULL;
 	bool			do_work = false;
 	int			error;
 
+	if (!udqp && !gdqp && !pdqp)
+		return 0;
+	if (udqp)
+		mp = udqp->q_mount;
+	if (!mp && gdqp)
+		mp = gdqp->q_mount;
+	if (!mp && pdqp)
+		mp = pdqp->q_mount;
+
 	/*
 	 * Run a scan to free blocks using the union filter to cover all
 	 * applicable quotas in a single scan.
 	 */
 	eofb.eof_flags = XFS_EOF_FLAGS_UNION | eof_flags;
 
-	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_uid = VFS_I(ip)->i_uid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
-			do_work = true;
-		}
+	if (XFS_IS_UQUOTA_ENFORCED(mp) && udqp && xfs_dquot_lowsp(udqp)) {
+		eofb.eof_uid = make_kuid(mp->m_super->s_user_ns, udqp->q_id);
+		eofb.eof_flags |= XFS_EOF_FLAGS_UID;
+		do_work = true;
 	}
 
-	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_GROUP);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_gid = VFS_I(ip)->i_gid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
-			do_work = true;
-		}
+	if (XFS_IS_GQUOTA_ENFORCED(mp) && gdqp && xfs_dquot_lowsp(gdqp)) {
+		eofb.eof_gid = make_kgid(mp->m_super->s_user_ns, gdqp->q_id);
+		eofb.eof_flags |= XFS_EOF_FLAGS_GID;
+		do_work = true;
 	}
 
-	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
-		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
-		if (dq && xfs_dquot_lowsp(dq)) {
-			eofb.eof_prid = ip->i_d.di_projid;
-			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
-			do_work = true;
-		}
+	if (XFS_IS_PQUOTA_ENFORCED(mp) && pdqp && xfs_dquot_lowsp(pdqp)) {
+		eofb.eof_prid = pdqp->q_id;
+		eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
+		do_work = true;
 	}
 
 	if (!do_work)
 		return 0;
 
-	error = xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	error = xfs_icache_free_eofblocks(mp, &eofb);
 	if (error)
 		return error;
 
-	return xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+	return xfs_icache_free_cowblocks(mp, &eofb);
+}
+
+/*
+ * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
+ * with multiple quotas, we don't know exactly which quota caused an allocation
+ * failure. We make a best effort by including each quota under low free space
+ * conditions (less than 1% free space) in the scan.
+ */
+int
+xfs_blockgc_free_quota(
+	struct xfs_inode	*ip,
+	unsigned int		eof_flags)
+{
+	return xfs_blockgc_free_dquots(xfs_inode_dquot(ip, XFS_DQTYPE_USER),
+			xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
+			xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), eof_flags);
 }
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index d64ea8f5c589..5f520de637f6 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -54,6 +54,8 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
+int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp, unsigned int eof_flags);
 int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e909da05cd28..ad4bfe057737 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -990,6 +990,7 @@ xfs_create(
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_trans_res	*tres;
+	bool			quota_retry = false;
 	uint			resblks;
 
 	trace_xfs_create(dp, name);
@@ -1022,6 +1023,7 @@ xfs_create(
 	 * the case we'll drop the one we have and get a more
 	 * appropriate transaction later.
 	 */
+retry:
 	error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
 	if (error == -ENOSPC) {
 		/* flush outstanding delalloc blocks and retry */
@@ -1038,9 +1040,14 @@ xfs_create(
 	 * Reserve disk quota and the inode.
 	 */
 	error = xfs_trans_reserve_quota_icreate(tp, dp, udqp, gdqp, pdqp,
-			resblks);
+			resblks, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry_dquots(tp, dp, udqp, gdqp, pdqp);
+		unlock_dp_on_error = false;
+		goto retry;
+	}
 
 	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
 			XFS_IEXT_DIR_MANIP_CNT(mp));
@@ -1146,6 +1153,7 @@ xfs_create_tmpfile(
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_trans_res	*tres;
+	bool			quota_retry = false;
 	uint			resblks;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
@@ -1165,14 +1173,19 @@ xfs_create_tmpfile(
 	resblks = XFS_IALLOC_SPACE_RES(mp);
 	tres = &M_RES(mp)->tr_create_tmpfile;
 
+retry:
 	error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
 	if (error)
 		goto out_release_inode;
 
 	error = xfs_trans_reserve_quota_icreate(tp, dp, udqp, gdqp, pdqp,
-			resblks);
+			resblks, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry_dquots(tp, NULL, udqp, gdqp, pdqp);
+		goto retry;
+	}
 
 	error = xfs_dir_ialloc(&tp, dp, mode, 0, 0, prid, &ip);
 	if (error)
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 321c093459a1..c5bbe7e3e259 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -87,10 +87,13 @@ int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip,
 extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
 		struct xfs_mount *, struct xfs_dquot *,
 		struct xfs_dquot *, struct xfs_dquot *, int64_t, long, uint);
-int xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
-		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
-		struct xfs_dquot *pdqp, int64_t nblks);
 int xfs_trans_cancel_qretry(struct xfs_trans *tp, struct xfs_inode *ip);
+int xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
+		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp, int64_t nblks, bool *retry);
+int xfs_trans_cancel_qretry_dquots(struct xfs_trans *tp, struct xfs_inode *dp,
+		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp);
 
 extern int xfs_qm_vop_dqalloc(struct xfs_inode *, kuid_t, kgid_t,
 		prid_t, uint, struct xfs_dquot **, struct xfs_dquot **,
@@ -159,7 +162,7 @@ xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t nblks, bool isrt)
 static inline int
 xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_inode *dp,
 		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
-		struct xfs_dquot *pdqp, int64_t nblks)
+		struct xfs_dquot *pdqp, int64_t nblks, bool *retry)
 {
 	return 0;
 }
@@ -171,6 +174,18 @@ xfs_trans_cancel_qretry(struct xfs_trans *tp, struct xfs_inode *ip)
 	return 0;
 }
 
+static inline int
+xfs_trans_cancel_qretry_dquots(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	struct xfs_dquot	*udqp,
+	struct xfs_dquot	*gdqp,
+	struct xfs_dquot	*pdqp)
+{
+	ASSERT(0);
+	return 0;
+}
+
 #define xfs_qm_vop_create_dqattach(tp, ip, u, g, p)
 #define xfs_qm_vop_rename_dqattach(it)					(0)
 #define xfs_qm_vop_chown(tp, ip, old, new)				(NULL)
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index f8bfa51bdeef..c97a3d147bc2 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -159,6 +159,7 @@ xfs_symlink(
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
+	bool			quota_retry = false;
 	uint			resblks;
 
 	*ipp = NULL;
@@ -197,6 +198,7 @@ xfs_symlink(
 		fs_blocks = xfs_symlink_blocks(mp, pathlen);
 	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_symlink, resblks, 0, 0, &tp);
 	if (error)
 		goto out_release_inode;
@@ -216,9 +218,14 @@ xfs_symlink(
 	 * Reserve disk quota : blocks and inode.
 	 */
 	error = xfs_trans_reserve_quota_icreate(tp, dp, udqp, gdqp, pdqp,
-			resblks);
+			resblks, &quota_retry);
 	if (error)
 		goto out_trans_cancel;
+	if (quota_retry) {
+		xfs_trans_cancel_qretry_dquots(tp, dp, udqp, gdqp, pdqp);
+		unlock_dp_on_error = false;
+		goto retry;
+	}
 
 	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
 			XFS_IEXT_DIR_MANIP_CNT(mp));
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 4b679b9f2da7..b463047f04c6 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -835,7 +835,17 @@ xfs_trans_cancel_qretry(
 	return xfs_blockgc_free_quota(ip, 0);
 }
 
-/* Change the quota reservations for an inode creation activity. */
+/*
+ * Change the quota reservations for an inode creation activity.  This doesn't
+ * change the actual usage, just the reservation.  The caller may hold
+ * ILOCK_EXCL on the inode.  If @retry is not a NULL pointer, the caller must
+ * ensure that *retry is set to false before the first time this function is
+ * called.
+ *
+ * If the quota reservation fails because we hit a quota limit (and retry is
+ * not a NULL pointer, and *retry is false), this function will set *retry to
+ * true and return zero.
+ */
 int
 xfs_trans_reserve_quota_icreate(
 	struct xfs_trans	*tp,
@@ -843,17 +853,49 @@ xfs_trans_reserve_quota_icreate(
 	struct xfs_dquot	*udqp,
 	struct xfs_dquot	*gdqp,
 	struct xfs_dquot	*pdqp,
-	int64_t			nblks)
+	int64_t			nblks,
+	bool			*retry)
 {
 	struct xfs_mount	*mp = dp->i_mount;
+	int			error;
 
 	if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
 		return 0;
 
 	ASSERT(!xfs_is_quota_inode(&mp->m_sb, dp->i_ino));
 
-	return xfs_trans_reserve_quota_bydquots(tp, dp->i_mount, udqp, gdqp,
+	error = xfs_trans_reserve_quota_bydquots(tp, dp->i_mount, udqp, gdqp,
 			pdqp, nblks, 1, XFS_QMOPT_RES_REGBLKS);
+	if (retry == NULL)
+		return error;
+	/* We only allow one retry for EDQUOT/ENOSPC. */
+	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
+		*retry = false;
+		return error;
+	}
+
+	*retry = true;
+	return 0;
+}
+
+/*
+ * Cancel a transaction and try to clear some space so that we can reserve some
+ * quota.  When this function returns, the transaction will be cancelled and dp
+ * (if one is supplied) will be unlocked.
+ */
+int
+xfs_trans_cancel_qretry_dquots(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	struct xfs_dquot	*udqp,
+	struct xfs_dquot	*gdqp,
+	struct xfs_dquot	*pdqp)
+{
+	xfs_trans_cancel(tp);
+	if (dp)
+		xfs_iunlock(dp, XFS_ILOCK_EXCL);
+
+	return xfs_blockgc_free_dquots(udqp, gdqp, pdqp, 0);
 }
 
 /*

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v4.1 08/11] xfs: flush eof/cowblocks if we can't reserve quota for chown
  2021-01-23 18:52 ` [PATCH 08/11] xfs: flush eof/cowblocks if we can't reserve quota for chown Darrick J. Wong
@ 2021-01-26  4:55   ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  4:55 UTC (permalink / raw)
  To: linux-xfs, hch, david, Brian Foster

From: Darrick J. Wong <djwong@kernel.org>

If a file user, group, or project change is unable to reserve enough
quota to handle the modification, try clearing whatever space the
filesystem might have been hanging onto in the hopes of speeding up the
filesystem.  The flushing behavior will become particularly important
when we add deferred inode inactivation because that will increase the
amount of space that isn't actively tied to user data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v4.1: fix the unconventional return conventions here too
---
 fs/xfs/xfs_ioctl.c |   13 ++++++++++++-
 fs/xfs/xfs_iops.c  |   14 ++++++++++++--
 fs/xfs/xfs_qm.c    |   23 +++++++++++++++++------
 fs/xfs/xfs_quota.h |    8 ++++----
 4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3fbd98f61ea5..dab525c2437c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1436,6 +1436,7 @@ xfs_ioctl_setattr(
 	struct xfs_trans	*tp;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
+	bool			quota_retry = false;
 	int			code;
 
 	trace_xfs_ioctl_setattr(ip);
@@ -1462,6 +1463,7 @@ xfs_ioctl_setattr(
 
 	xfs_ioctl_setattr_prepare_dax(ip, fa);
 
+retry:
 	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
@@ -1470,10 +1472,19 @@ xfs_ioctl_setattr(
 
 	if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_PQUOTA_ON(mp) &&
 	    ip->i_d.di_projid != fa->fsx_projid) {
+		unsigned int	flags = 0;
+
+		if (capable(CAP_FOWNER))
+			flags |= XFS_QMOPT_FORCE_RES;
 		code = xfs_qm_vop_chown_reserve(tp, ip, NULL, NULL, pdqp,
-				capable(CAP_FOWNER) ?  XFS_QMOPT_FORCE_RES : 0);
+				flags, &quota_retry);
 		if (code)	/* out of quota */
 			goto error_trans_cancel;
+		if (quota_retry) {
+			xfs_trans_cancel_qretry_dquots(tp, ip, NULL, NULL,
+					pdqp);
+			goto retry;
+		}
 	}
 
 	xfs_fill_fsxattr(ip, false, &old_fa);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f1e21b6cfa48..907952009c2d 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -660,6 +660,7 @@ xfs_setattr_nonsize(
 	kgid_t			gid = GLOBAL_ROOT_GID, igid = GLOBAL_ROOT_GID;
 	struct xfs_dquot	*udqp = NULL, *gdqp = NULL;
 	struct xfs_dquot	*olddquot1 = NULL, *olddquot2 = NULL;
+	bool			quota_retry = false;
 
 	ASSERT((mask & ATTR_SIZE) == 0);
 
@@ -700,6 +701,7 @@ xfs_setattr_nonsize(
 			return error;
 	}
 
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
 	if (error)
 		goto out_dqrele;
@@ -729,12 +731,20 @@ xfs_setattr_nonsize(
 		if (XFS_IS_QUOTA_RUNNING(mp) &&
 		    ((XFS_IS_UQUOTA_ON(mp) && !uid_eq(iuid, uid)) ||
 		     (XFS_IS_GQUOTA_ON(mp) && !gid_eq(igid, gid)))) {
+			unsigned int	flags = 0;
+
+			if (capable(CAP_FOWNER))
+				flags |= XFS_QMOPT_FORCE_RES;
 			ASSERT(tp);
 			error = xfs_qm_vop_chown_reserve(tp, ip, udqp, gdqp,
-						NULL, capable(CAP_FOWNER) ?
-						XFS_QMOPT_FORCE_RES : 0);
+					NULL, flags, &quota_retry);
 			if (error)	/* out of quota */
 				goto out_cancel;
+			if (quota_retry) {
+				xfs_trans_cancel_qretry_dquots(tp, ip, udqp,
+						gdqp, NULL);
+				goto retry;
+			}
 		}
 
 		/*
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index c134eb4aeaa8..4e02609c063d 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1795,7 +1795,8 @@ xfs_qm_vop_chown(
 }
 
 /*
- * Quota reservations for setattr(AT_UID|AT_GID|AT_PROJID).
+ * Quota reservations for setattr(AT_UID|AT_GID|AT_PROJID).  This function has
+ * the same return behavior as xfs_trans_reserve_quota_nblks.
  */
 int
 xfs_qm_vop_chown_reserve(
@@ -1804,15 +1805,16 @@ xfs_qm_vop_chown_reserve(
 	struct xfs_dquot	*udqp,
 	struct xfs_dquot	*gdqp,
 	struct xfs_dquot	*pdqp,
-	uint			flags)
+	unsigned int		flags,
+	bool			*retry)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	uint64_t		delblks;
 	unsigned int		blkflags;
-	struct xfs_dquot	*udq_unres = NULL;
+	struct xfs_dquot	*udq_unres = NULL; /* old dquots */
 	struct xfs_dquot	*gdq_unres = NULL;
 	struct xfs_dquot	*pdq_unres = NULL;
-	struct xfs_dquot	*udq_delblks = NULL;
+	struct xfs_dquot	*udq_delblks = NULL; /* new dquots */
 	struct xfs_dquot	*gdq_delblks = NULL;
 	struct xfs_dquot	*pdq_delblks = NULL;
 	int			error;
@@ -1860,7 +1862,7 @@ xfs_qm_vop_chown_reserve(
 				udq_delblks, gdq_delblks, pdq_delblks,
 				ip->i_d.di_nblocks, 1, flags | blkflags);
 	if (error)
-		return error;
+		goto err;
 
 	/*
 	 * Do the delayed blks reservations/unreservations now. Since, these
@@ -1878,12 +1880,21 @@ xfs_qm_vop_chown_reserve(
 			    udq_delblks, gdq_delblks, pdq_delblks,
 			    (xfs_qcnt_t)delblks, 0, flags | blkflags);
 		if (error)
-			return error;
+			goto err;
 		xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
 				udq_unres, gdq_unres, pdq_unres,
 				-((xfs_qcnt_t)delblks), 0, blkflags);
 	}
 
+	return 0;
+err:
+	/* We only allow one retry for EDQUOT/ENOSPC. */
+	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
+		*retry = false;
+		return error;
+	}
+
+	*retry = true;
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index c5bbe7e3e259..42b79d0829f7 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -103,9 +103,9 @@ extern void xfs_qm_vop_create_dqattach(struct xfs_trans *, struct xfs_inode *,
 extern int xfs_qm_vop_rename_dqattach(struct xfs_inode **);
 extern struct xfs_dquot *xfs_qm_vop_chown(struct xfs_trans *,
 		struct xfs_inode *, struct xfs_dquot **, struct xfs_dquot *);
-extern int xfs_qm_vop_chown_reserve(struct xfs_trans *, struct xfs_inode *,
-		struct xfs_dquot *, struct xfs_dquot *,
-		struct xfs_dquot *, uint);
+int xfs_qm_vop_chown_reserve(struct xfs_trans *tp, struct xfs_inode *ip,
+		struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+		struct xfs_dquot *pdqp, unsigned int flags, bool *retry);
 extern int xfs_qm_dqattach(struct xfs_inode *);
 extern int xfs_qm_dqattach_locked(struct xfs_inode *ip, bool doalloc);
 extern void xfs_qm_dqdetach(struct xfs_inode *);
@@ -189,7 +189,7 @@ xfs_trans_cancel_qretry_dquots(
 #define xfs_qm_vop_create_dqattach(tp, ip, u, g, p)
 #define xfs_qm_vop_rename_dqattach(it)					(0)
 #define xfs_qm_vop_chown(tp, ip, old, new)				(NULL)
-#define xfs_qm_vop_chown_reserve(tp, ip, u, g, p, fl)			(0)
+#define xfs_qm_vop_chown_reserve(tp, ip, u, g, p, fl, retry)		(0)
 #define xfs_qm_dqattach(ip)						(0)
 #define xfs_qm_dqattach_locked(ip, fl)					(0)
 #define xfs_qm_dqdetach(ip)

* [PATCH v4.1 09/11] xfs: add a tracepoint for blockgc scans
  2021-01-23 18:52 ` [PATCH 09/11] xfs: add a tracepoint for blockgc scans Darrick J. Wong
  2021-01-25 18:45   ` Brian Foster
@ 2021-01-26  4:56   ` Darrick J. Wong
  1 sibling, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  4:56 UTC (permalink / raw)
  To: Christoph Hellwig, linux-xfs, hch, david, Brian Foster

From: Darrick J. Wong <djwong@kernel.org>

Add some tracepoints so that we can observe when the speculative
preallocation garbage collector runs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v4.1: fix a potential null deref in the tracepoint
---
 fs/xfs/xfs_ioctl.c |    2 ++
 fs/xfs/xfs_trace.c |    1 +
 fs/xfs/xfs_trace.h |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index dab525c2437c..da3da8677067 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -2359,6 +2359,8 @@ xfs_file_ioctl(
 		if (error)
 			return error;
 
+		trace_xfs_ioc_free_eofblocks(mp, &keofb, _RET_IP_);
+
 		sb_start_write(mp->m_super);
 		error = xfs_icache_free_eofblocks(mp, &keofb);
 		sb_end_write(mp->m_super);
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index 120398a37c2a..9b8d703dc9fd 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -29,6 +29,7 @@
 #include "xfs_filestream.h"
 #include "xfs_fsmap.h"
 #include "xfs_btree_staging.h"
+#include "xfs_icache.h"
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 407c3a5208ab..38649e3341cb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -37,6 +37,7 @@ struct xfs_trans_res;
 struct xfs_inobt_rec_incore;
 union xfs_btree_ptr;
 struct xfs_dqtrx;
+struct xfs_eofblocks;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -3888,6 +3889,46 @@ DEFINE_EVENT(xfs_timestamp_range_class, name, \
 DEFINE_TIMESTAMP_RANGE_EVENT(xfs_inode_timestamp_range);
 DEFINE_TIMESTAMP_RANGE_EVENT(xfs_quota_expiry_range);
 
+DECLARE_EVENT_CLASS(xfs_eofblocks_class,
+	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, eofb, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(__u32, flags)
+		__field(uint32_t, uid)
+		__field(uint32_t, gid)
+		__field(prid_t, prid)
+		__field(__u64, min_file_size)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->flags = eofb ? eofb->eof_flags : 0;
+		__entry->uid = eofb ? from_kuid(mp->m_super->s_user_ns,
+						eofb->eof_uid) : 0;
+		__entry->gid = eofb ? from_kgid(mp->m_super->s_user_ns,
+						eofb->eof_gid) : 0;
+		__entry->prid = eofb ? eofb->eof_prid : 0;
+		__entry->min_file_size = eofb ? eofb->eof_min_file_size : 0;
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d flags 0x%x uid %u gid %u prid %u minsize %llu caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->flags,
+		  __entry->uid,
+		  __entry->gid,
+		  __entry->prid,
+		  __entry->min_file_size,
+		  (char *)__entry->caller_ip)
+);
+#define DEFINE_EOFBLOCKS_EVENT(name)	\
+DEFINE_EVENT(xfs_eofblocks_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, eofb, caller_ip))
+DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH

* [PATCH v4.1 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-23 18:53 ` [PATCH 11/11] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
  2021-01-24  9:48   ` Christoph Hellwig
@ 2021-01-26  4:59   ` Darrick J. Wong
  1 sibling, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26  4:59 UTC (permalink / raw)
  To: linux-xfs, hch, david

From: Darrick J. Wong <djwong@kernel.org>

If a fs modification (creation, file write, reflink, etc.) is unable to
reserve enough space to handle the modification, try clearing whatever
space the filesystem might have been hanging onto in the hopes of
speeding up the filesystem.  The flushing behavior will become
particularly important when we add deferred inode inactivation because
that will increase the amount of space that isn't actively tied to user
data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v4.1: don't free the transaction, since it's not pinning log resources
---
 fs/xfs/xfs_trans.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index e72730f85af1..79ef904c5698 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -20,6 +20,8 @@
 #include "xfs_trace.h"
 #include "xfs_error.h"
 #include "xfs_defer.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
 
 kmem_zone_t	*xfs_trans_zone;
 
@@ -285,6 +287,18 @@ xfs_trans_alloc(
 	tp->t_firstblock = NULLFSBLOCK;
 
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
+	if (error == -ENOSPC) {
+		/*
+		 * We weren't able to reserve enough space for the transaction.
+		 * Flush the other speculative space allocations to free space.
+		 * Do not perform a synchronous scan because callers can hold
+		 * other locks.
+		 */
+		error = xfs_blockgc_free_space(mp, NULL);
+		if (error)
+			return error;
+		error = xfs_trans_reserve(tp, resp, blocks, rtextents);
+	}
 	if (error) {
 		xfs_trans_cancel(tp);
 		return error;

* Re: [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-25 19:54     ` Darrick J. Wong
@ 2021-01-26 13:14       ` Brian Foster
  2021-01-26 18:34         ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-26 13:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Mon, Jan 25, 2021 at 11:54:46AM -0800, Darrick J. Wong wrote:
> On Mon, Jan 25, 2021 at 01:14:06PM -0500, Brian Foster wrote:
> > On Sat, Jan 23, 2021 at 10:52:10AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > > We'd much rather the background scanner keep moving.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
> > >  1 file changed, 18 insertions(+), 3 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index c71eb15e3835..89f9e692fde7 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
> > >  	void			*args)
> > >  {
> > >  	struct xfs_eofblocks	*eofb = args;
> > > +	bool			wait;
> > >  	int			ret = 0;
> > >  
> > > +	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
> > > +
> > >  	if (!xfs_prep_free_cowblocks(ip))
> > >  		return 0;
> > >  
> > >  	if (!xfs_inode_matches_eofb(ip, eofb))
> > >  		return 0;
> > >  
> > > -	/* Free the CoW blocks */
> > > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > +	/*
> > > +	 * If the caller is waiting, return -EAGAIN to keep the background
> > > +	 * scanner moving and revisit the inode in a subsequent pass.
> > > +	 */
> > > +	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > > +		if (wait)
> > > +			return -EAGAIN;
> > > +		return 0;
> > > +	}
> > > +	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
> > > +		if (wait)
> > > +			ret = -EAGAIN;
> > > +		goto out_iolock;
> > > +	}
> > 
> > Hmm.. I'd be a little concerned over this allowing a scan to repeat
> > indefinitely with a competing workload because a restart doesn't carry
> > over any state from the previous scan. I suppose the
> > xfs_prep_free_cowblocks() checks make that slightly less likely on a
> > given file, but I more wonder about a scenario with a large set of
> > inodes in a particular AG with a sufficient amount of concurrent
> > activity. All it takes is one trylock failure per scan to have to start
> > the whole thing over again... hm?
> 
> I'm not quite sure what to do here -- xfs_inode_free_eofblocks already
> has the ability to return EAGAIN, which (I think) means that it's
> already possible for the low-quota scan to stall indefinitely if the
> scan can't lock the inode.
> 

Indeed, that is true.

> I think we already had a stall limiting factor here in that all the
> other threads in the system that hit EDQUOT will drop their IOLOCKs to
> scan the fs, which means that while they loop around the scanner they
> can only be releasing quota and driving us towards having fewer inodes
> with the same dquots and either blockgc tag set.
> 

Yeah, that makes sense for the current use case. There's a broader
sequence involved there that provides some throttling and serialization,
along with the fact that the workload is imminently driving into
-ENOSPC.

I think what had me a little concerned upon seeing this is whether the
scanning mechanism is currently suitable for the broader usage
introduced in this series. We've had related issues in the past with
concurrent sync eofblocks scans and iolock (see [1], for example).
Having made it through the rest of the series however, it looks like all
of the new scan invocations are async, so perhaps this is not really an
immediate problem.

I think it would be nice if we could somehow assert that the task that
invokes a sync scan doesn't hold an iolock, but I'm not sure there's a
clean way to do that. We'd probably have to define the interface to
require an inode just for that purpose. It may not be worth that
weirdness, and I suppose if code is tested it should be pretty obvious
that such a scan will never complete..

Brian

[1] c3155097ad89 ("xfs: sync eofblocks scans under iolock are livelock prone")

> --D
> 
> > Brian
> > 
> > >  
> > >  	/*
> > >  	 * Check again, nobody else should be able to dirty blocks or change
> > > @@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
> > >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > >  
> > >  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > +out_iolock:
> > >  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > >  
> > >  	return ret;
> > > 
> > 
> 


* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-25 18:57       ` Darrick J. Wong
@ 2021-01-26 13:26         ` Brian Foster
  2021-01-26 21:12           ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-26 13:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, david

On Mon, Jan 25, 2021 at 10:57:35AM -0800, Darrick J. Wong wrote:
> On Mon, Jan 25, 2021 at 01:16:23PM -0500, Brian Foster wrote:
> > On Sun, Jan 24, 2021 at 09:39:53AM +0000, Christoph Hellwig wrote:
> > > > +	/* We only allow one retry for EDQUOT/ENOSPC. */
> > > > +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> > > > +		*retry = false;
> > > > +		return error;
> > > > +	}
> > > 
> > > > +	/* Release resources, prepare for scan. */
> > > > +	xfs_trans_cancel(*tpp);
> > > > +	*tpp = NULL;
> > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > +
> > > > +	/* Try to free some quota for this file's dquots. */
> > > > +	*retry = true;
> > > > +	xfs_blockgc_free_quota(ip, 0);
> > > > +	return 0;
> > > 
> > > I still have grave reservations about this calling convention.  And
> > > if you just remove the unlock and the call to xfs_blockgc_free_quota
> > > here we don't require a whole lot of boilerplate code in the callers
> > > while making the code possible to reason about for a mere human.
> > > 
> > 
> > I agree that the retry pattern is rather odd. I'm curious, is there a
> > specific reason this scanning task has to execute outside of transaction
> > context in the first place?
> 
> Dave didn't like the open-coded retry and told me to shrink the call
> sites to:
> 
> 	error = xfs_trans_reserve_quota(...);
> 	if (error)
> 		goto out_trans_cancel;
> 	if (quota_retry)
> 		goto retry;
> 
> So here we are, slowly putting things almost all the way back to where
> they were originally.  Now I have a little utility function:
> 
> /*
>  * Cancel a transaction and try to clear some space so that we can
>  * reserve some quota.  The caller must hold the ILOCK; when this
>  * function returns, the transaction will be cancelled and the ILOCK
>  * will have been released.
>  */
> int
> xfs_trans_cancel_qretry(
> 	struct xfs_trans	*tp,
> 	struct xfs_inode	*ip)
> {
> 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> 
> 	xfs_trans_cancel(tp);
> 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> 
> 	return xfs_blockgc_free_quota(ip, 0);
> }
> 
> Which I guess reduces the amount of call site boilerplate from 4 lines
> to two, only now I've spent half of last week on this.
> 
> > Assuming it does because the underlying work
> > may involve more transactions or whatnot, I'm wondering if this logic
> > could be buried further down in the transaction allocation path.
> > 
> > For example, if we passed the quota reservation and inode down into a
> > new variant of xfs_trans_alloc(), it could acquire the ilock and attempt
> > the quota reservation as a final step (to avoid adding an extra
> > unconditional ilock cycle). If quota res fails, iunlock and release the
> > log res internally and perform the scan. From there, perhaps we could
> > retry the quota reservation immediately without logres or the ilock by
> > saving references to the dquots, and then only reacquire logres/ilock on
> > success..? Just thinking out loud so that might require further
> > thought...
> 
> Yes, that's certainly possible, and probably a good design goal to have
> a xfs_trans_alloc_quota(tres, ip, whichfork, nblks, &tp) that one could
> call to reserve a transaction, lock the inode, and reserve the
> appropriate amounts of quota to handle mapping nblks into an inode fork.
> 
> However, there are complications that don't make this a trivial switch:
> 
> 1. Reflink and (new) swapext don't actually know how many blocks they
> need to reserve until after they've grabbed the two ILOCKs, which means
> that the wrapper is of no use here.
> 

IMO, it's preferable to define a clean/usable interface if we can find
one that covers the majority of use cases, open-coding the handful of
outliers, than to define a cumbersome interface that must be used
everywhere to accommodate the outliers. Perhaps we'll find cleaner ways
to deal with open coded outliers over time..? Perhaps (at least in the
reflink case) we could attempt a worst case quota reservation with the
helper, knowing that it will have invoked the scan on -EDQUOT, and then
fall back to a more accurate open-coded xfs_trans_reserve_() call (that
will no longer fall into retry loops on failure)..?

> 2. For the remaining quota reservation callsites, you have to deal with
> the bmap code that computes qblocks for reservation against the realtime
> device.  This is opening a huge can of worms because:
> 
> 3. Realtime and quota are not supported in combination, which means that
> none of that code ever gets properly QA'd.  It would be totally stupid
> to rework most of the quota reservation callsites and still leave that
> logic bomb.
> This gigantic piece of technical debt needs to be paid off, either by
> fixing the functionality and getting it under test, or by dropping rt
> quota support completely and officially.
> 

I'm not following what you're referring to here. Can you point to
examples in the code for reference, please?

Brian

> My guess is that fixing rt quota is probably going to take 10-15
> patches, and doing more small cleanups to convert the callsites will be
> another 10 or so.
> 
> 4. We're already past -rc5, and what started as two cleanup patchsets of
> 13 is now four patchsets of 27 patches, and I /really/ would just like
> to get these patches merged without expanding the scope of work even
> further.
> 
> --D
> 
> > Brian
> > 
> 


* Re: [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-26 13:14       ` Brian Foster
@ 2021-01-26 18:34         ` Darrick J. Wong
  2021-01-26 20:03           ` Brian Foster
  0 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26 18:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Tue, Jan 26, 2021 at 08:14:51AM -0500, Brian Foster wrote:
> On Mon, Jan 25, 2021 at 11:54:46AM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 25, 2021 at 01:14:06PM -0500, Brian Foster wrote:
> > > On Sat, Jan 23, 2021 at 10:52:10AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > > > We'd much rather the background scanner keep moving.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
> > > >  1 file changed, 18 insertions(+), 3 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > index c71eb15e3835..89f9e692fde7 100644
> > > > --- a/fs/xfs/xfs_icache.c
> > > > +++ b/fs/xfs/xfs_icache.c
> > > > @@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
> > > >  	void			*args)
> > > >  {
> > > >  	struct xfs_eofblocks	*eofb = args;
> > > > +	bool			wait;
> > > >  	int			ret = 0;
> > > >  
> > > > +	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
> > > > +
> > > >  	if (!xfs_prep_free_cowblocks(ip))
> > > >  		return 0;
> > > >  
> > > >  	if (!xfs_inode_matches_eofb(ip, eofb))
> > > >  		return 0;
> > > >  
> > > > -	/* Free the CoW blocks */
> > > > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > > +	/*
> > > > +	 * If the caller is waiting, return -EAGAIN to keep the background
> > > > +	 * scanner moving and revisit the inode in a subsequent pass.
> > > > +	 */
> > > > +	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > > > +		if (wait)
> > > > +			return -EAGAIN;
> > > > +		return 0;
> > > > +	}
> > > > +	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
> > > > +		if (wait)
> > > > +			ret = -EAGAIN;
> > > > +		goto out_iolock;
> > > > +	}
> > > 
> > > Hmm.. I'd be a little concerned over this allowing a scan to repeat
> > > indefinitely with a competing workload because a restart doesn't carry
> > > over any state from the previous scan. I suppose the
> > > xfs_prep_free_cowblocks() checks make that slightly less likely on a
> > > given file, but I more wonder about a scenario with a large set of
> > > inodes in a particular AG with a sufficient amount of concurrent
> > > activity. All it takes is one trylock failure per scan to have to start
> > > the whole thing over again... hm?
> > 
> > I'm not quite sure what to do here -- xfs_inode_free_eofblocks already
> > has the ability to return EAGAIN, which (I think) means that it's
> > already possible for the low-quota scan to stall indefinitely if the
> > scan can't lock the inode.
> > 
> 
> Indeed, that is true.
> 
> > I think we already had a stall limiting factor here in that all the
> > other threads in the system that hit EDQUOT will drop their IOLOCKs to
> > scan the fs, which means that while they loop around the scanner they
> > can only be releasing quota and driving us towards having fewer inodes
> > with the same dquots and either blockgc tag set.
> > 
> 
> Yeah, that makes sense for the current use case. There's a broader
> sequence involved there that provides some throttling and serialization,
> along with the fact that the workload is imminently driving into
> -ENOSPC.
> 
> I think what had me a little concerned upon seeing this is whether the
> scanning mechanism is currently suitable for the broader usage
> introduced in this series. We've had related issues in the past with
> concurrent sync eofblocks scans and iolock (see [1], for example).
> Having made it through the rest of the series however, it looks like all
> of the new scan invocations are async, so perhaps this is not really an
> immediate problem.
> 
> I think it would be nice if we could somehow assert that the task that
> invokes a sync scan doesn't hold an iolock, but I'm not sure there's a
> clean way to do that. We'd probably have to define the interface to
> require an inode just for that purpose. It may not be worth that
> weirdness, and I suppose if code is tested it should be pretty obvious
> that such a scan will never complete..

Well... in theory it would be possible to deal with stalls (A->A
livelock or otherwise) if we had that IWALK_NORETRY flag I was talking
about that would cause xfs_iwalk to exit with EAGAIN instead of
restarting the scan at inode 0.  The caller could detect that a
synchronous scan didn't complete, and then decide if it wants to call
back to try again.

But, that might be a lot of extra code to deal with a requirement that
xfs_blockgc_free_* callers cannot hold an iolock or an mmaplock.  Maybe
that's the simpler course of action?
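As a rough sketch of the idea in userspace C (all names invented here, not
the kernel interface): a walk that exits with -EAGAIN and reports its resume
point lets a synchronous-scan caller bound its retries instead of livelocking
on a single contended inode:

```c
/* Userspace model (not kernel code) of the hypothetical IWALK_NORETRY
 * idea: instead of restarting at inode 0, the walk exits with -EAGAIN
 * and records where it stopped, so the caller can resume or give up. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NINODES 5

/* Pretend inode table: true means a trylock would fail on this pass. */
static bool contended[NINODES] = { false, false, true, false, false };
static int freed;

/* One pass starting at *start: stop at the first contended inode,
 * record the resume point, and report -EAGAIN. */
static int walk_noretry(int *start)
{
	for (int i = *start; i < NINODES; i++) {
		if (contended[i]) {
			contended[i] = false;	/* lock drops before retry */
			*start = i;
			return -EAGAIN;
		}
		freed++;			/* "free" this inode's blocks */
	}
	*start = NINODES;
	return 0;
}

/* A sync-scan caller can now bound its retries instead of spinning. */
static int sync_scan(void)
{
	int start = 0, tries = 0, ret;

	while ((ret = walk_noretry(&start)) == -EAGAIN) {
		if (++tries > 3)
			break;		/* report the incomplete scan */
	}
	return ret;
}
```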

--D

> Brian
> 
> [1] c3155097ad89 ("xfs: sync eofblocks scans under iolock are livelock prone")
> 
> > --D
> > 
> > > Brian
> > > 
> > > >  
> > > >  	/*
> > > >  	 * Check again, nobody else should be able to dirty blocks or change
> > > > @@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
> > > >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > >  
> > > >  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > > +out_iolock:
> > > >  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > >  
> > > >  	return ret;
> > > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-26 18:34         ` Darrick J. Wong
@ 2021-01-26 20:03           ` Brian Foster
  2021-01-27  3:09             ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-26 20:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Tue, Jan 26, 2021 at 10:34:52AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 26, 2021 at 08:14:51AM -0500, Brian Foster wrote:
> > On Mon, Jan 25, 2021 at 11:54:46AM -0800, Darrick J. Wong wrote:
> > > On Mon, Jan 25, 2021 at 01:14:06PM -0500, Brian Foster wrote:
> > > > On Sat, Jan 23, 2021 at 10:52:10AM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > > > > We'd much rather the background scanner keep moving.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > ---
> > > > >  fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
> > > > >  1 file changed, 18 insertions(+), 3 deletions(-)
> > > > > 
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > > index c71eb15e3835..89f9e692fde7 100644
> > > > > --- a/fs/xfs/xfs_icache.c
> > > > > +++ b/fs/xfs/xfs_icache.c
> > > > > @@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
> > > > >  	void			*args)
> > > > >  {
> > > > >  	struct xfs_eofblocks	*eofb = args;
> > > > > +	bool			wait;
> > > > >  	int			ret = 0;
> > > > >  
> > > > > +	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
> > > > > +
> > > > >  	if (!xfs_prep_free_cowblocks(ip))
> > > > >  		return 0;
> > > > >  
> > > > >  	if (!xfs_inode_matches_eofb(ip, eofb))
> > > > >  		return 0;
> > > > >  
> > > > > -	/* Free the CoW blocks */
> > > > > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > > > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > > > +	/*
> > > > > +	 * If the caller is waiting, return -EAGAIN to keep the background
> > > > > +	 * scanner moving and revisit the inode in a subsequent pass.
> > > > > +	 */
> > > > > +	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > > > > +		if (wait)
> > > > > +			return -EAGAIN;
> > > > > +		return 0;
> > > > > +	}
> > > > > +	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
> > > > > +		if (wait)
> > > > > +			ret = -EAGAIN;
> > > > > +		goto out_iolock;
> > > > > +	}
> > > > 
> > > > Hmm.. I'd be a little concerned over this allowing a scan to repeat
> > > > indefinitely with a competing workload because a restart doesn't carry
> > > > over any state from the previous scan. I suppose the
> > > > xfs_prep_free_cowblocks() checks make that slightly less likely on a
> > > > given file, but I more wonder about a scenario with a large set of
> > > > inodes in a particular AG with a sufficient amount of concurrent
> > > > activity. All it takes is one trylock failure per scan to have to start
> > > > the whole thing over again... hm?
> > > 
> > > I'm not quite sure what to do here -- xfs_inode_free_eofblocks already
> > > has the ability to return EAGAIN, which (I think) means that it's
> > > already possible for the low-quota scan to stall indefinitely if the
> > > scan can't lock the inode.
> > > 
> > 
> > Indeed, that is true.
> > 
> > > I think we already had a stall limiting factor here in that all the
> > > other threads in the system that hit EDQUOT will drop their IOLOCKs to
> > > scan the fs, which means that while they loop around the scanner they
> > > can only be releasing quota and driving us towards having fewer inodes
> > > with the same dquots and either blockgc tag set.
> > > 
> > 
> > Yeah, that makes sense for the current use case. There's a broader
> > sequence involved there that provides some throttling and serialization,
> > along with the fact that the workload is imminently driving into
> > -ENOSPC.
> > 
> > I think what had me a little concerned upon seeing this is whether the
> > scanning mechanism is currently suitable for the broader usage
> > introduced in this series. We've had related issues in the past with
> > concurrent sync eofblocks scans and iolock (see [1], for example).
> > Having made it through the rest of the series however, it looks like all
> > of the new scan invocations are async, so perhaps this is not really an
> > immediate problem.
> > 
> > I think it would be nice if we could somehow assert that the task that
> > invokes a sync scan doesn't hold an iolock, but I'm not sure there's a
> > clean way to do that. We'd probably have to define the interface to
> > require an inode just for that purpose. It may not be worth that
> > weirdness, and I suppose if code is tested it should be pretty obvious
> > that such a scan will never complete..
> 
> Well... in theory it would be possible to deal with stalls (A->A
> livelock or otherwise) if we had that IWALK_NORETRY flag I was talking
> about that would cause xfs_iwalk to exit with EAGAIN instead of
> restarting the scan at inode 0.  The caller could detect that a
> synchronous scan didn't complete, and then decide if it wants to call
> back to try again.
> 
> But, that might be a lot of extra code to deal with a requirement that
> xfs_blockgc_free_* callers cannot hold an iolock or an mmaplock.  Maybe
> that's the simpler course of action?
> 

Yeah, I think we should require that callers drop all such locks before
invoking a sync scan, since that may livelock against the lock held by
the current task (or cause similar weirdness against concurrent sync
scans, as the code prior to the commit below[1] had demonstrated).  The
async scans used throughout this series seem reasonable to me..

Brian

> --D
> 
> > Brian
> > 
> > [1] c3155097ad89 ("xfs: sync eofblocks scans under iolock are livelock prone")
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > >  
> > > > >  	/*
> > > > >  	 * Check again, nobody else should be able to dirty blocks or change
> > > > > @@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
> > > > >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > > >  
> > > > >  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > > > +out_iolock:
> > > > >  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > > >  
> > > > >  	return ret;
> > > > > 
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-26 13:26         ` Brian Foster
@ 2021-01-26 21:12           ` Darrick J. Wong
  2021-01-27 14:19             ` Brian Foster
  0 siblings, 1 reply; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-26 21:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, david

On Tue, Jan 26, 2021 at 08:26:00AM -0500, Brian Foster wrote:
> On Mon, Jan 25, 2021 at 10:57:35AM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 25, 2021 at 01:16:23PM -0500, Brian Foster wrote:
> > > On Sun, Jan 24, 2021 at 09:39:53AM +0000, Christoph Hellwig wrote:
> > > > > +	/* We only allow one retry for EDQUOT/ENOSPC. */
> > > > > +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> > > > > +		*retry = false;
> > > > > +		return error;
> > > > > +	}
> > > > 
> > > > > +	/* Release resources, prepare for scan. */
> > > > > +	xfs_trans_cancel(*tpp);
> > > > > +	*tpp = NULL;
> > > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > +
> > > > > +	/* Try to free some quota for this file's dquots. */
> > > > > +	*retry = true;
> > > > > +	xfs_blockgc_free_quota(ip, 0);
> > > > > +	return 0;
> > > > 
> > > > > I still have grave reservations about this calling convention.  And if
> > > > > you just remove the unlock and the call to xfs_blockgc_free_quota here
> > > > > we don't require a whole lot of boilerplate code in the callers while
> > > > making the code possible to reason about for a mere human.
> > > > 
> > > 
> > > I agree that the retry pattern is rather odd. I'm curious, is there a
> > > specific reason this scanning task has to execute outside of transaction
> > > context in the first place?
> > 
> > Dave didn't like the open-coded retry and told me to shrink the call
> > sites to:
> > 
> > 	error = xfs_trans_reserve_quota(...);
> > 	if (error)
> > 		goto out_trans_cancel;
> > 	if (quota_retry)
> > 		goto retry;
> > 
> > So here we are, slowly putting things almost all the way back to where
> > they were originally.  Now I have a little utility function:
> > 
> > /*
> >  * Cancel a transaction and try to clear some space so that we can
> >  * reserve some quota.  The caller must hold the ILOCK; when this
> >  * function returns, the transaction will be cancelled and the ILOCK
> >  * will have been released.
> >  */
> > int
> > xfs_trans_cancel_qretry(
> > 	struct xfs_trans	*tp,
> > 	struct xfs_inode	*ip)
> > {
> > 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > 
> > 	xfs_trans_cancel(tp);
> > 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > 
> > 	return xfs_blockgc_free_quota(ip, 0);
> > }
> > 
> > Which I guess reduces the amount of call site boilerplate from 4 lines
> > to two, only now I've spent half of last week on this.
> > 
> > > Assuming it does because the underlying work
> > > may involve more transactions or whatnot, I'm wondering if this logic
> > > could be buried further down in the transaction allocation path.
> > > 
> > > For example, if we passed the quota reservation and inode down into a
> > > new variant of xfs_trans_alloc(), it could acquire the ilock and attempt
> > > the quota reservation as a final step (to avoid adding an extra
> > > unconditional ilock cycle). If quota res fails, iunlock and release the
> > > log res internally and perform the scan. From there, perhaps we could
> > > retry the quota reservation immediately without logres or the ilock by
> > > saving references to the dquots, and then only reacquire logres/ilock on
> > > success..? Just thinking out loud so that might require further
> > > thought...
> > 
> > Yes, that's certainly possible, and probably a good design goal to have
> > a xfs_trans_alloc_quota(tres, ip, whichfork, nblks, &tp) that one could
> > call to reserve a transaction, lock the inode, and reserve the
> > appropriate amounts of quota to handle mapping nblks into an inode fork.
> > 
> > However, there are complications that don't make this a trivial switch:
> > 
> > 1. Reflink and (new) swapext don't actually know how many blocks they
> > need to reserve until after they've grabbed the two ILOCKs, which means
> > that the wrapper is of no use here.
> > 
> 
> IMO, it's preferable to define a clean/usable interface if we can find
> one that covers the majority of use cases and have to open code a
> handful of outliers than define a cumbersome interface that must be used
> everywhere to accommodate the outliers. Perhaps we'll find cleaner ways
> to deal with open coded outliers over time..?

Sure, we might, but let's not delay this cleanup, since these are the
last two pieces that I need to get merged before I can send out deferred
inode inactivation for review.  Deferred inode inactivation adds yet
another button that we can push to reclaim free space when something hits
EDQUOT/ENOSPC.

FWIW I did start down the path of building a better interface last week,
but quickly became mired in (1) how do we allocate rt quota with a new
interface and (2) do we care?  And then I started looking at what rt
allocations do wrt quota and decided that fixing that (or even removing
it) would be an entire patchset.

Hence I'm trying to constrain this patchset to updating the existing
callsites to do the scan+retry, and no more.
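For reference, the one-shot scan+retry convention the callsites end up with
can be modeled in userspace C as follows (illustrative names only; the real
helper also cancels the transaction and cycles the ILOCK around the scan):

```c
/* Userspace sketch (not the kernel implementation) of the one-shot
 * retry convention: on the first EDQUOT/ENOSPC, run the blockgc scan
 * and retry exactly once; a second failure is returned to the caller. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int quota_avail;		/* free quota blocks */
static int reclaimable = 8;	/* blocks a blockgc scan could free */

static int reserve_quota(int nblks)
{
	if (nblks > quota_avail)
		return -EDQUOT;
	quota_avail -= nblks;
	return 0;
}

static void blockgc_scan(void)
{
	quota_avail += reclaimable;	/* scan frees speculative prealloc */
	reclaimable = 0;
}

/* Mirrors the helper's contract: set *retry on the first failure so the
 * caller loops back exactly once; any further failure is final. */
static int reserve_quota_retry(int nblks, bool *retry)
{
	int error = reserve_quota(nblks);

	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
		*retry = false;
		return error;
	}
	/* (the real code also cancels the transaction and drops ILOCK) */
	*retry = true;
	blockgc_scan();
	return 0;
}

/* A callsite in the shape this patchset converges on. */
static int alloc_blocks(int nblks)
{
	bool retry = false;
	int error;

retry:
	error = reserve_quota_retry(nblks, &retry);
	if (error)
		return error;
	if (retry)
		goto retry;
	return 0;
}
```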

> Perhaps (at least in the
> reflink case) we could attempt a worst case quota reservation with the
> helper, knowing that it will have invoked the scan on -EDQUOT, and then
> fall back to a more accurate open-coded xfs_trans_reserve_() call (that
> will no longer fall into retry loops on failure)..?

Making a worst case reservation and backing off creates more ways for
things to fail unnecessarily.

For a remap operation, the worst case is if the source file range has an
allocated mapping and the destination file range is a hole, because we
have to increment quota by the size of that allocated mapping.  If we
run out of quota we'll have to flush the fs and try again.  If we fail
the quota reservation a second time, the whole operation fails.

This is not good behavior for all the other cases -- if both mappings
are holes or both allocated, we just failed an operation that would have
made no net change to the quota allocations.  If the source file range
is a hole and the destination range is allocated, we actually would have
/decreased/ the quota usage, but instead we fail with EDQUOT.

Right now the remap code handles those cases just fine, at a cost of
open coded logic.
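That case analysis boils down to a sign on the mapping size; a minimal
userspace sketch with invented names, just to make the four cases explicit:

```c
/* Model of the remap quota cases described above: the net quota change
 * of remapping one extent depends only on whether the source and
 * destination ranges are currently allocated.  Names are illustrative,
 * not taken from the kernel. */
#include <assert.h>
#include <stdbool.h>

static long remap_quota_delta(bool src_allocated, bool dst_allocated,
			      long nblocks)
{
	if (src_allocated && !dst_allocated)
		return nblocks;		/* worst case: charge the mapping */
	if (!src_allocated && dst_allocated)
		return -nblocks;	/* punching out dst frees quota */
	return 0;	/* hole->hole or alloc->alloc: no net change */
}
```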

> > 2. For the remaining quota reservation callsites, you have to deal with
> > the bmap code that computes qblocks for reservation against the realtime
> > device.  This is opening a huge can of worms because:
> > 
> > 3. Realtime and quota are not supported in combination, which means none of that
> > code ever gets properly QA'd.  It would be totally stupid to rework most
> > of the quota reservation callsites and still leave that logic bomb.
> > This gigantic piece of technical debt needs to be paid off, either by
> > fixing the functionality and getting it under test, or by dropping rt
> > quota support completely and officially.
> > 
> 
> I'm not following what you're referring to here. Can you point to
> examples in the code for reference, please?

If you format a filesystem with realtime and mount it with -oquota, xfs
will ignore the 'quota' mount option completely.  Some code exists to
do rt quota accounting (xfs_alloc_file_space and xfs_iomap_write_direct
are examples) but since we never allow rt+quota, the code coverage on
that is 0%.

I've also noticed that those functions seem to have this odd behavior
where for rt files, they'll reserve quota for the allocated blocks
themselves but not the bmbt blocks; but for regular files, they reserve
quota for both the allocated blocks and the bmbt blocks.  The quota code
makes various allowances for transactions that try to commit quota count
updates but have zero quota reservation attached to the transaction,
which I /think/ could have been an attempt to work around that quirk.

I also just noticed that xfs_bmapi_reserve_delalloc only works with
non-rt files.  I guess that's fine since rt files don't use the delalloc
> mechanism anyway (and I think the reason they don't is that xfs can't
currently do write-around to handle rextsize>1 filesystems) but that's
another mess to sort.

(FWIW I'm slowly working through all those rt issues as part of maturing
the rt reflink patchset, but that's at the far end of my dev tree...)

--D

> 
> Brian
> 
> > My guess is that fixing rt quota is probably going to take 10-15
> > patches, and doing more small cleanups to convert the callsites will be
> > another 10 or so.
> > 
> > 4. We're already past -rc5, and what started as two cleanup patchsets of
> > 13 is now four patchsets of 27 patches, and I /really/ would just like
> > to get these patches merged without expanding the scope of work even
> > further.
> > 
> > --D
> > 
> > > Brian
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-26 20:03           ` Brian Foster
@ 2021-01-27  3:09             ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-27  3:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, hch, david

On Tue, Jan 26, 2021 at 03:03:09PM -0500, Brian Foster wrote:
> On Tue, Jan 26, 2021 at 10:34:52AM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 26, 2021 at 08:14:51AM -0500, Brian Foster wrote:
> > > On Mon, Jan 25, 2021 at 11:54:46AM -0800, Darrick J. Wong wrote:
> > > > On Mon, Jan 25, 2021 at 01:14:06PM -0500, Brian Foster wrote:
> > > > > On Sat, Jan 23, 2021 at 10:52:10AM -0800, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > > 
> > > > > > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > > > > > We'd much rather the background scanner keep moving.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > > ---
> > > > > >  fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
> > > > > >  1 file changed, 18 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > 
> > > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > > > index c71eb15e3835..89f9e692fde7 100644
> > > > > > --- a/fs/xfs/xfs_icache.c
> > > > > > +++ b/fs/xfs/xfs_icache.c
> > > > > > @@ -1605,17 +1605,31 @@ xfs_inode_free_cowblocks(
> > > > > >  	void			*args)
> > > > > >  {
> > > > > >  	struct xfs_eofblocks	*eofb = args;
> > > > > > +	bool			wait;
> > > > > >  	int			ret = 0;
> > > > > >  
> > > > > > +	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
> > > > > > +
> > > > > >  	if (!xfs_prep_free_cowblocks(ip))
> > > > > >  		return 0;
> > > > > >  
> > > > > >  	if (!xfs_inode_matches_eofb(ip, eofb))
> > > > > >  		return 0;
> > > > > >  
> > > > > > -	/* Free the CoW blocks */
> > > > > > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > > > > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > > > > +	/*
> > > > > > +	 * If the caller is waiting, return -EAGAIN to keep the background
> > > > > > +	 * scanner moving and revisit the inode in a subsequent pass.
> > > > > > +	 */
> > > > > > +	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > > > > > +		if (wait)
> > > > > > +			return -EAGAIN;
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
> > > > > > +		if (wait)
> > > > > > +			ret = -EAGAIN;
> > > > > > +		goto out_iolock;
> > > > > > +	}
> > > > > 
> > > > > Hmm.. I'd be a little concerned over this allowing a scan to repeat
> > > > > indefinitely with a competing workload because a restart doesn't carry
> > > > > over any state from the previous scan. I suppose the
> > > > > xfs_prep_free_cowblocks() checks make that slightly less likely on a
> > > > > given file, but I more wonder about a scenario with a large set of
> > > > > inodes in a particular AG with a sufficient amount of concurrent
> > > > > activity. All it takes is one trylock failure per scan to have to start
> > > > > the whole thing over again... hm?
> > > > 
> > > > I'm not quite sure what to do here -- xfs_inode_free_eofblocks already
> > > > has the ability to return EAGAIN, which (I think) means that it's
> > > > already possible for the low-quota scan to stall indefinitely if the
> > > > scan can't lock the inode.
> > > > 
> > > 
> > > Indeed, that is true.
> > > 
> > > > I think we already had a stall limiting factor here in that all the
> > > > other threads in the system that hit EDQUOT will drop their IOLOCKs to
> > > > scan the fs, which means that while they loop around the scanner they
> > > > can only be releasing quota and driving us towards having fewer inodes
> > > > with the same dquots and either blockgc tag set.
> > > > 
> > > 
> > > Yeah, that makes sense for the current use case. There's a broader
> > > sequence involved there that provides some throttling and serialization,
> > > along with the fact that the workload is imminently driving into
> > > -ENOSPC.
> > > 
> > > I think what had me a little concerned upon seeing this is whether the
> > > scanning mechanism is currently suitable for the broader usage
> > > introduced in this series. We've had related issues in the past with
> > > concurrent sync eofblocks scans and iolock (see [1], for example).
> > > Having made it through the rest of the series however, it looks like all
> > > of the new scan invocations are async, so perhaps this is not really an
> > > immediate problem.
> > > 
> > > I think it would be nice if we could somehow assert that the task that
> > > invokes a sync scan doesn't hold an iolock, but I'm not sure there's a
> > > clean way to do that. We'd probably have to define the interface to
> > > require an inode just for that purpose. It may not be worth that
> > > weirdness, and I suppose if code is tested it should be pretty obvious
> > > that such a scan will never complete..
> > 
> > Well... in theory it would be possible to deal with stalls (A->A
> > livelock or otherwise) if we had that IWALK_NORETRY flag I was talking
> > about that would cause xfs_iwalk to exit with EAGAIN instead of
> > restarting the scan at inode 0.  The caller could detect that a
> > synchronous scan didn't complete, and then decide if it wants to call
> > back to try again.
> > 
> > But, that might be a lot of extra code to deal with a requirement that
> > xfs_blockgc_free_* callers cannot hold an iolock or an mmaplock.  Maybe
> > that's the simpler course of action?
> > 
> 
> Yeah, I think we should require that callers drop all such locks before
> invoking a sync scan, since that may livelock against the lock held by
> the current task (or cause similar weirdness against concurrent sync
> scans, as the code prior to the commit below[1] had demonstrated).  The
> async scans used throughout this series seem reasonable to me..

Ok, will update the code comment for xfs_blockgc_free_quota to say that
callers cannot hold any inode IO/MMAP/ILOCKs for sync scans.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > [1] c3155097ad89 ("xfs: sync eofblocks scans under iolock are livelock prone")
> > > 
> > > > --D
> > > > 
> > > > > Brian
> > > > > 
> > > > > >  
> > > > > >  	/*
> > > > > >  	 * Check again, nobody else should be able to dirty blocks or change
> > > > > > @@ -1625,6 +1639,7 @@ xfs_inode_free_cowblocks(
> > > > > >  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > > > >  
> > > > > >  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > > > > +out_iolock:
> > > > > >  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > > > >  
> > > > > >  	return ret;
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-26 21:12           ` Darrick J. Wong
@ 2021-01-27 14:19             ` Brian Foster
  2021-01-27 17:19               ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Brian Foster @ 2021-01-27 14:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, david

On Tue, Jan 26, 2021 at 01:12:59PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 26, 2021 at 08:26:00AM -0500, Brian Foster wrote:
> > On Mon, Jan 25, 2021 at 10:57:35AM -0800, Darrick J. Wong wrote:
> > > On Mon, Jan 25, 2021 at 01:16:23PM -0500, Brian Foster wrote:
> > > > On Sun, Jan 24, 2021 at 09:39:53AM +0000, Christoph Hellwig wrote:
> > > > > > +	/* We only allow one retry for EDQUOT/ENOSPC. */
> > > > > > +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> > > > > > +		*retry = false;
> > > > > > +		return error;
> > > > > > +	}
> > > > > 
> > > > > > +	/* Release resources, prepare for scan. */
> > > > > > +	xfs_trans_cancel(*tpp);
> > > > > > +	*tpp = NULL;
> > > > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > > +
> > > > > > +	/* Try to free some quota for this file's dquots. */
> > > > > > +	*retry = true;
> > > > > > +	xfs_blockgc_free_quota(ip, 0);
> > > > > > +	return 0;
> > > > > 
> > > > > I still have grave reservations about this calling convention.  And if
> > > > > you just remove the unlock and the call to xfs_blockgc_free_quota here
> > > > > we don't require a whole lot of boilerplate code in the callers while
> > > > > making the code possible to reason about for a mere human.
> > > > > 
> > > > 
> > > > I agree that the retry pattern is rather odd. I'm curious, is there a
> > > > specific reason this scanning task has to execute outside of transaction
> > > > context in the first place?
> > > 
> > > Dave didn't like the open-coded retry and told me to shrink the call
> > > sites to:
> > > 
> > > 	error = xfs_trans_reserve_quota(...);
> > > 	if (error)
> > > 		goto out_trans_cancel;
> > > 	if (quota_retry)
> > > 		goto retry;
> > > 
> > > So here we are, slowly putting things almost all the way back to where
> > > they were originally.  Now I have a little utility function:
> > > 
> > > /*
> > >  * Cancel a transaction and try to clear some space so that we can
> > >  * reserve some quota.  The caller must hold the ILOCK; when this
> > >  * function returns, the transaction will be cancelled and the ILOCK
> > >  * will have been released.
> > >  */
> > > int
> > > xfs_trans_cancel_qretry(
> > > 	struct xfs_trans	*tp,
> > > 	struct xfs_inode	*ip)
> > > {
> > > 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > > 
> > > 	xfs_trans_cancel(tp);
> > > 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > 
> > > 	return xfs_blockgc_free_quota(ip, 0);
> > > }
> > > 
> > > Which I guess reduces the amount of call site boilerplate from 4 lines
> > > to two, only now I've spent half of last week on this.
> > > 
> > > > Assuming it does because the underlying work
> > > > may involve more transactions or whatnot, I'm wondering if this logic
> > > > could be buried further down in the transaction allocation path.
> > > > 
> > > > For example, if we passed the quota reservation and inode down into a
> > > > new variant of xfs_trans_alloc(), it could acquire the ilock and attempt
> > > > the quota reservation as a final step (to avoid adding an extra
> > > > unconditional ilock cycle). If quota res fails, iunlock and release the
> > > > log res internally and perform the scan. From there, perhaps we could
> > > > retry the quota reservation immediately without logres or the ilock by
> > > > saving references to the dquots, and then only reacquire logres/ilock on
> > > > success..? Just thinking out loud so that might require further
> > > > thought...
> > > 
> > > Yes, that's certainly possible, and probably a good design goal to have
> > > a xfs_trans_alloc_quota(tres, ip, whichfork, nblks, &tp) that one could
> > > call to reserve a transaction, lock the inode, and reserve the
> > > appropriate amounts of quota to handle mapping nblks into an inode fork.
> > > 
> > > However, there are complications that don't make this a trivial switch:
> > > 
> > > 1. Reflink and (new) swapext don't actually know how many blocks they
> > > need to reserve until after they've grabbed the two ILOCKs, which means
> > > that the wrapper is of no use here.
> > > 
> > 
> > IMO, it's preferable to define a clean/usable interface if we can find
> > one that covers the majority of use cases and have to open code a
> > handful of outliers than define a cumbersome interface that must be used
> > everywhere to accommodate the outliers. Perhaps we'll find cleaner ways
> > to deal with open coded outliers over time..?
> 
> Sure, we might, but let's not delay this cleanup, since these are the
> last two pieces that I need to get merged before I can send out deferred
> inode inactivation for review.  Deferred inode inactivation adds yet
> another button that we can push to reclaim free space when something hits
> EDQUOT/ENOSPC.
> 

Not sure I see the need to rush in a particular interface that multiple
reviewers have expressed reservations about just because there are more
patches coming built on top. That just creates more churn and cleanup
work for later, which means more review/test cycles and more work
indirectly for people who might have to deal with backports, etc. I'm
not dead set against what this patch does if there's no better
alternative, but IMO it's better to get it right than get it fast so we
should at least give fair consideration to some alternatives if ideas
are being presented.

> FWIW I did start down the path of building a better interface last week,
> but quickly became mired in (1) how do we allocate rt quota with a new
> interface and (2) do we care?  And then I started looking at what rt
> allocations do wrt quota and decided that fixing that (or even removing
> it) would be an entire patchset.
> 
> Hence I'm trying to constrain this patchset to updating the existing
> callsites to do the scan+retry, and no more.
> 

Ok, well I think that helps me understand the situation, but I'm still
not really following if/how that conflicts with any of the previous
suggestions (which is why I was asking for example code to consider).

> > Perhaps (at least in the
> > reflink case) we could attempt a worst case quota reservation with the
> > helper, knowing that it will have invoked the scan on -EDQUOT, and then
> > fall back to a more accurate open-coded xfs_trans_reserve_() call (that
> > will no longer fall into retry loops on failure)..?
> 
> Making a worst case reservation and backing off creates more ways for
> things to fail unnecessarily.
> 
> For a remap operation, the worst case is if the source file range has an
> allocated mapping and the destination file range is a hole, because we
> have to increment quota by the size of that allocated mapping.  If we
> run out of quota we'll have to flush the fs and try again.  If we fail
> the quota reservation a second time, the whole operation fails.
> 

Right...

> This is not good behavior for all the other cases -- if both mappings
> are holes or both allocated, we just failed an operation that would have
> made no net change to the quota allocations.  If the source file range
> is a hole and the destination range is allocated, we actually would have
> /decreased/ the quota usage, but instead we fail with EDQUOT.
> 
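
The four remap cases can be tallied with a small model of the net quota delta; `remap_quota_delta` and its parameters are illustrative, not kernel API:

```c
#include <stdbool.h>

/*
 * Illustrative model, not kernel code: net quota delta (in blocks) for
 * remapping a source file range over a destination file range.  Only
 * the allocated-over-hole case actually charges quota; hole-over-
 * allocated decreases usage, and the two symmetric cases are a wash.
 */
static long
remap_quota_delta(
	bool	src_allocated,
	bool	dst_allocated,
	long	nblocks)
{
	long	delta = 0;

	if (src_allocated)
		delta += nblocks;	/* dst gains the remapped blocks */
	if (dst_allocated)
		delta -= nblocks;	/* dst's old blocks are freed */
	return delta;
}
```

So a worst-case reservation of nblocks matches only the first case; failing the other three with EDQUOT would be a false positive.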

But that wasn't the suggestion. The suggestion was to do something along
the lines of the following in the reflink case:

	error = xfs_trans_alloc_quota(..., ip, resblks, worstqres, ...);
	if (error == -EDQUOT) {
		worstqres = 0;
		error = xfs_trans_alloc(..., resblks, ...);
		...
	}

	if (!worstqres) {
		worstqres = <calculate actual quota res>
		error = xfs_trans_reserve_quota(...);
		if (error)
			return error;
	}

	...

... where the initial transaction allocation would have failed
on the worst case qres, but also would have run the internal reclaim
scan and retried before it returned. Therefore, we could still attempt
the open coded non-worst case reservation and either proceed or return
-EDQUOT with generally similar scan->retry semantics as this patch, just
without the open coded goto loops everywhere we attach quota reservation
to a transaction. This of course assumes that the
xfs_trans_alloc_quota() interface works well enough for the majority of
other cases without need for open coded reservation...
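
As a rough userspace model of that flow, under the assumption that the allocation helper retries once internally after a reclaim scan (every name here is an illustrative stand-in, not the kernel API):

```c
/* EDQUOT_ERR stands in for -EDQUOT in this model. */
#define EDQUOT_ERR	(-1)

static long	quota_free = 10;	/* blocks left under this quota */
static long	reclaimable = 2;	/* space one blockgc scan can free */

static int
reserve_quota(long nblks)
{
	if (nblks > quota_free)
		return EDQUOT_ERR;
	quota_free -= nblks;
	return 0;
}

/* Worst-case reservation that scans and retries once on failure. */
static int
trans_alloc_quota(long worstqres)
{
	int	error = reserve_quota(worstqres);

	if (error == EDQUOT_ERR) {
		quota_free += reclaimable;	/* the internal scan */
		error = reserve_quota(worstqres);
	}
	return error;
}

/* Caller-side fallback to the accurately computed reservation. */
static int
remap_reserve(long worstqres, long actualqres)
{
	int	error = trans_alloc_quota(worstqres);

	if (error == EDQUOT_ERR)
		error = reserve_quota(actualqres);
	return error;
}
```

The point of the sketch is that the scan has already run by the time the caller falls back, so the accurate reservation gets one clean shot without any open coded goto loop.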

> Right now the remap code handles those cases just fine, at a cost of
> open coded logic.
> 
> > > 2. For the remaining quota reservation callsites, you have to deal with
> > > the bmap code that computes qblocks for reservation against the realtime
> > > device.  This is opening a huge can of worms because:
> > > 
> > > 3. Realtime and quota are not supported together, which means that none of
> > > code ever gets properly QA'd.  It would be totally stupid to rework most
> > > of the quota reservation callsites and still leave that logic bomb.
> > > This gigantic piece of technical debt needs to be paid off, either by
> > > fixing the functionality and getting it under test, or by dropping rt
> > > quota support completely and officially.
> > > 
> > 
> > I'm not following what you're referring to here. Can you point to
> > examples in the code for reference, please?
> 
> If you format a filesystem with realtime and mount it with -oquota, xfs
> will ignore the 'quota' mount option completely.  Some code exists to
> do rt quota accounting (xfs_alloc_file_space and xfs_iomap_write_direct
> are examples) but since we never allow rt+quota, the code coverage on
> that is 0%.
> 

Ok, but how is that any different for what this patch does?

Brian

> I've also noticed that those functions seem to have this odd behavior
> where for rt files, they'll reserve quota for the allocated blocks
> themselves but not the bmbt blocks; but for regular files, they reserve
> quota for both the allocated blocks and the bmbt blocks.  The quota code
> makes various allowances for transactions that try to commit quota count
> updates but have zero quota reservation attached to the transaction,
> which I /think/ could have been an attempt to work around that quirk.
> 
> I also just noticed that xfs_bmapi_reserve_delalloc only works with
> non-rt files.  I guess that's fine since rt files don't use the delalloc
> mechanism anyway (and I think the reason they don't is that xfs can't
> currently do write-around to handle rextsize>1 filesystems) but that's
> another mess to sort.
> 
> (FWIW I'm slowly working through all those rt issues as part of maturing
> the rt reflink patchset, but that's at the far end of my dev tree...)
> 
> --D
> 
> > 
> > Brian
> > 
> > > My guess is that fixing rt quota is probably going to take 10-15
> > > patches, and doing more small cleanups to convert the callsites will be
> > > another 10 or so.
> > > 
> > > 4. We're already past -rc5, and what started as two cleanup patchsets of
> > > 13 is now four patchsets of 27 patches, and I /really/ would just like
> > > to get these patches merged without expanding the scope of work even
> > > further.
> > > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-26  0:29         ` Darrick J. Wong
@ 2021-01-27 16:57           ` Christoph Hellwig
  2021-01-27 21:00             ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2021-01-27 16:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Christoph Hellwig, linux-xfs, david

On Mon, Jan 25, 2021 at 04:29:01PM -0800, Darrick J. Wong wrote:
> ...except that doing so will collide with what we've been telling Yafang
> (as part of his series to detect nested transactions) as far as when is
> the appropriate time to set current->journal_info/PF_MEMALLOC_NOFS.

Can't we do that based on a log/blk reservation?  If not I'm also fine
with going back to your original goto based loop, it just looked rather
cumbersome to me.


* Re: [PATCH v4.1 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota
  2021-01-26  4:52   ` [PATCH v4.1 " Darrick J. Wong
@ 2021-01-27 16:59     ` Christoph Hellwig
  2021-01-27 17:11       ` Darrick J. Wong
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2021-01-27 16:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

I'm a little lost as to what these v4.1 patches right in the middle of a
deep thread are...


* Re: [PATCH v4.1 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota
  2021-01-27 16:59     ` Christoph Hellwig
@ 2021-01-27 17:11       ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-27 17:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, david

On Wed, Jan 27, 2021 at 04:59:59PM +0000, Christoph Hellwig wrote:
> I'm a little lost as to what these v4.1 patches right in the middle of a
> deep thread are...

I've lost my ability to track where these discussions have gone since
everything's landing out of order and I'm just now discovering replies
from other people that predate my last email to them.

It might simply be time for me to repost the whole thing, with explicit
cc for everyone who's participated in these threads in case vger slows
again.

--D


* Re: [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks
  2021-01-27 14:19             ` Brian Foster
@ 2021-01-27 17:19               ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-27 17:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, david

On Wed, Jan 27, 2021 at 09:19:10AM -0500, Brian Foster wrote:
> On Tue, Jan 26, 2021 at 01:12:59PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 26, 2021 at 08:26:00AM -0500, Brian Foster wrote:
> > > On Mon, Jan 25, 2021 at 10:57:35AM -0800, Darrick J. Wong wrote:
> > > > On Mon, Jan 25, 2021 at 01:16:23PM -0500, Brian Foster wrote:
> > > > > On Sun, Jan 24, 2021 at 09:39:53AM +0000, Christoph Hellwig wrote:
> > > > > > > +	/* We only allow one retry for EDQUOT/ENOSPC. */
> > > > > > > +	if (*retry || (error != -EDQUOT && error != -ENOSPC)) {
> > > > > > > +		*retry = false;
> > > > > > > +		return error;
> > > > > > > +	}
> > > > > > 
> > > > > > > +	/* Release resources, prepare for scan. */
> > > > > > > +	xfs_trans_cancel(*tpp);
> > > > > > > +	*tpp = NULL;
> > > > > > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > > > +
> > > > > > > +	/* Try to free some quota for this file's dquots. */
> > > > > > > +	*retry = true;
> > > > > > > +	xfs_blockgc_free_quota(ip, 0);
> > > > > > > +	return 0;
> > > > > > 
> > > > > > I still have grave reservations about this calling convention.  And if
> > > > > > you just remove the unlock and the call to xfs_blockgc_free_quota here
> > > > > > we don't require a whole lot of boilerplate code in the callers while
> > > > > > making the code possible to reason about for a mere human.
> > > > > > 
> > > > > 
> > > > > I agree that the retry pattern is rather odd. I'm curious, is there a
> > > > > specific reason this scanning task has to execute outside of transaction
> > > > > context in the first place?
> > > > 
> > > > Dave didn't like the open-coded retry and told me to shrink the call
> > > > sites to:
> > > > 
> > > > 	error = xfs_trans_reserve_quota(...);
> > > > 	if (error)
> > > > 		goto out_trans_cancel;
> > > > 	if (quota_retry)
> > > > 		goto retry;
> > > > 
> > > > So here we are, slowly putting things almost all the way back to where
> > > > they were originally.  Now I have a little utility function:
> > > > 
> > > > /*
> > > >  * Cancel a transaction and try to clear some space so that we can
> > > >  * reserve some quota.  The caller must hold the ILOCK; when this
> > > >  * function returns, the transaction will be cancelled and the ILOCK
> > > >  * will have been released.
> > > >  */
> > > > int
> > > > xfs_trans_cancel_qretry(
> > > > 	struct xfs_trans	*tp,
> > > > 	struct xfs_inode	*ip)
> > > > {
> > > > 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > > > 
> > > > 	xfs_trans_cancel(tp);
> > > > 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > 
> > > > 	return xfs_blockgc_free_quota(ip, 0);
> > > > }
> > > > 
> > > > Which I guess reduces the amount of call site boilerplate from four lines
> > > > to two, only now I've spent half of last week on this.
> > > > 
> > > > > Assuming it does because the underlying work
> > > > > may involve more transactions or whatnot, I'm wondering if this logic
> > > > > could be buried further down in the transaction allocation path.
> > > > > 
> > > > > For example, if we passed the quota reservation and inode down into a
> > > > > new variant of xfs_trans_alloc(), it could acquire the ilock and attempt
> > > > > the quota reservation as a final step (to avoid adding an extra
> > > > > unconditional ilock cycle). If quota res fails, iunlock and release the
> > > > > log res internally and perform the scan. From there, perhaps we could
> > > > > retry the quota reservation immediately without logres or the ilock by
> > > > > saving references to the dquots, and then only reacquire logres/ilock on
> > > > > success..? Just thinking out loud so that might require further
> > > > > thought...
> > > > 
> > > > Yes, that's certainly possible, and probably a good design goal to have
> > > > a xfs_trans_alloc_quota(tres, ip, whichfork, nblks, &tp) that one could
> > > > call to reserve a transaction, lock the inode, and reserve the
> > > > appropriate amounts of quota to handle mapping nblks into an inode fork.
> > > > 
> > > > However, there are complications that don't make this a trivial switch:
> > > > 
> > > > 1. Reflink and (new) swapext don't actually know how many blocks they
> > > > need to reserve until after they've grabbed the two ILOCKs, which means
> > > > that the wrapper is of no use here.
> > > > 
> > > 
> > > IMO, it's preferable to define a clean/usable interface if we can find
> > > one that covers the majority of use cases and have to open code a
> > > handful of outliers than define a cumbersome interface that must be used
> > > everywhere to accommodate the outliers. Perhaps we'll find cleaner ways
> > > to deal with open coded outliers over time..?
> > 
> > Sure, we might, but let's not delay this cleanup, since these are the
> > last two pieces that I need to get merged before I can send out deferred
> > inode inactivation for review.  Deferred inode inactivation adds yet
> > another button that we can push to reclaim free space when something hits
> > EDQUOT/ENOSPC.
> > 
> 
> Not sure I see the need to rush in a particular interface that multiple
> reviewers have expressed reservations about just because there are more
> patches coming built on top. That just creates more churn and cleanup
> work for later, which means more review/test cycles and more work
> indirectly for people who might have to deal with backports, etc. I'm
> not dead set against what this patch does if there's no better
> alternative, but IMO it's better to get it right than get it fast so we
> should at least give fair consideration to some alternatives if ideas
> are being presented.
> 
> > FWIW I did start down the path of building a better interface last week,
> > but quickly became mired in (1) how do we allocate rt quota with a new
> > interface and (2) do we care?  And then I started looking at what rt
> > allocations do wrt quota and decided that fixing that (or even removing
> > it) would be an entire patchset.
> > 
> > Hence I'm trying to constrain this patchset to updating the existing
> > callsites to do the scan+retry, and no more.
> > 
> 
> Ok, well I think that helps me understand the situation, but I'm still
> not really following if/how that conflicts with any of the previous
> suggestions (which is why I was asking for example code to consider).
> 
> > > Perhaps (at least in the
> > > reflink case) we could attempt a worst case quota reservation with the
> > > helper, knowing that it will have invoked the scan on -EDQUOT, and then
> > > fall back to a more accurate open-coded xfs_trans_reserve_() call (that
> > > will no longer fall into retry loops on failure)..?
> > 
> > Making a worst case reservation and backing off creates more ways for
> > things to fail unnecessarily.
> > 
> > For a remap operation, the worst case is if the source file range has an
> > allocated mapping and the destination file range is a hole, because we
> > have to increment quota by the size of that allocated mapping.  If we
> > run out of quota we'll have to flush the fs and try again.  If we fail
> > the quota reservation a second time, the whole operation fails.
> > 
> 
> Right...
> 
> > This is not good behavior for all the other cases -- if both mappings
> > are holes or both allocated, we just failed an operation that would have
> > made no net change to the quota allocations.  If the source file range
> > is a hole and the destination range is allocated, we actually would have
> > /decreased/ the quota usage, but instead we fail with EDQUOT.
> > 
> 
> But that wasn't the suggestion. The suggestion was to do something along
> the lines of the following in the reflink case:
> 
> 	error = xfs_trans_alloc_quota(..., ip, resblks, worstqres, ...);
> 	if (error == -EDQUOT) {
> 		worstqres = 0;
> 		error = xfs_trans_alloc(..., resblks, ...);
> 		...
> 	}

OH.  I misread that sentence with "fall back to a more accurate reserve
call", and totally thought that your suggestion was to use
xfs_trans_alloc_quota on its own, then later when we know how much quota
we really want, using xfs_trans_reserve_quota to adjust the transaction.

I am totally ok with doing it this way.

> 	if (!worstqres) {
> 		worstqres = <calculate actual quota res>
> 		error = xfs_trans_reserve_quota(...);
> 		if (error)
> 			return error;
> 	}
> 
> 	...
> 
> ... where the initial transaction allocation would have failed
> on the worst case qres, but also would have run the internal reclaim
> scan and retried before it returned. Therefore, we could still attempt
> the open coded non-worst case reservation and either proceed or return
> -EDQUOT with generally similar scan->retry semantics as this patch, just
> without the open coded goto loops everywhere we attach quota reservation
> to a transaction. This of course assumes that the
> xfs_trans_alloc_quota() interface works well enough for the majority of
> other cases without need for open coded reservation...

I think it will.

> > Right now the remap code handles those cases just fine, at a cost of
> > open coded logic.
> > 
> > > > 2. For the remaining quota reservation callsites, you have to deal with
> > > > the bmap code that computes qblocks for reservation against the realtime
> > > > device.  This is opening a huge can of worms because:
> > > > 
> > > > 3. Realtime and quota are not supported together, which means that none of
> > > > code ever gets properly QA'd.  It would be totally stupid to rework most
> > > > of the quota reservation callsites and still leave that logic bomb.
> > > > This gigantic piece of technical debt needs to be paid off, either by
> > > > fixing the functionality and getting it under test, or by dropping rt
> > > > quota support completely and officially.
> > > > 
> > > 
> > > I'm not following what you're referring to here. Can you point to
> > > examples in the code for reference, please?
> > 
> > If you format a filesystem with realtime and mount it with -oquota, xfs
> > will ignore the 'quota' mount option completely.  Some code exists to
> > do rt quota accounting (xfs_alloc_file_space and xfs_iomap_write_direct
> > are examples) but since we never allow rt+quota, the code coverage on
> > that is 0%.
> > 
> 
> Ok, but how is that any different for what this patch does?

In the end there isn't any practical difference; I had to get over my
reluctance to fiddle around with code that can't ever be run.  Whatever
the state of rt quota, at least users can't get to it.

With that... between the long delivery delays and replies arriving out
of order and with unpredictable lag time, it might just be time for me
to tidy up my patchsets and send a v5.

--D

> Brian
> 
> > I've also noticed that those functions seem to have this odd behavior
> > where for rt files, they'll reserve quota for the allocated blocks
> > themselves but not the bmbt blocks; but for regular files, they reserve
> > quota for both the allocated blocks and the bmbt blocks.  The quota code
> > makes various allowances for transactions that try to commit quota count
> > updates but have zero quota reservation attached to the transaction,
> > which I /think/ could have been an attempt to work around that quirk.
> > 
> > I also just noticed that xfs_bmapi_reserve_delalloc only works with
> > non-rt files.  I guess that's fine since rt files don't use the delalloc
> > mechanism anyway (and I think the reason they don't is that xfs can't
> > currently do write-around to handle rextsize>1 filesystems) but that's
> > another mess to sort.
> > 
> > (FWIW I'm slowly working through all those rt issues as part of maturing
> > the rt reflink patchset, but that's at the far end of my dev tree...)
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > My guess is that fixing rt quota is probably going to take 10-15
> > > > patches, and doing more small cleanups to convert the callsites will be
> > > > another 10 or so.
> > > > 
> > > > 4. We're already past -rc5, and what started as two cleanup patchsets of
> > > > 13 is now four patchsets of 27 patches, and I /really/ would just like
> > > > to get these patches merged without expanding the scope of work even
> > > > further.
> > > > 
> > > > --D
> > > > 
> > > > > Brian
> > > > > 
> > > > 
> > > 
> > 
> 


* Re: [PATCH 11/11] xfs: flush speculative space allocations when we run out of space
  2021-01-27 16:57           ` Christoph Hellwig
@ 2021-01-27 21:00             ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-27 21:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, linux-xfs, david

On Wed, Jan 27, 2021 at 04:57:34PM +0000, Christoph Hellwig wrote:
> On Mon, Jan 25, 2021 at 04:29:01PM -0800, Darrick J. Wong wrote:
> > ...except that doing so will collide with what we've been telling Yafang
> > (as part of his series to detect nested transactions) as far as when is
> > the appropriate time to set current->journal_info/PF_MEMALLOC_NOFS.
> 
> Can't we do that based on a log/blk reservation?  If not I'm also fine
> with going back to your original goto based loop, it just looked rather
> cumbersome to me.

Meh, we'll figure that out when that series gets closer to merging...

--D


* [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites
  2021-01-28  6:02 [PATCHSET v5 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
@ 2021-01-28  6:03 ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-28  6:03 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, linux-xfs, hch, david, bfoster

From: Darrick J. Wong <djwong@kernel.org>

In anticipation of more restructuring of the eof/cowblocks gc code,
refactor calling of those two functions into a single internal helper
function, then present a new standard interface to purge speculative
block preallocations and start shifting higher level code to use that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c   |    3 +--
 fs/xfs/xfs_icache.c |   39 +++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_icache.h |    1 +
 fs/xfs/xfs_trace.h  |    1 +
 4 files changed, 36 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 3be0b1d81325..dc91973c0b4f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -759,8 +759,7 @@ xfs_file_buffered_write(
 
 		xfs_iunlock(ip, iolock);
 		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
-		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-		xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+		xfs_blockgc_free_space(ip->i_mount, &eofb);
 		goto write_retry;
 	}
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index cd369dd48818..97c15fcdd6f7 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1645,6 +1645,38 @@ xfs_start_block_reaping(
 	xfs_queue_cowblocks(mp);
 }
 
+/* Scan all incore inodes for block preallocations that we can remove. */
+static inline int
+xfs_blockgc_scan(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	int			error;
+
+	error = xfs_icache_free_eofblocks(mp, eofb);
+	if (error)
+		return error;
+
+	error = xfs_icache_free_cowblocks(mp, eofb);
+	if (error)
+		return error;
+
+	return 0;
+}
+
+/*
+ * Try to free space in the filesystem by purging eofblocks and cowblocks.
+ */
+int
+xfs_blockgc_free_space(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
+
+	return xfs_blockgc_scan(mp, eofb);
+}
+
 /*
  * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
  * quota caused an allocation failure, so we make a best effort by including
@@ -1665,7 +1697,6 @@ xfs_blockgc_free_dquots(
 	struct xfs_eofblocks	eofb = {0};
 	struct xfs_mount	*mp = NULL;
 	bool			do_work = false;
-	int			error;
 
 	if (!udqp && !gdqp && !pdqp)
 		return 0;
@@ -1703,11 +1734,7 @@ xfs_blockgc_free_dquots(
 	if (!do_work)
 		return 0;
 
-	error = xfs_icache_free_eofblocks(mp, &eofb);
-	if (error)
-		return error;
-
-	return xfs_icache_free_cowblocks(mp, &eofb);
+	return xfs_blockgc_free_space(mp, &eofb);
 }
 
 /* Run cow/eofblocks scans on the quotas attached to the inode. */
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 5f520de637f6..583c132ae0fb 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -57,6 +57,7 @@ void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
 		struct xfs_dquot *pdqp, unsigned int eof_flags);
 int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags);
+int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_eofblocks *eofb);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 38649e3341cb..27929c6ca43a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3928,6 +3928,7 @@ DEFINE_EVENT(xfs_eofblocks_class, name,	\
 		 unsigned long caller_ip), \
 	TP_ARGS(mp, eofb, caller_ip))
 DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
+DEFINE_EOFBLOCKS_EVENT(xfs_blockgc_free_space);
 
 #endif /* _TRACE_XFS_H */
 



* [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites
  2021-01-18 22:11 [PATCHSET v3 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
@ 2021-01-18 22:12 ` Darrick J. Wong
  0 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2021-01-18 22:12 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In anticipation of more restructuring of the eof/cowblocks gc code,
refactor calling of those two functions into a single internal helper
function, then present a new standard interface to purge speculative
block preallocations and start shifting higher level code to use that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c   |    6 ++++--
 fs/xfs/xfs_icache.c |   45 +++++++++++++++++++++++++++++++++++----------
 fs/xfs/xfs_icache.h |    1 +
 fs/xfs/xfs_trace.h  |    1 +
 4 files changed, 41 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a318a4749b59..40b12db17a20 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -697,14 +697,16 @@ xfs_file_buffered_aio_write(
 		iolock = 0;
 	} else if (ret == -ENOSPC && !cleared_space) {
 		struct xfs_eofblocks eofb = {0};
+		int	ret2;
 
 		cleared_space = true;
 		xfs_flush_inodes(ip->i_mount);
 
 		xfs_iunlock(ip, iolock);
 		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
-		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-		xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+		ret2 = xfs_blockgc_free_space(ip->i_mount, &eofb);
+		if (ret2)
+			return ret2;
 		goto write_retry;
 	}
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9ba1ad69abb7..acf67384e52f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1645,6 +1645,38 @@ xfs_start_block_reaping(
 	xfs_queue_cowblocks(mp);
 }
 
+/* Scan all incore inodes for block preallocations that we can remove. */
+static inline int
+xfs_blockgc_scan(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	int			error;
+
+	error = xfs_icache_free_eofblocks(mp, eofb);
+	if (error)
+		return error;
+
+	error = xfs_icache_free_cowblocks(mp, eofb);
+	if (error)
+		return error;
+
+	return 0;
+}
+
+/*
+ * Try to free space in the filesystem by purging eofblocks and cowblocks.
+ */
+int
+xfs_blockgc_free_space(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	trace_xfs_blockgc_free_space(mp, eofb, _RET_IP_);
+
+	return xfs_blockgc_scan(mp, eofb);
+}
+
 /*
  * Run cow/eofblocks scans on the supplied dquots.  We don't know exactly which
  * quota caused an allocation failure, so we make a best effort by including
@@ -1661,7 +1693,6 @@ xfs_blockgc_free_dquots(
 {
 	struct xfs_eofblocks	eofb = {0};
 	struct xfs_mount	*mp = NULL;
-	int			error;
 
 	*found_work = false;
 
@@ -1698,18 +1729,12 @@ xfs_blockgc_free_dquots(
 		*found_work = true;
 	}
 
-	if (*found_work) {
-		error = xfs_icache_free_eofblocks(mp, &eofb);
-		if (error)
-			return error;
-
-		error = xfs_icache_free_cowblocks(mp, &eofb);
-		if (error)
-			return error;
-	}
+	if (*found_work)
+		return xfs_blockgc_free_space(mp, &eofb);
 
 	return 0;
 }
+
 /*
  * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
  * with multiple quotas, we don't know exactly which quota caused an allocation
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 8d8e7cabc27f..56ae668dcdcf 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -59,6 +59,7 @@ int xfs_blockgc_free_dquots(struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
 		bool *found_work);
 int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int eof_flags,
 		bool *found_work);
+int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_eofblocks *eofb);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 3f761f33099b..7ec9d4d703a6 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3926,6 +3926,7 @@ DEFINE_EVENT(xfs_eofblocks_class, name,	\
 		 unsigned long caller_ip), \
 	TP_ARGS(mp, eofb, caller_ip))
 DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
+DEFINE_EOFBLOCKS_EVENT(xfs_blockgc_free_space);
 
 #endif /* _TRACE_XFS_H */
 




Thread overview: 52+ messages
-- links below jump to the message on this page --
2021-01-23 18:51 [PATCHSET v4 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
2021-01-23 18:52 ` [PATCH 01/11] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
2021-01-25 18:13   ` Brian Foster
2021-01-25 19:33     ` Darrick J. Wong
2021-01-23 18:52 ` [PATCH 02/11] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
2021-01-25 18:14   ` Brian Foster
2021-01-25 19:54     ` Darrick J. Wong
2021-01-26 13:14       ` Brian Foster
2021-01-26 18:34         ` Darrick J. Wong
2021-01-26 20:03           ` Brian Foster
2021-01-27  3:09             ` Darrick J. Wong
2021-01-23 18:52 ` [PATCH 03/11] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
2021-01-25 18:14   ` Brian Foster
2021-01-23 18:52 ` [PATCH 04/11] xfs: move and rename xfs_inode_free_quota_blocks to avoid conflicts Darrick J. Wong
2021-01-25 18:14   ` Brian Foster
2021-01-23 18:52 ` [PATCH 05/11] xfs: pass flags and return gc errors from xfs_blockgc_free_quota Darrick J. Wong
2021-01-24  9:34   ` Christoph Hellwig
2021-01-25 18:15   ` Brian Foster
2021-01-26  4:52   ` [PATCH v4.1 " Darrick J. Wong
2021-01-27 16:59     ` Christoph Hellwig
2021-01-27 17:11       ` Darrick J. Wong
2021-01-23 18:52 ` [PATCH 06/11] xfs: flush eof/cowblocks if we can't reserve quota for file blocks Darrick J. Wong
2021-01-24  9:39   ` Christoph Hellwig
2021-01-25 18:16     ` Brian Foster
2021-01-25 18:57       ` Darrick J. Wong
2021-01-26 13:26         ` Brian Foster
2021-01-26 21:12           ` Darrick J. Wong
2021-01-27 14:19             ` Brian Foster
2021-01-27 17:19               ` Darrick J. Wong
2021-01-26  4:53   ` [PATCH v4.1 " Darrick J. Wong
2021-01-23 18:52 ` [PATCH 07/11] xfs: flush eof/cowblocks if we can't reserve quota for inode creation Darrick J. Wong
2021-01-26  4:55   ` [PATCH v4.1 " Darrick J. Wong
2021-01-23 18:52 ` [PATCH 08/11] xfs: flush eof/cowblocks if we can't reserve quota for chown Darrick J. Wong
2021-01-26  4:55   ` [PATCH v4.1 " Darrick J. Wong
2021-01-23 18:52 ` [PATCH 09/11] xfs: add a tracepoint for blockgc scans Darrick J. Wong
2021-01-25 18:45   ` Brian Foster
2021-01-26  4:56   ` [PATCH v4.1 " Darrick J. Wong
2021-01-23 18:52 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
2021-01-24  9:41   ` Christoph Hellwig
2021-01-25 18:46   ` Brian Foster
2021-01-26  2:33     ` Darrick J. Wong
2021-01-23 18:53 ` [PATCH 11/11] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
2021-01-24  9:48   ` Christoph Hellwig
2021-01-25 18:46     ` Brian Foster
2021-01-25 20:02     ` Darrick J. Wong
2021-01-25 21:06       ` Brian Foster
2021-01-26  0:29         ` Darrick J. Wong
2021-01-27 16:57           ` Christoph Hellwig
2021-01-27 21:00             ` Darrick J. Wong
2021-01-26  4:59   ` [PATCH v4.1 " Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2021-01-28  6:02 [PATCHSET v5 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
2021-01-28  6:03 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
2021-01-18 22:11 [PATCHSET v3 00/11] xfs: try harder to reclaim space when we run out Darrick J. Wong
2021-01-18 22:12 ` [PATCH 10/11] xfs: refactor xfs_icache_free_{eof,cow}blocks call sites Darrick J. Wong
