linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out
@ 2021-01-11 23:22 Darrick J. Wong
  2021-01-11 23:22 ` [PATCH 1/6] xfs: hide most of the incore inode walk interface Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:22 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

Historically, when users ran out of space or quota when trying to write
to the filesystem, XFS didn't try very hard to reclaim space that it
might have speculatively allocated for the purpose of speeding up
front-end filesystem operations (appending writes, cow staging).  The
upcoming deferred inactivation series will greatly increase the amount
of allocated space that isn't actively being used to store user data.

Therefore, try to reduce the circumstances where we return EDQUOT or
ENOSPC to userspace by teaching the write paths to try to clear space
and retry the operation one time before giving up.

v2: clean up and rebase against 5.11.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=reclaim-space-harder-5.12
---
 fs/xfs/xfs_bmap_util.c   |   29 ++++++++++
 fs/xfs/xfs_file.c        |   24 +++------
 fs/xfs/xfs_icache.c      |  129 +++++++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_icache.h      |   14 ++---
 fs/xfs/xfs_inode.c       |   29 ++++++++++
 fs/xfs/xfs_ioctl.c       |    2 +
 fs/xfs/xfs_iomap.c       |   33 +++++++++++-
 fs/xfs/xfs_qm_syscalls.c |    3 -
 fs/xfs/xfs_reflink.c     |   63 +++++++++++++++++++++-
 fs/xfs/xfs_trace.c       |    1 
 fs/xfs/xfs_trace.h       |   41 +++++++++++++++
 11 files changed, 303 insertions(+), 65 deletions(-)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] xfs: hide most of the incore inode walk interface
  2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
@ 2021-01-11 23:22 ` Darrick J. Wong
  2021-01-13 14:34   ` Christoph Hellwig
  2021-01-11 23:22 ` [PATCH 2/6] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:22 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Hide the incore inode walk interface because callers outside of the
icache code don't need to know about iter_flags and radix tags and other
implementation details of the incore inode cache.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c      |   27 +++++++++++++++++++++++----
 fs/xfs/xfs_icache.h      |    9 ++-------
 fs/xfs/xfs_qm_syscalls.c |    3 +--
 3 files changed, 26 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index deb99300d171..e6d704a30b29 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -26,6 +26,11 @@
 
 #include <linux/iversion.h>
 
+/*
+ * flags for incore inode iterator
+ */
+#define XFS_INODE_WALK_INEW_WAIT	0x1	/* wait on new inodes */
+
 /*
  * Allocate and initialise an xfs_inode.
  */
@@ -888,8 +893,8 @@ xfs_inode_walk_get_perag(
  * Call the @execute function on all incore inodes matching the radix tree
  * @tag.
  */
-int
-xfs_inode_walk(
+STATIC int
+__xfs_inode_walk(
 	struct xfs_mount	*mp,
 	int			iter_flags,
 	int			(*execute)(struct xfs_inode *ip, void *args),
@@ -915,6 +920,20 @@ xfs_inode_walk(
 	return last_error;
 }
 
+/*
+ * Walk all incore inodes in the filesystem.  Knowledge of radix tree tags
+ * is hidden and we always wait for INEW inodes.
+ */
+int
+xfs_inode_walk(
+	struct xfs_mount	*mp,
+	int			(*execute)(struct xfs_inode *ip, void *args),
+	void			*args)
+{
+	return __xfs_inode_walk(mp, XFS_INODE_WALK_INEW_WAIT, execute, args,
+			XFS_ICI_NO_TAG);
+}
+
 /*
  * Background scanning to trim post-EOF preallocated space. This is queued
  * based on the 'speculative_prealloc_lifetime' tunable (5m by default).
@@ -1392,7 +1411,7 @@ xfs_icache_free_eofblocks(
 	struct xfs_mount	*mp,
 	struct xfs_eofblocks	*eofb)
 {
-	return xfs_inode_walk(mp, 0, xfs_inode_free_eofblocks, eofb,
+	return __xfs_inode_walk(mp, 0, xfs_inode_free_eofblocks, eofb,
 			XFS_ICI_EOFBLOCKS_TAG);
 }
 
@@ -1642,7 +1661,7 @@ xfs_icache_free_cowblocks(
 	struct xfs_mount	*mp,
 	struct xfs_eofblocks	*eofb)
 {
-	return xfs_inode_walk(mp, 0, xfs_inode_free_cowblocks, eofb,
+	return __xfs_inode_walk(mp, 0, xfs_inode_free_cowblocks, eofb,
 			XFS_ICI_COWBLOCKS_TAG);
 }
 
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 3a4c8b382cd0..0d9f1b5d112c 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -34,11 +34,6 @@ struct xfs_eofblocks {
 #define XFS_IGET_DONTCACHE	0x4
 #define XFS_IGET_INCORE		0x8	/* don't read from disk or reinit */
 
-/*
- * flags for AG inode iterator
- */
-#define XFS_INODE_WALK_INEW_WAIT	0x1	/* wait on new inodes */
-
 int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 	     uint flags, uint lock_flags, xfs_inode_t **ipp);
 
@@ -68,9 +63,9 @@ int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
 void xfs_cowblocks_worker(struct work_struct *);
 void xfs_queue_cowblocks(struct xfs_mount *);
 
-int xfs_inode_walk(struct xfs_mount *mp, int iter_flags,
+int xfs_inode_walk(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, void *args),
-	void *args, int tag);
+	void *args);
 
 int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
 				  xfs_ino_t ino, bool *inuse);
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index ca1b57d291dc..dad4d3fc3df3 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -795,6 +795,5 @@ xfs_qm_dqrele_all_inodes(
 	uint			flags)
 {
 	ASSERT(mp->m_quotainfo);
-	xfs_inode_walk(mp, XFS_INODE_WALK_INEW_WAIT, xfs_dqrele_inode,
-			&flags, XFS_ICI_NO_TAG);
+	xfs_inode_walk(mp, xfs_dqrele_inode, &flags);
 }


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/6] xfs: refactor messy xfs_inode_free_quota_* functions
  2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
  2021-01-11 23:22 ` [PATCH 1/6] xfs: hide most of the incore inode walk interface Darrick J. Wong
@ 2021-01-11 23:22 ` Darrick J. Wong
  2021-01-13 14:37   ` Christoph Hellwig
  2021-01-11 23:22 ` [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:22 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The functions to run an eof/cowblocks scan to try to reduce quota usage
are kind of a mess -- the logic repeatedly initializes an eofb structure
and there are logic bugs in the code that result in the cowblocks scan
never actually happening.

Replace all three functions with a single function that fills out an
eofb if we're low on quota and runs both eof and cowblocks scans.

Fixes: 83104d449e8c4 ("xfs: garbage collect old cowextsz reservations")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c   |   15 ++++++---------
 fs/xfs/xfs_icache.c |   46 ++++++++++++++++------------------------------
 fs/xfs/xfs_icache.h |    4 ++--
 3 files changed, 24 insertions(+), 41 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..5d5cf25668b5 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -652,7 +652,7 @@ xfs_file_buffered_aio_write(
 	struct inode		*inode = mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret;
-	int			enospc = 0;
+	bool			cleared_space = false;
 	int			iolock;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
@@ -684,19 +684,16 @@ xfs_file_buffered_aio_write(
 	 * also behaves as a filter to prevent too many eofblocks scans from
 	 * running at the same time.
 	 */
-	if (ret == -EDQUOT && !enospc) {
+	if (ret == -EDQUOT && !cleared_space) {
 		xfs_iunlock(ip, iolock);
-		enospc = xfs_inode_free_quota_eofblocks(ip);
-		if (enospc)
-			goto write_retry;
-		enospc = xfs_inode_free_quota_cowblocks(ip);
-		if (enospc)
+		cleared_space = xfs_inode_free_quota_blocks(ip);
+		if (cleared_space)
 			goto write_retry;
 		iolock = 0;
-	} else if (ret == -ENOSPC && !enospc) {
+	} else if (ret == -ENOSPC && !cleared_space) {
 		struct xfs_eofblocks eofb = {0};
 
-		enospc = 1;
+		cleared_space = true;
 		xfs_flush_inodes(ip->i_mount);
 
 		xfs_iunlock(ip, iolock);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index e6d704a30b29..66af84c578b5 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1416,33 +1416,31 @@ xfs_icache_free_eofblocks(
 }
 
 /*
- * Run eofblocks scans on the quotas applicable to the inode. For inodes with
- * multiple quotas, we don't know exactly which quota caused an allocation
+ * Run cow/eofblocks scans on the quotas applicable to the inode. For inodes
+ * with multiple quotas, we don't know exactly which quota caused an allocation
  * failure. We make a best effort by including each quota under low free space
  * conditions (less than 1% free space) in the scan.
  */
-static int
-__xfs_inode_free_quota_eofblocks(
-	struct xfs_inode	*ip,
-	int			(*execute)(struct xfs_mount *mp,
-					   struct xfs_eofblocks	*eofb))
+bool
+xfs_inode_free_quota_blocks(
+	struct xfs_inode	*ip)
 {
-	int scan = 0;
-	struct xfs_eofblocks eofb = {0};
-	struct xfs_dquot *dq;
+	struct xfs_eofblocks	eofb = {0};
+	struct xfs_dquot	*dq;
+	bool			do_work = false;
 
 	/*
 	 * Run a sync scan to increase effectiveness and use the union filter to
 	 * cover all applicable quotas in a single scan.
 	 */
-	eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC;
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
 
 	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
 		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
 		if (dq && xfs_dquot_lowsp(dq)) {
 			eofb.eof_uid = VFS_I(ip)->i_uid;
 			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
-			scan = 1;
+			do_work = true;
 		}
 	}
 
@@ -1451,21 +1449,16 @@ __xfs_inode_free_quota_eofblocks(
 		if (dq && xfs_dquot_lowsp(dq)) {
 			eofb.eof_gid = VFS_I(ip)->i_gid;
 			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
-			scan = 1;
+			do_work = true;
 		}
 	}
 
-	if (scan)
-		execute(ip->i_mount, &eofb);
+	if (!do_work)
+		return false;
 
-	return scan;
-}
-
-int
-xfs_inode_free_quota_eofblocks(
-	struct xfs_inode *ip)
-{
-	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_eofblocks);
+	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+	return true;
 }
 
 static inline unsigned long
@@ -1665,13 +1658,6 @@ xfs_icache_free_cowblocks(
 			XFS_ICI_COWBLOCKS_TAG);
 }
 
-int
-xfs_inode_free_quota_cowblocks(
-	struct xfs_inode *ip)
-{
-	return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_cowblocks);
-}
-
 void
 xfs_inode_set_cowblocks_tag(
 	xfs_inode_t	*ip)
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 0d9f1b5d112c..b94a94e9a7a6 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -49,17 +49,17 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
+bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
+
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
-int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
 void xfs_eofblocks_worker(struct work_struct *);
 void xfs_queue_eofblocks(struct xfs_mount *);
 
 void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
-int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
 void xfs_cowblocks_worker(struct work_struct *);
 void xfs_queue_cowblocks(struct xfs_mount *);
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
  2021-01-11 23:22 ` [PATCH 1/6] xfs: hide most of the incore inode walk interface Darrick J. Wong
  2021-01-11 23:22 ` [PATCH 2/6] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
@ 2021-01-11 23:22 ` Darrick J. Wong
  2021-01-13 14:43   ` Christoph Hellwig
  2021-01-11 23:22 ` [PATCH 4/6] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:22 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't stall the cowblocks scan on a locked inode if we possibly can.
We'd much rather the background scanner keep moving.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 66af84c578b5..4e226827b33d 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1624,17 +1624,31 @@ xfs_inode_free_cowblocks(
 	void			*args)
 {
 	struct xfs_eofblocks	*eofb = args;
+	bool			wait;
 	int			ret = 0;
 
+	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
+
 	if (!xfs_prep_free_cowblocks(ip))
 		return 0;
 
 	if (!xfs_inode_matches_eofb(ip, eofb))
 		return 0;
 
-	/* Free the CoW blocks */
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	/*
+	 * If the caller is waiting, return -EAGAIN to keep the background
+	 * scanner moving and revisit the inode in a subsequent pass.
+	 */
+	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+		if (wait)
+			return -EAGAIN;
+		return 0;
+	}
+	if (!xfs_ilock_nowait(ip, XFS_MMAPLOCK_EXCL)) {
+		if (wait)
+			ret = -EAGAIN;
+		goto out_iolock;
+	}
 
 	/*
 	 * Check again, nobody else should be able to dirty blocks or change
@@ -1644,6 +1658,7 @@ xfs_inode_free_cowblocks(
 		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
 
 	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+out_iolock:
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
 	return ret;


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/6] xfs: xfs_inode_free_quota_blocks should scan project quota
  2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-01-11 23:22 ` [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
@ 2021-01-11 23:22 ` Darrick J. Wong
  2021-01-13 14:45   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota Darrick J. Wong
  2021-01-11 23:23 ` [PATCH 6/6] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
  5 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:22 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Buffered writers who have run out of quota reservation call
xfs_inode_free_quota_blocks to try to free any space reservations that
might reduce the quota usage.  Unfortunately, the buffered write path
treats "out of project quota" the same as "out of overall space" so this
function has never supported scanning for space that might ease an "out
of project quota" condition.

We're about to start using this function for cases where we actually
/can/ tell if we're out of project quota, so add in this functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |    9 +++++++++
 1 file changed, 9 insertions(+)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 4e226827b33d..703d26d04e0f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1453,6 +1453,15 @@ xfs_inode_free_quota_blocks(
 		}
 	}
 
+	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQTYPE_PROJ);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_prid = ip->i_d.di_projid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_PRID;
+			do_work = true;
+		}
+	}
+
 	if (!do_work)
 		return false;
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota
  2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (3 preceding siblings ...)
  2021-01-11 23:22 ` [PATCH 4/6] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-12  1:22   ` Dave Chinner
  2021-01-11 23:23 ` [PATCH 6/6] xfs: flush speculative space allocations when we run out of space Darrick J. Wong
  5 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If a fs modification (creation, file write, reflink, etc.) is unable to
reserve enough quota to handle the modification, try clearing whatever
space the filesystem might have been hanging onto in the hopes of
speeding up the filesystem.  The flushing behavior will become
particularly important when we add deferred inode inactivation because
that will increase the amount of space that isn't actively tied to user
data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_bmap_util.c |   16 ++++++++++++++++
 fs/xfs/xfs_file.c      |    2 +-
 fs/xfs/xfs_icache.c    |    9 +++++++--
 fs/xfs/xfs_icache.h    |    2 +-
 fs/xfs/xfs_inode.c     |   17 +++++++++++++++++
 fs/xfs/xfs_ioctl.c     |    2 ++
 fs/xfs/xfs_iomap.c     |   20 +++++++++++++++++++-
 fs/xfs/xfs_reflink.c   |   40 +++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trace.c     |    1 +
 fs/xfs/xfs_trace.h     |   40 ++++++++++++++++++++++++++++++++++++++++
 10 files changed, 141 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 7371a7f7c652..437fdc8a8fbd 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -761,6 +761,7 @@ xfs_alloc_file_space(
 	 */
 	while (allocatesize_fsb && !error) {
 		xfs_fileoff_t	s, e;
+		bool		cleared_space = false;
 
 		/*
 		 * Determine space reservations for data/realtime.
@@ -803,6 +804,7 @@ xfs_alloc_file_space(
 		/*
 		 * Allocate and setup the transaction.
 		 */
+retry:
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
 				resrtextents, 0, &tp);
 
@@ -819,6 +821,20 @@ xfs_alloc_file_space(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
 						      0, quota_flag);
+		/*
+		 * We weren't able to reserve enough quota to handle fallocate.
+		 * Flush any disk space that was being held in the hopes of
+		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
+		 * do a synchronous scan.
+		 */
+		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
+			xfs_trans_cancel(tp);
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			cleared_space = xfs_inode_free_quota_blocks(ip, false);
+			if (cleared_space)
+				goto retry;
+			return error;
+		}
 		if (error)
 			goto error1;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5d5cf25668b5..136fb5972acd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -686,7 +686,7 @@ xfs_file_buffered_aio_write(
 	 */
 	if (ret == -EDQUOT && !cleared_space) {
 		xfs_iunlock(ip, iolock);
-		cleared_space = xfs_inode_free_quota_blocks(ip);
+		cleared_space = xfs_inode_free_quota_blocks(ip, true);
 		if (cleared_space)
 			goto write_retry;
 		iolock = 0;
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 703d26d04e0f..d6f7c3e85805 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1423,7 +1423,8 @@ xfs_icache_free_eofblocks(
  */
 bool
 xfs_inode_free_quota_blocks(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	bool			sync)
 {
 	struct xfs_eofblocks	eofb = {0};
 	struct xfs_dquot	*dq;
@@ -1433,7 +1434,9 @@ xfs_inode_free_quota_blocks(
 	 * Run a sync scan to increase effectiveness and use the union filter to
 	 * cover all applicable quotas in a single scan.
 	 */
-	eofb.eof_flags = XFS_EOF_FLAGS_UNION | XFS_EOF_FLAGS_SYNC;
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION;
+	if (sync)
+		eofb.eof_flags |= XFS_EOF_FLAGS_SYNC;
 
 	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
 		dq = xfs_inode_dquot(ip, XFS_DQTYPE_USER);
@@ -1465,6 +1468,8 @@ xfs_inode_free_quota_blocks(
 	if (!do_work)
 		return false;
 
+	trace_xfs_inode_free_quota_blocks(ip->i_mount, &eofb, _RET_IP_);
+
 	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
 	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
 	return true;
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index b94a94e9a7a6..88bbbc9f00f8 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -49,7 +49,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
-bool xfs_inode_free_quota_blocks(struct xfs_inode *ip);
+bool xfs_inode_free_quota_blocks(struct xfs_inode *ip, bool sync);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e5dc41b10ebb..09d97cf81f6d 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -990,6 +990,7 @@ xfs_create(
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_trans_res	*tres;
+	bool			cleared_space = false;
 	uint			resblks;
 
 	trace_xfs_create(dp, name);
@@ -1022,6 +1023,7 @@ xfs_create(
 	 * the case we'll drop the one we have and get a more
 	 * appropriate transaction later.
 	 */
+retry:
 	error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
 	if (error == -ENOSPC) {
 		/* flush outstanding delalloc blocks and retry */
@@ -1039,6 +1041,21 @@ xfs_create(
 	 */
 	error = xfs_trans_reserve_quota(tp, mp, udqp, gdqp,
 						pdqp, resblks, 1, 0);
+	/*
+	 * We weren't able to reserve enough quota to handle adding the inode.
+	 * Flush any disk space that was being held in the hopes of speeding up
+	 * the filesystem.
+	 */
+	if ((error == -EDQUOT || error == -ENOSPC) && !cleared_space) {
+		xfs_trans_cancel(tp);
+		if (unlock_dp_on_error)
+			xfs_iunlock(dp, XFS_ILOCK_EXCL);
+		unlock_dp_on_error = false;
+		cleared_space = xfs_inode_free_quota_blocks(dp, true);
+		if (cleared_space)
+			goto retry;
+		goto out_release_inode;
+	}
 	if (error)
 		goto out_trans_cancel;
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3fbd98f61ea5..97e5b192d559 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -2348,6 +2348,8 @@ xfs_file_ioctl(
 		if (error)
 			return error;
 
+		trace_xfs_ioc_free_eofblocks(mp, &keofb, _RET_IP_);
+
 		sb_start_write(mp->m_super);
 		error = xfs_icache_free_eofblocks(mp, &keofb);
 		sb_end_write(mp->m_super);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7b9ff824e82d..d83ee1406cea 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -27,7 +27,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
-
+#include "xfs_icache.h"
 
 #define XFS_ALLOC_ALIGN(mp, off) \
 	(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -200,6 +200,7 @@ xfs_iomap_write_direct(
 	int			error;
 	int			bmapi_flags = XFS_BMAPI_PREALLOC;
 	uint			tflags = 0;
+	bool			cleared_space = false;
 
 	ASSERT(count_fsb > 0);
 
@@ -239,6 +240,7 @@ xfs_iomap_write_direct(
 			resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
 		}
 	}
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, resrtextents,
 			tflags, &tp);
 	if (error)
@@ -247,6 +249,22 @@ xfs_iomap_write_direct(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
 	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0, quota_flag);
+	/*
+	 * We weren't able to reserve enough quota for the direct write.
+	 * Flush any disk space that was being held in the hopes of speeding up
+	 * the filesystem.  Historically, we expected callers to have
+	 * preallocated all the space before a direct write, but this is not an
+	 * absolute requirement.  We still hold the IOLOCK so we cannot do a
+	 * sync scan.
+	 */
+	if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
+		xfs_trans_cancel(tp);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		cleared_space = xfs_inode_free_quota_blocks(ip, false);
+		if (cleared_space)
+			goto retry;
+		return error;
+	}
 	if (error)
 		goto out_trans_cancel;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 6fa05fb78189..7be3cd3ee9bf 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -351,13 +351,14 @@ xfs_reflink_allocate_cow(
 	bool			convert_now)
 {
 	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
 	xfs_fileoff_t		offset_fsb = imap->br_startoff;
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
-	struct xfs_trans	*tp;
-	int			nimaps, error = 0;
-	bool			found;
 	xfs_filblks_t		resaligned;
 	xfs_extlen_t		resblks = 0;
+	bool			found;
+	bool			cleared_space = false;
+	int			nimaps, error = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	if (!ip->i_cowfp) {
@@ -376,6 +377,7 @@ xfs_reflink_allocate_cow(
 	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
 
 	xfs_iunlock(ip, *lockmode);
+retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	*lockmode = XFS_ILOCK_EXCL;
 	xfs_ilock(ip, *lockmode);
@@ -400,6 +402,23 @@ xfs_reflink_allocate_cow(
 
 	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
 			XFS_QMOPT_RES_REGBLKS);
+	/*
+	 * We weren't able to reserve enough quota to handle copy on write.
+	 * Flush any disk space that was being held in the hopes of speeding up
+	 * the filesystem.  We potentially hold the IOLOCK so we cannot do a
+	 * synchronous scan.
+	 */
+	if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
+		xfs_trans_cancel(tp);
+		xfs_iunlock(ip, *lockmode);
+		*lockmode = 0;
+		cleared_space = xfs_inode_free_quota_blocks(ip, false);
+		if (cleared_space)
+			goto retry;
+		*lockmode = XFS_ILOCK_EXCL;
+		xfs_ilock(ip, *lockmode);
+		return error;
+	}
 	if (error)
 		goto out_trans_cancel;
 
@@ -1001,9 +1020,11 @@ xfs_reflink_remap_extent(
 	unsigned int		resblks;
 	bool			smap_real;
 	bool			dmap_written = xfs_bmap_is_written_extent(dmap);
+	bool			cleared_space = false;
 	int			nimaps;
 	int			error;
 
+retry:
 	/* Start a rolling transaction to switch the mappings */
 	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
@@ -1081,6 +1102,11 @@ xfs_reflink_remap_extent(
 	 * count.  This is suboptimal, but the VFS flushed the dest range
 	 * before we started.  That should have removed all the delalloc
 	 * reservations, but we code defensively.
+	 *
+	 * If we fail with ENOSPC or EDQUOT, we weren't able to reserve enough
+	 * quota for the remapping.  Flush any disk space that was being held
+	 * in the hopes of speeding up the filesystem.  We still hold the
+	 * IOLOCK so we cannot do a sync scan.
 	 */
 	qres = qdelta = 0;
 	if (smap_real || dmap_written)
@@ -1090,6 +1116,14 @@ xfs_reflink_remap_extent(
 	if (qres > 0) {
 		error = xfs_trans_reserve_quota_nblks(tp, ip, qres, 0,
 				XFS_QMOPT_RES_REGBLKS);
+		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
+			xfs_trans_cancel(tp);
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			cleared_space = xfs_inode_free_quota_blocks(ip, false);
+			if (cleared_space)
+				goto retry;
+			goto out;
+		}
 		if (error)
 			goto out_cancel;
 	}
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index 120398a37c2a..9b8d703dc9fd 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -29,6 +29,7 @@
 #include "xfs_filestream.h"
 #include "xfs_fsmap.h"
 #include "xfs_btree_staging.h"
+#include "xfs_icache.h"
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 5a263ae3d4f0..7b0dbbf6f4cc 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -37,6 +37,7 @@ struct xfs_trans_res;
 struct xfs_inobt_rec_incore;
 union xfs_btree_ptr;
 struct xfs_dqtrx;
+struct xfs_eofblocks;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -3888,6 +3889,45 @@ DEFINE_EVENT(xfs_timestamp_range_class, name, \
 DEFINE_TIMESTAMP_RANGE_EVENT(xfs_inode_timestamp_range);
 DEFINE_TIMESTAMP_RANGE_EVENT(xfs_quota_expiry_range);
 
+DECLARE_EVENT_CLASS(xfs_eofblocks_class,
+	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, eofb, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(__u32, flags)
+		__field(uint32_t, uid)
+		__field(uint32_t, gid)
+		__field(prid_t, prid)
+		__field(__u64, min_file_size)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->flags = eofb->eof_flags;
+		__entry->uid = from_kuid(mp->m_super->s_user_ns, eofb->eof_uid);
+		__entry->gid = from_kgid(mp->m_super->s_user_ns, eofb->eof_gid);
+		__entry->prid = eofb->eof_prid;
+		__entry->min_file_size = eofb->eof_min_file_size;
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d flags 0x%x uid %u gid %u prid %u minsize %llu caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->flags,
+		  __entry->uid,
+		  __entry->gid,
+		  __entry->prid,
+		  __entry->min_file_size,
+		  (char *)__entry->caller_ip)
+);
+#define DEFINE_EOFBLOCKS_EVENT(name)	\
+DEFINE_EVENT(xfs_eofblocks_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, struct xfs_eofblocks *eofb, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, eofb, caller_ip))
+DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
+DEFINE_EOFBLOCKS_EVENT(xfs_inode_free_quota_blocks);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/6] xfs: flush speculative space allocations when we run out of space
  2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
                   ` (4 preceding siblings ...)
  2021-01-11 23:23 ` [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  5 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If a fs modification (creation, file write, reflink, etc.) is unable to
reserve enough space to handle the modification, try clearing whatever
space the filesystem might have been hanging onto in the hopes of
speeding up the filesystem.  The flushing behavior will become
particularly important when we add deferred inode inactivation because
that will increase the amount of space that isn't actively tied to user
data.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_bmap_util.c |   13 ++++++++++++-
 fs/xfs/xfs_file.c      |   11 ++++-------
 fs/xfs/xfs_icache.c    |   37 ++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_icache.h    |    1 +
 fs/xfs/xfs_inode.c     |   12 ++++++++++--
 fs/xfs/xfs_iomap.c     |   13 +++++++++++++
 fs/xfs/xfs_reflink.c   |   23 +++++++++++++++++++++++
 fs/xfs/xfs_trace.h     |    1 +
 8 files changed, 98 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 437fdc8a8fbd..602011f06fb6 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -807,7 +807,18 @@ xfs_alloc_file_space(
 retry:
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
 				resrtextents, 0, &tp);
-
+		/*
+		 * We weren't able to reserve enough space to handle fallocate.
+		 * Flush any disk space that was being held in the hopes of
+		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
+		 * do a synchronous scan.
+		 */
+		if (error == -ENOSPC && !cleared_space) {
+			cleared_space = true;
+			error = xfs_inode_free_blocks(ip->i_mount, false);
+			if (!error)
+				goto retry;
+		}
 		/*
 		 * Check for running out of space
 		 */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 136fb5972acd..1333c65dd2b5 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -691,15 +691,12 @@ xfs_file_buffered_aio_write(
 			goto write_retry;
 		iolock = 0;
 	} else if (ret == -ENOSPC && !cleared_space) {
-		struct xfs_eofblocks eofb = {0};
-
-		cleared_space = true;
 		xfs_flush_inodes(ip->i_mount);
-
 		xfs_iunlock(ip, iolock);
-		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
-		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-		xfs_icache_free_cowblocks(ip->i_mount, &eofb);
+		cleared_space = true;
+		ret = xfs_inode_free_blocks(ip->i_mount, true);
+		if (ret)
+			return ret;
 		goto write_retry;
 	}
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index d6f7c3e85805..c3867b25e362 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -950,6 +950,21 @@ xfs_queue_eofblocks(
 	rcu_read_unlock();
 }
 
+/* Scan all incore inodes for block preallocations that we can remove. */
+static inline int
+xfs_blockgc_scan(
+	struct xfs_mount	*mp,
+	struct xfs_eofblocks	*eofb)
+{
+	int			error;
+
+	error = xfs_icache_free_eofblocks(mp, eofb);
+	if (error && error != -EAGAIN)
+		return error;
+
+	return xfs_icache_free_cowblocks(mp, eofb);
+}
+
 void
 xfs_eofblocks_worker(
 	struct work_struct *work)
@@ -1470,9 +1485,25 @@ xfs_inode_free_quota_blocks(
 
 	trace_xfs_inode_free_quota_blocks(ip->i_mount, &eofb, _RET_IP_);
 
-	xfs_icache_free_eofblocks(ip->i_mount, &eofb);
-	xfs_icache_free_cowblocks(ip->i_mount, &eofb);
-	return true;
+	return xfs_blockgc_scan(ip->i_mount, &eofb) == 0;
+}
+
+/*
+ * Try to free space in the filesystem by purging eofblocks and cowblocks.
+ */
+int
+xfs_inode_free_blocks(
+	struct xfs_mount	*mp,
+	bool			sync)
+{
+	struct xfs_eofblocks	eofb = {0};
+
+	if (sync)
+		eofb.eof_flags |= XFS_EOF_FLAGS_SYNC;
+
+	trace_xfs_inode_free_blocks(mp, &eofb, _RET_IP_);
+
+	return xfs_blockgc_scan(mp, &eofb);
 }
 
 static inline unsigned long
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 88bbbc9f00f8..2f230d273a54 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -50,6 +50,7 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 
 bool xfs_inode_free_quota_blocks(struct xfs_inode *ip, bool sync);
+int xfs_inode_free_blocks(struct xfs_mount *mp, bool sync);
 
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 09d97cf81f6d..0e6cc33a33ad 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1025,10 +1025,18 @@ xfs_create(
 	 */
 retry:
 	error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
-	if (error == -ENOSPC) {
+	/*
+	 * We weren't able to reserve enough space to add the inode.  Flush
+	 * any disk space that was being held in the hopes of speeding up the
+	 * filesystem.
+	 */
+	if (error == -ENOSPC && !cleared_space) {
+		cleared_space = true;
 		/* flush outstanding delalloc blocks and retry */
 		xfs_flush_inodes(mp);
-		error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
+		error = xfs_inode_free_blocks(mp, true);
+		if (!error)
+			goto retry;
 	}
 	if (error)
 		goto out_release_inode;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d83ee1406cea..cf651c70f22c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -243,6 +243,19 @@ xfs_iomap_write_direct(
 retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, resrtextents,
 			tflags, &tp);
+	/*
+	 * We weren't able to reserve enough space for the direct write.  Flush
+	 * any disk space that was being held in the hopes of speeding up the
+	 * filesystem.  Historically, we expected callers to have preallocated
+	 * all the space before a direct write, but this is not an absolute
+	 * requirement.  We still hold the IOLOCK so we cannot do a sync scan.
+	 */
+	if (error == -ENOSPC && !cleared_space) {
+		cleared_space = true;
+		error = xfs_inode_free_blocks(ip->i_mount, false);
+		if (!error)
+			goto retry;
+	}
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7be3cd3ee9bf..749d425603ca 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -379,6 +379,18 @@ xfs_reflink_allocate_cow(
 	xfs_iunlock(ip, *lockmode);
 retry:
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
+	/*
+	 * We weren't able to reserve enough space to handle copy on write.
+	 * Flush any disk space that was being held in the hopes of speeding up
+	 * the filesystem.  We potentially hold the IOLOCK so we cannot do a
+	 * synchronous scan.
+	 */
+	if (error == -ENOSPC && !cleared_space) {
+		cleared_space = true;
+		error = xfs_inode_free_blocks(ip->i_mount, false);
+		if (!error)
+			goto retry;
+	}
 	*lockmode = XFS_ILOCK_EXCL;
 	xfs_ilock(ip, *lockmode);
 
@@ -1028,6 +1040,17 @@ xfs_reflink_remap_extent(
 	/* Start a rolling transaction to switch the mappings */
 	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
+	/*
+	 * We weren't able to reserve enough space for the remapping.  Flush
+	 * any disk space that was being held in the hopes of speeding up the
+	 * filesystem.  We still hold the IOLOCK so we cannot do a sync scan.
+	 */
+	if (error == -ENOSPC && !cleared_space) {
+		cleared_space = true;
+		error = xfs_inode_free_blocks(mp, false);
+		if (!error)
+			goto retry;
+	}
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 7b0dbbf6f4cc..9a29a5e18711 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3927,6 +3927,7 @@ DEFINE_EVENT(xfs_eofblocks_class, name,	\
 	TP_ARGS(mp, eofb, caller_ip))
 DEFINE_EOFBLOCKS_EVENT(xfs_ioc_free_eofblocks);
 DEFINE_EOFBLOCKS_EVENT(xfs_inode_free_quota_blocks);
+DEFINE_EOFBLOCKS_EVENT(xfs_inode_free_blocks);
 
 #endif /* _TRACE_XFS_H */
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota
  2021-01-11 23:23 ` [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota Darrick J. Wong
@ 2021-01-12  1:22   ` Dave Chinner
  2021-01-12  1:31     ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2021-01-12  1:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:05PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If a fs modification (creation, file write, reflink, etc.) is unable to
> reserve enough quota to handle the modification, try clearing whatever
> space the filesystem might have been hanging onto in the hopes of
> speeding up the filesystem.  The flushing behavior will become
> particularly important when we add deferred inode inactivation because
> that will increase the amount of space that isn't actively tied to user
> data.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_bmap_util.c |   16 ++++++++++++++++
>  fs/xfs/xfs_file.c      |    2 +-
>  fs/xfs/xfs_icache.c    |    9 +++++++--
>  fs/xfs/xfs_icache.h    |    2 +-
>  fs/xfs/xfs_inode.c     |   17 +++++++++++++++++
>  fs/xfs/xfs_ioctl.c     |    2 ++
>  fs/xfs/xfs_iomap.c     |   20 +++++++++++++++++++-
>  fs/xfs/xfs_reflink.c   |   40 +++++++++++++++++++++++++++++++++++++---
>  fs/xfs/xfs_trace.c     |    1 +
>  fs/xfs/xfs_trace.h     |   40 ++++++++++++++++++++++++++++++++++++++++
>  10 files changed, 141 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 7371a7f7c652..437fdc8a8fbd 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -761,6 +761,7 @@ xfs_alloc_file_space(
>  	 */
>  	while (allocatesize_fsb && !error) {
>  		xfs_fileoff_t	s, e;
> +		bool		cleared_space = false;
>  
>  		/*
>  		 * Determine space reservations for data/realtime.
> @@ -803,6 +804,7 @@ xfs_alloc_file_space(
>  		/*
>  		 * Allocate and setup the transaction.
>  		 */
> +retry:
>  		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
>  				resrtextents, 0, &tp);
>  
> @@ -819,6 +821,20 @@ xfs_alloc_file_space(
>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
>  						      0, quota_flag);
> +		/*
> +		 * We weren't able to reserve enough quota to handle fallocate.
> +		 * Flush any disk space that was being held in the hopes of
> +		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
> +		 * do a synchronous scan.
> +		 */
> +		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
> +			xfs_trans_cancel(tp);
> +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +			cleared_space = xfs_inode_free_quota_blocks(ip, false);
> +			if (cleared_space)
> +				goto retry;
> +			return error;

Can't say I'm a fan of repeating this everywhere.  Can we move this
into xfs_trans_reserve_quota_nblks() with a "retry" flag such that
we do:

	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0,
					quota_flag, &retry);
	if (error) {
		/* tp already cancelled, inode unlocked */
		return error;
	}
	if (retry) {
		/* tp already cancelled, inode unlocked */
		goto retry;
	}

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota
  2021-01-12  1:22   ` Dave Chinner
@ 2021-01-12  1:31     ` Darrick J. Wong
  2021-01-12  1:40       ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-12  1:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jan 12, 2021 at 12:22:49PM +1100, Dave Chinner wrote:
> On Mon, Jan 11, 2021 at 03:23:05PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If a fs modification (creation, file write, reflink, etc.) is unable to
> > reserve enough quota to handle the modification, try clearing whatever
> > space the filesystem might have been hanging onto in the hopes of
> > speeding up the filesystem.  The flushing behavior will become
> > particularly important when we add deferred inode inactivation because
> > that will increase the amount of space that isn't actively tied to user
> > data.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_bmap_util.c |   16 ++++++++++++++++
> >  fs/xfs/xfs_file.c      |    2 +-
> >  fs/xfs/xfs_icache.c    |    9 +++++++--
> >  fs/xfs/xfs_icache.h    |    2 +-
> >  fs/xfs/xfs_inode.c     |   17 +++++++++++++++++
> >  fs/xfs/xfs_ioctl.c     |    2 ++
> >  fs/xfs/xfs_iomap.c     |   20 +++++++++++++++++++-
> >  fs/xfs/xfs_reflink.c   |   40 +++++++++++++++++++++++++++++++++++++---
> >  fs/xfs/xfs_trace.c     |    1 +
> >  fs/xfs/xfs_trace.h     |   40 ++++++++++++++++++++++++++++++++++++++++
> >  10 files changed, 141 insertions(+), 8 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 7371a7f7c652..437fdc8a8fbd 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -761,6 +761,7 @@ xfs_alloc_file_space(
> >  	 */
> >  	while (allocatesize_fsb && !error) {
> >  		xfs_fileoff_t	s, e;
> > +		bool		cleared_space = false;
> >  
> >  		/*
> >  		 * Determine space reservations for data/realtime.
> > @@ -803,6 +804,7 @@ xfs_alloc_file_space(
> >  		/*
> >  		 * Allocate and setup the transaction.
> >  		 */
> > +retry:
> >  		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
> >  				resrtextents, 0, &tp);
> >  
> > @@ -819,6 +821,20 @@ xfs_alloc_file_space(
> >  		xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
> >  						      0, quota_flag);
> > +		/*
> > +		 * We weren't able to reserve enough quota to handle fallocate.
> > +		 * Flush any disk space that was being held in the hopes of
> > +		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
> > +		 * do a synchronous scan.
> > +		 */
> > +		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
> > +			xfs_trans_cancel(tp);
> > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +			cleared_space = xfs_inode_free_quota_blocks(ip, false);
> > +			if (cleared_space)
> > +				goto retry;
> > +			return error;
> 
> Can't say I'm a fan of repeating this everywhere.  Can we move this
> into xfs_trans_reserve_quota_nblks() with a "retry" flag such that
> we do:
> 
> 	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0,
> 					quota_flag, &retry);
> 	if (error) {
> 		/* tp already cancelled, inode unlocked */
> 		return error;
> 	}
> 	if (retry) {
> 		/* tp already cancelled, inode unlocked */
> 		goto retry;
> 	}

Assuming you'd be interested in the same kind of change being applied to
the next patch (kick the inode reclaim from xfs_trans_reserve on ENOSPC)
then yes, that seems like a nice cleanup.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota
  2021-01-12  1:31     ` Darrick J. Wong
@ 2021-01-12  1:40       ` Dave Chinner
  2021-01-12  2:18         ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2021-01-12  1:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 05:31:26PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 12, 2021 at 12:22:49PM +1100, Dave Chinner wrote:
> > On Mon, Jan 11, 2021 at 03:23:05PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > If a fs modification (creation, file write, reflink, etc.) is unable to
> > > reserve enough quota to handle the modification, try clearing whatever
> > > space the filesystem might have been hanging onto in the hopes of
> > > speeding up the filesystem.  The flushing behavior will become
> > > particularly important when we add deferred inode inactivation because
> > > that will increase the amount of space that isn't actively tied to user
> > > data.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  fs/xfs/xfs_bmap_util.c |   16 ++++++++++++++++
> > >  fs/xfs/xfs_file.c      |    2 +-
> > >  fs/xfs/xfs_icache.c    |    9 +++++++--
> > >  fs/xfs/xfs_icache.h    |    2 +-
> > >  fs/xfs/xfs_inode.c     |   17 +++++++++++++++++
> > >  fs/xfs/xfs_ioctl.c     |    2 ++
> > >  fs/xfs/xfs_iomap.c     |   20 +++++++++++++++++++-
> > >  fs/xfs/xfs_reflink.c   |   40 +++++++++++++++++++++++++++++++++++++---
> > >  fs/xfs/xfs_trace.c     |    1 +
> > >  fs/xfs/xfs_trace.h     |   40 ++++++++++++++++++++++++++++++++++++++++
> > >  10 files changed, 141 insertions(+), 8 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > index 7371a7f7c652..437fdc8a8fbd 100644
> > > --- a/fs/xfs/xfs_bmap_util.c
> > > +++ b/fs/xfs/xfs_bmap_util.c
> > > @@ -761,6 +761,7 @@ xfs_alloc_file_space(
> > >  	 */
> > >  	while (allocatesize_fsb && !error) {
> > >  		xfs_fileoff_t	s, e;
> > > +		bool		cleared_space = false;
> > >  
> > >  		/*
> > >  		 * Determine space reservations for data/realtime.
> > > @@ -803,6 +804,7 @@ xfs_alloc_file_space(
> > >  		/*
> > >  		 * Allocate and setup the transaction.
> > >  		 */
> > > +retry:
> > >  		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
> > >  				resrtextents, 0, &tp);
> > >  
> > > @@ -819,6 +821,20 @@ xfs_alloc_file_space(
> > >  		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > >  		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
> > >  						      0, quota_flag);
> > > +		/*
> > > +		 * We weren't able to reserve enough quota to handle fallocate.
> > > +		 * Flush any disk space that was being held in the hopes of
> > > +		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
> > > +		 * do a synchronous scan.
> > > +		 */
> > > +		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
> > > +			xfs_trans_cancel(tp);
> > > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +			cleared_space = xfs_inode_free_quota_blocks(ip, false);
> > > +			if (cleared_space)
> > > +				goto retry;
> > > +			return error;
> > 
> > Can't say I'm a fan of repeating this everywhere.  Can we move this
> > into xfs_trans_reserve_quota_nblks() with a "retry" flag such that
> > we do:
> > 
> > 	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0,
> > 					quota_flag, &retry);
> > 	if (error) {
> > 		/* tp already cancelled, inode unlocked */
> > 		return error;
> > 	}
> > 	if (retry) {
> > 		/* tp already cancelled, inode unlocked */
> > 		goto retry;
> > 	}
> 
> Assuming you'd be interested in the same kind of change being applied to
> the next patch (kick the inode reclaim from xfs_trans_reserve on ENOSPC)
> then yes, that seems like a nice cleanup.

*nod*

It definitely seems to me to be cleaner to put all this GC stuff
into the transaction setup code that does the actual space
reservation, and then simply have the code that is setting up the
transactions handle failures appropriately.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota
  2021-01-12  1:40       ` Dave Chinner
@ 2021-01-12  2:18         ` Darrick J. Wong
  2021-01-14 22:10           ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-12  2:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jan 12, 2021 at 12:40:45PM +1100, Dave Chinner wrote:
> On Mon, Jan 11, 2021 at 05:31:26PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 12, 2021 at 12:22:49PM +1100, Dave Chinner wrote:
> > > On Mon, Jan 11, 2021 at 03:23:05PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > If a fs modification (creation, file write, reflink, etc.) is unable to
> > > > reserve enough quota to handle the modification, try clearing whatever
> > > > space the filesystem might have been hanging onto in the hopes of
> > > > speeding up the filesystem.  The flushing behavior will become
> > > > particularly important when we add deferred inode inactivation because
> > > > that will increase the amount of space that isn't actively tied to user
> > > > data.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  fs/xfs/xfs_bmap_util.c |   16 ++++++++++++++++
> > > >  fs/xfs/xfs_file.c      |    2 +-
> > > >  fs/xfs/xfs_icache.c    |    9 +++++++--
> > > >  fs/xfs/xfs_icache.h    |    2 +-
> > > >  fs/xfs/xfs_inode.c     |   17 +++++++++++++++++
> > > >  fs/xfs/xfs_ioctl.c     |    2 ++
> > > >  fs/xfs/xfs_iomap.c     |   20 +++++++++++++++++++-
> > > >  fs/xfs/xfs_reflink.c   |   40 +++++++++++++++++++++++++++++++++++++---
> > > >  fs/xfs/xfs_trace.c     |    1 +
> > > >  fs/xfs/xfs_trace.h     |   40 ++++++++++++++++++++++++++++++++++++++++
> > > >  10 files changed, 141 insertions(+), 8 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > > index 7371a7f7c652..437fdc8a8fbd 100644
> > > > --- a/fs/xfs/xfs_bmap_util.c
> > > > +++ b/fs/xfs/xfs_bmap_util.c
> > > > @@ -761,6 +761,7 @@ xfs_alloc_file_space(
> > > >  	 */
> > > >  	while (allocatesize_fsb && !error) {
> > > >  		xfs_fileoff_t	s, e;
> > > > +		bool		cleared_space = false;
> > > >  
> > > >  		/*
> > > >  		 * Determine space reservations for data/realtime.
> > > > @@ -803,6 +804,7 @@ xfs_alloc_file_space(
> > > >  		/*
> > > >  		 * Allocate and setup the transaction.
> > > >  		 */
> > > > +retry:
> > > >  		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
> > > >  				resrtextents, 0, &tp);
> > > >  
> > > > @@ -819,6 +821,20 @@ xfs_alloc_file_space(
> > > >  		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > >  		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
> > > >  						      0, quota_flag);
> > > > +		/*
> > > > +		 * We weren't able to reserve enough quota to handle fallocate.
> > > > +		 * Flush any disk space that was being held in the hopes of
> > > > +		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
> > > > +		 * do a synchronous scan.
> > > > +		 */
> > > > +		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
> > > > +			xfs_trans_cancel(tp);
> > > > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > +			cleared_space = xfs_inode_free_quota_blocks(ip, false);
> > > > +			if (cleared_space)
> > > > +				goto retry;
> > > > +			return error;
> > > 
> > > Can't say I'm a fan of repeating this everywhere.  Can we move this
> > > into xfs_trans_reserve_quota_nblks() with a "retry" flag such that
> > > we do:
> > > 
> > > 	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0,
> > > 					quota_flag, &retry);
> > > 	if (error) {
> > > 		/* tp already cancelled, inode unlocked */
> > > 		return error;
> > > 	}
> > > 	if (retry) {
> > > 		/* tp already cancelled, inode unlocked */
> > > 		goto retry;
> > > 	}
> > 
> > Assuming you'd be interested in the same kind of change being applied to
> > the next patch (kick the inode reclaim from xfs_trans_reserve on ENOSPC)
> > then yes, that seems like a nice cleanup.
> 
> *nod*
> 
> It definitely seems to me to be cleaner to put all this GC stuff
> into the transaction setup code that does the actual space
> reservation, and then simply have the code that is setting up the
> transactions handle failures appropriately.

I tried converting this and it wasn't as easy as I thought.

The downside is that now we have a function that sometimes consumes the
transaction and the ILOCK, and there's still the question of what to do
if xfs_inode_free_quota_blocks doesn't find any space to free.

By leaving all the ugliness in the call sites, we maintain the property
that the function that allocates the transaction and ilocks the inode
also gets to iunlock and free the tp, and we can also skip the retry if
the eofblocks flush doesn't clear anything.

It might be easier to rework this as a xfs_trans_alloc_quota function
wherein you feed it a tres, the inode, the number of blocks you want,
and whether or not this is an rt extent; and either it reserves and
locks everything for you, or returns failure.  The downside is that
doesn't work for reflink since it doesn't always require quota
reservation to remap an extent.

But it's pretty late in the day and my brain is scrambled eggs so I'll
defer until tomorrow.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] xfs: hide most of the incore inode walk interface
  2021-01-11 23:22 ` [PATCH 1/6] xfs: hide most of the incore inode walk interface Darrick J. Wong
@ 2021-01-13 14:34   ` Christoph Hellwig
  2021-01-18 19:50     ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:22:40PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Hide the incore inode walk interface because callers outside of the
> icache code don't need to know about iter_flags and radix tags and other
> implementation details of the incore inode cache.

This looks correct, but I'm still rather lukewarm on all the extra code
just to hide a single flags argument.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] xfs: refactor messy xfs_inode_free_quota_* functions
  2021-01-11 23:22 ` [PATCH 2/6] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
@ 2021-01-13 14:37   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:22:46PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The functions to run an eof/cowblocks scan to try to reduce quota usage
> are kind of a mess -- the logic repeatedly initializes an eofb structure
> and there are logic bugs in the code that result in the cowblocks scan
> never actually happening.
> 
> Replace all three functions with a single function that fills out an
> eofb if we're low on quota and runs both eof and cowblocks scans.
> 
> Fixes: 83104d449e8c4 ("xfs: garbage collect old cowextsz reservations")

This just cleans things up and doesn't actually fix any bug, does it?

> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5b0f93f73837..5d5cf25668b5 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -652,7 +652,7 @@ xfs_file_buffered_aio_write(
>  	struct inode		*inode = mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			ret;
> -	int			enospc = 0;
> +	bool			cleared_space = false;

The variable renaming here looks useful, but unrelated to the rest.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-11 23:22 ` [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
@ 2021-01-13 14:43   ` Christoph Hellwig
  2021-01-14 21:54     ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:22:52PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Don't stall the cowblocks scan on a locked inode if we possibly can.
> We'd much rather the background scanner keep moving.

Wouldn't it make more sense to move the logic to ignore the -EAGAIN
for not-sync calls into xfs_inode_walk_ag?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/6] xfs: xfs_inode_free_quota_blocks should scan project quota
  2021-01-11 23:22 ` [PATCH 4/6] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
@ 2021-01-13 14:45   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:22:58PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Buffered writers who have run out of quota reservation call
> xfs_inode_free_quota_blocks to try to free any space reservations that
> might reduce the quota usage.  Unfortunately, the buffered write path
> treats "out of project quota" the same as "out of overall space" so this
> function has never supported scanning for space that might ease an "out
> of project quota" condition.
> 
> We're about to start using this function for cases where we actually
> /can/ tell if we're out of project quota, so add in this functionality.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-13 14:43   ` Christoph Hellwig
@ 2021-01-14 21:54     ` Darrick J. Wong
  2021-01-18 17:34       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-14 21:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jan 13, 2021 at 03:43:57PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 11, 2021 at 03:22:52PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > We'd much rather the background scanner keep moving.
> 
> Wouldn't it make more sense to move the logic to ignore the -EAGAIN
> for not-sync calls into xfs_inode_walk_ag?

I'm not sure what you're asking here?  _free_cowblocks only returns
EAGAIN for sync calls.  Locking failure for a not-sync call results in a
return 0, which means that _walk_ag just moves on to the next inode.

--D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota
  2021-01-12  2:18         ` Darrick J. Wong
@ 2021-01-14 22:10           ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-14 22:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 06:18:37PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 12, 2021 at 12:40:45PM +1100, Dave Chinner wrote:
> > On Mon, Jan 11, 2021 at 05:31:26PM -0800, Darrick J. Wong wrote:
> > > On Tue, Jan 12, 2021 at 12:22:49PM +1100, Dave Chinner wrote:
> > > > On Mon, Jan 11, 2021 at 03:23:05PM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > If a fs modification (creation, file write, reflink, etc.) is unable to
> > > > > reserve enough quota to handle the modification, try clearing whatever
> > > > > space the filesystem might have been hanging onto in the hopes of
> > > > > speeding up the filesystem.  The flushing behavior will become
> > > > > particularly important when we add deferred inode inactivation because
> > > > > that will increase the amount of space that isn't actively tied to user
> > > > > data.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > ---
> > > > >  fs/xfs/xfs_bmap_util.c |   16 ++++++++++++++++
> > > > >  fs/xfs/xfs_file.c      |    2 +-
> > > > >  fs/xfs/xfs_icache.c    |    9 +++++++--
> > > > >  fs/xfs/xfs_icache.h    |    2 +-
> > > > >  fs/xfs/xfs_inode.c     |   17 +++++++++++++++++
> > > > >  fs/xfs/xfs_ioctl.c     |    2 ++
> > > > >  fs/xfs/xfs_iomap.c     |   20 +++++++++++++++++++-
> > > > >  fs/xfs/xfs_reflink.c   |   40 +++++++++++++++++++++++++++++++++++++---
> > > > >  fs/xfs/xfs_trace.c     |    1 +
> > > > >  fs/xfs/xfs_trace.h     |   40 ++++++++++++++++++++++++++++++++++++++++
> > > > >  10 files changed, 141 insertions(+), 8 deletions(-)
> > > > > 
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > > > index 7371a7f7c652..437fdc8a8fbd 100644
> > > > > --- a/fs/xfs/xfs_bmap_util.c
> > > > > +++ b/fs/xfs/xfs_bmap_util.c
> > > > > @@ -761,6 +761,7 @@ xfs_alloc_file_space(
> > > > >  	 */
> > > > >  	while (allocatesize_fsb && !error) {
> > > > >  		xfs_fileoff_t	s, e;
> > > > > +		bool		cleared_space = false;
> > > > >  
> > > > >  		/*
> > > > >  		 * Determine space reservations for data/realtime.
> > > > > @@ -803,6 +804,7 @@ xfs_alloc_file_space(
> > > > >  		/*
> > > > >  		 * Allocate and setup the transaction.
> > > > >  		 */
> > > > > +retry:
> > > > >  		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
> > > > >  				resrtextents, 0, &tp);
> > > > >  
> > > > > @@ -819,6 +821,20 @@ xfs_alloc_file_space(
> > > > >  		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > > >  		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
> > > > >  						      0, quota_flag);
> > > > > +		/*
> > > > > +		 * We weren't able to reserve enough quota to handle fallocate.
> > > > > +		 * Flush any disk space that was being held in the hopes of
> > > > > +		 * speeding up the filesystem.  We hold the IOLOCK so we cannot
> > > > > +		 * do a synchronous scan.
> > > > > +		 */
> > > > > +		if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) {
> > > > > +			xfs_trans_cancel(tp);
> > > > > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > > > +			cleared_space = xfs_inode_free_quota_blocks(ip, false);
> > > > > +			if (cleared_space)
> > > > > +				goto retry;
> > > > > +			return error;
> > > > 
> > > > Can't say I'm a fan of repeating this everywhere.  Can we move this
> > > > into xfs_trans_reserve_quota_nblks() with a "retry" flag such that
> > > > we do:
> > > > 
> > > > 	error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0,
> > > > 					quota_flag, &retry);
> > > > 	if (error) {
> > > > 		/* tp already cancelled, inode unlocked */
> > > > 		return error;
> > > > 	}
> > > > 	if (retry) {
> > > > 		/* tp already cancelled, inode unlocked */
> > > > 		goto retry;
> > > > 	}
> > > 
> > > Assuming you'd be interested in the same kind of change being applied to
> > > the next patch (kick the inode reclaim from xfs_trans_reserve on ENOSPC)
> > > then yes, that seems like a nice cleanup.
> > 
> > *nod*
> > 
> > It definitely seems to me to be cleaner to put all this GC stuff
> > into the transaction setup code that does the actual space
> > reservation, and then simply have the code that is setting up the
> > transactions handle failures appropriately.
> 
> I tried converting this and it wasn't as easy as I thought.
> 
> The downside is that now we have a function that sometimes consumes the
> transaction and the ILOCK, and there's still the question of what to do
> if xfs_inode_free_quota_blocks doesn't find any space to free.
> 
> By leaving all the ugliness in the call sites, we maintain the property
> that the function that allocates the transaction and ilocks the inode
> also gets to iunlock and free the tp, and we can also skip the retry if
> the eofblocks flush doesn't clear anything.
> 
> It might be easier to rework this as a xfs_trans_alloc_quota function
> wherein you feed it a tres, the inode, the number of blocks you want,
> and whether or not this is an rt extent; and either it reserves and
> locks everything for you, or returns failure.  The downside is that
> doesn't work for reflink since it doesn't always require quota
> reservation to remap an extent.
> 
> But it's pretty late in the day and my brain is scrambled eggs so I'll
> defer until tomorrow.

...until three days later.  You were right, it wasn't so difficult to
make xfs_trans_reserve_quota_nblks cancel the transaction on all errors,
and run the scan if it hit edquot/enospc.

--D

> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-14 21:54     ` Darrick J. Wong
@ 2021-01-18 17:34       ` Christoph Hellwig
  2021-01-18 19:37         ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-18 17:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Thu, Jan 14, 2021 at 01:54:53PM -0800, Darrick J. Wong wrote:
> On Wed, Jan 13, 2021 at 03:43:57PM +0100, Christoph Hellwig wrote:
> > On Mon, Jan 11, 2021 at 03:22:52PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > > We'd much rather the background scanner keep moving.
> > 
> > Wouldn't it make more sense to move the logic to ignore the -EAGAIN
> > for not-sync calls into xfs_inode_walk_ag?
> 
> I'm not sure what you're asking here?  _free_cowblocks only returns
> EAGAIN for sync calls.  Locking failure for a not-sync call results in a
> return 0, which means that _walk_ag just moves on to the next inode.

What I mean is:

 - always return -EAGAIN when taking the locks fails
 - don't exit early on -EAGAIN in xfs_inode_walk at least for sync
   calls, although thinking loud I see no good reason to exit early
   even for non-sync invocations

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-18 17:34       ` Christoph Hellwig
@ 2021-01-18 19:37         ` Darrick J. Wong
  2021-01-18 19:39           ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-18 19:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Jan 18, 2021 at 05:34:12PM +0000, Christoph Hellwig wrote:
> On Thu, Jan 14, 2021 at 01:54:53PM -0800, Darrick J. Wong wrote:
> > On Wed, Jan 13, 2021 at 03:43:57PM +0100, Christoph Hellwig wrote:
> > > On Mon, Jan 11, 2021 at 03:22:52PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Don't stall the cowblocks scan on a locked inode if we possibly can.
> > > > We'd much rather the background scanner keep moving.
> > > 
> > > Wouldn't it make more sense to move the logic to ignore the -EAGAIN
> > > for not-sync calls into xfs_inode_walk_ag?
> > 
> > I'm not sure what you're asking here?  _free_cowblocks only returns
> > EAGAIN for sync calls.  Locking failure for a not-sync call results in a
> > return 0, which means that _walk_ag just moves on to the next inode.
> 
> What I mean is:
> 
>  - always return -EAGAIN when taking the locks fails
>  - don't exit early on -EAGAIN in xfs_inode_walk at least for sync
>    calls, although thinking loud I see no good reason to exit early
>    even for non-sync invocations

Ah, I see, you're asking why don't I make xfs_inode_walk responsible for
deciding what to do about EAGAIN, instead of open-coding that in the
->execute function.  That would be a nice cleanup since the walk
function already has special casing for EFSCORRUPTED.

If I read you correctly, the relevant part of xfs_inode_walk becomes:

	error = execute(batch[i]...);
	xfs_irele(batch[i]);
	if (error == -EAGAIN) {
		if (args->flags & EOF_SYNC)
			skipped++;
		continue;
	}

and the relevant part of xfs_inode_free_eofblocks becomes:

	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
		return -EAGAIN;

I think that would work, and afaict it won't cause any serious problems
with the deferred inactivation series.

--D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-18 19:37         ` Darrick J. Wong
@ 2021-01-18 19:39           ` Christoph Hellwig
  2021-01-18 19:44             ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-18 19:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Mon, Jan 18, 2021 at 11:37:18AM -0800, Darrick J. Wong wrote:
> Ah, I see, you're asking why don't I make xfs_inode_walk responsible for
> deciding what to do about EAGAIN, instead of open-coding that in the
> ->execute function.  That would be a nice cleanup since the walk
> function already has special casing for EFSCORRUPTED.
> 
> If I read you correctly, the relevant part of xfs_inode_walk becomes:
> 
> 	error = execute(batch[i]...);
> 	xfs_irele(batch[i]);
> 	if (error == -EAGAIN) {
> 		if (args->flags & EOF_SYNC)
> 			skipped++;
> 		continue;
> 	}
> 
> and the relevant part of xfs_inode_free_eofblocks becomes:
> 
> 	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
> 		return -EAGAIN;
> 
> I think that would work, and afaict it won't cause any serious problems
> with the deferred inactivation series.

Exactly!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-18 19:39           ` Christoph Hellwig
@ 2021-01-18 19:44             ` Darrick J. Wong
  2021-01-18 19:51               ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-18 19:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Jan 18, 2021 at 07:39:58PM +0000, Christoph Hellwig wrote:
> On Mon, Jan 18, 2021 at 11:37:18AM -0800, Darrick J. Wong wrote:
> > Ah, I see, you're asking why don't I make xfs_inode_walk responsible for
> > deciding what to do about EAGAIN, instead of open-coding that in the
> > ->execute function.  That would be a nice cleanup since the walk
> > function already has special casing for EFSCORRUPTED.
> > 
> > If I read you correctly, the relevant part of xfs_inode_walk becomes:
> > 
> > 	error = execute(batch[i]...);
> > 	xfs_irele(batch[i]);
> > 	if (error == -EAGAIN) {
> > 		if (args->flags & EOF_SYNC)
> > 			skipped++;
> > 		continue;
> > 	}
> > 
> > and the relevant part of xfs_inode_free_eofblocks becomes:
> > 
> > 	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
> > 		return -EAGAIN;
> > 
> > I think that would work, and afaict it won't cause any serious problems
> > with the deferred inactivation series.
> 
> Exactly!

D'oh.  I tried making that change, but ran into the problem that *args
isn't necessarily an eofb structure, and xfs_qm_dqrele_all_inodes passes
a uint pointer.  I could define a new XFS_INODE_WALK_SYNC flag and
update the blockgc callers to set that if EOF_FLAGS_SYNC is set...

--D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] xfs: hide most of the incore inode walk interface
  2021-01-13 14:34   ` Christoph Hellwig
@ 2021-01-18 19:50     ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2021-01-18 19:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jan 13, 2021 at 03:34:18PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 11, 2021 at 03:22:40PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Hide the incore inode walk interface because callers outside of the
> > icache code don't need to know about iter_flags and radix tags and other
> > implementation details of the incore inode cache.
> 
> This looks correct, but I'm still rather lukewarm on all the extra code
> just to hide a single flags argument.

Yeah, I looked at the extra patches that are now in this series to deal
with the cleanups mentioned in other review comments and decided to drop
this.

--D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks
  2021-01-18 19:44             ` Darrick J. Wong
@ 2021-01-18 19:51               ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-18 19:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Mon, Jan 18, 2021 at 11:44:22AM -0800, Darrick J. Wong wrote:
> D'oh.  I tried making that change, but ran into the problem that *args
> isn't necessarily an eofb structure, and xfs_qm_dqrele_all_inodes passes
> a uint pointer.  I could define a new XFS_INODE_WALK_SYNC flag and
> update the blockgc callers to set that if EOF_FLAGS_SYNC is set...

That does actually sound cleaner to me, but I don't want to burden that
work on you.  Feel free to stick to the current version and we can
clean this up later.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-01-18 19:52 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-11 23:22 [PATCHSET v2 0/6] xfs: try harder to reclaim space when we run out Darrick J. Wong
2021-01-11 23:22 ` [PATCH 1/6] xfs: hide most of the incore inode walk interface Darrick J. Wong
2021-01-13 14:34   ` Christoph Hellwig
2021-01-18 19:50     ` Darrick J. Wong
2021-01-11 23:22 ` [PATCH 2/6] xfs: refactor messy xfs_inode_free_quota_* functions Darrick J. Wong
2021-01-13 14:37   ` Christoph Hellwig
2021-01-11 23:22 ` [PATCH 3/6] xfs: don't stall cowblocks scan if we can't take locks Darrick J. Wong
2021-01-13 14:43   ` Christoph Hellwig
2021-01-14 21:54     ` Darrick J. Wong
2021-01-18 17:34       ` Christoph Hellwig
2021-01-18 19:37         ` Darrick J. Wong
2021-01-18 19:39           ` Christoph Hellwig
2021-01-18 19:44             ` Darrick J. Wong
2021-01-18 19:51               ` Christoph Hellwig
2021-01-11 23:22 ` [PATCH 4/6] xfs: xfs_inode_free_quota_blocks should scan project quota Darrick J. Wong
2021-01-13 14:45   ` Christoph Hellwig
2021-01-11 23:23 ` [PATCH 5/6] xfs: flush speculative space allocations when we run out of quota Darrick J. Wong
2021-01-12  1:22   ` Dave Chinner
2021-01-12  1:31     ` Darrick J. Wong
2021-01-12  1:40       ` Dave Chinner
2021-01-12  2:18         ` Darrick J. Wong
2021-01-14 22:10           ` Darrick J. Wong
2021-01-11 23:23 ` [PATCH 6/6] xfs: flush speculative space allocations when we run out of space Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).