* [PATCH RESEND v6 0/3] XFS realtime device tweaks
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

Re-sending this series, rebased onto 4.14-rc8 and re-tested.  Patch 1 in this
series has been reviewed and is ready to be merged independently of the others.

====

1. Inode flag now correctly set when locks are held via XFS_BMAPI_RTDATA
   flag.
2. Realtime flag is honored when set by user via ioctl or inherit flag on
   directory.
3. Misc changes around formatting & bounds checks on sysfs options.

See individual patches for more details.

Please pay close attention to the change in xfs_file_iomap_begin (patch 2).
The new version of the patch bypasses the xfs_file_iomap_begin_delay function
in the "realtime" case, since the realtime code there is unreachable (see the
assert in that function).  Instead, we fall through to xfs_iomap_write_direct,
which passes XFS_BMAPI_RTDATA on to the xfs_bmapi_write function, where the
realtime flag is set on the inode.

I'm curious whether there is a better approach, and would welcome
verification that this one is sane/safe.

Patch set based off Linux 4.14-rc8 (commit
39dae59d66acd86d1de24294bd2f343fd5e7a625) located @
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git .


Richard Wareing (3):
  xfs: Show realtime device stats on statfs calls if inherit flag set
  xfs: Set realtime flag based on initial allocation size
  xfs: Add realtime fallback if data device full

 Documentation/filesystems/xfs.txt | 27 +++++++++++-
 fs/xfs/libxfs/xfs_bmap.c          | 35 +++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h          |  3 ++
 fs/xfs/xfs_bmap_util.c            |  3 ++
 fs/xfs/xfs_fsops.c                |  2 +
 fs/xfs/xfs_inode.c                |  6 +++
 fs/xfs/xfs_iomap.c                | 18 +++++++-
 fs/xfs/xfs_linux.h                |  2 +
 fs/xfs/xfs_mount.c                | 24 +++++++++++
 fs/xfs/xfs_mount.h                |  8 ++++
 fs/xfs/xfs_rtalloc.c              | 90 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h              |  2 +
 fs/xfs/xfs_super.c                |  8 ++++
 fs/xfs/xfs_sysfs.c                | 80 ++++++++++++++++++++++++++++++++++
 14 files changed, 305 insertions(+), 3 deletions(-)

-- 
2.9.5



* [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

- Reports realtime device free blocks in statfs calls if inheritance
  bit is set on the inode of directory.  This is a bit more intuitive,
  especially for use-cases which are using a much larger device for
  the realtime device.
- Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
  realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
  option.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* None

Changes since v4:
* None

Changes since v3:
* Fixed accounting bug, we are not required to subtract m_alloc_set_aside
  as this is a data device only requirement.
* Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
  now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
  for the inode.

Changes since v2:
* Style updated per Christoph Hellwig's comment
* Fixed bug: statp->f_bavail = statp->f_bfree
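
The reporting rule added by this patch can be sketched in userspace C.  This
is a simplified model under stated assumptions, not the kernel code: the
struct and function names (sb_model, statfs_model, model_statfs) are
hypothetical, and the default branch ignores the log-block and set-aside
adjustments that the real xfs_fs_statfs applies to the data device numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock counters the patch reads. */
struct sb_model {
	uint64_t dblocks;	/* data device size, in fs blocks */
	uint64_t fdblocks;	/* free data device blocks */
	uint64_t rblocks;	/* models sb_rblocks */
	uint64_t frextents;	/* models sb_frextents */
	uint32_t rextsize;	/* models sb_rextsize (blocks per rt extent) */
};

/* Hypothetical stand-in for the three kstatfs fields the patch touches. */
struct statfs_model {
	uint64_t f_blocks;
	uint64_t f_bfree;
	uint64_t f_bavail;
};

/*
 * Report realtime device numbers only when the mount has a realtime
 * device AND the queried inode carries the rt-inherit flag; otherwise
 * report (simplified) data device numbers.
 */
struct statfs_model
model_statfs(const struct sb_model *sb, bool has_rtdev, bool rtinherit)
{
	struct statfs_model st;

	assert(sb != NULL);

	st.f_blocks = sb->dblocks;
	st.f_bfree = sb->fdblocks;
	st.f_bavail = sb->fdblocks;

	if (has_rtdev && rtinherit) {
		st.f_blocks = sb->rblocks;
		/* Free rt space is counted in extents; convert to blocks. */
		st.f_bfree = sb->frextents * sb->rextsize;
		st.f_bavail = st.f_bfree;
	}
	return st;
}
```

In this model, a directory with the inherit flag on a mount with 300 free
realtime extents of 16 blocks each reports 4800 free blocks, while the same
directory without the flag keeps reporting the data device counters.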


 fs/xfs/xfs_linux.h | 2 ++
 fs/xfs/xfs_super.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index dcd1292..944b02d 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -278,8 +278,10 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
 #define XFS_IS_REALTIME_INODE(ip)			\
 	(((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME) &&	\
 	 (ip)->i_mount->m_rtdev_targp)
+#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
 #else
 #define XFS_IS_REALTIME_INODE(ip) (0)
+#define XFS_IS_REALTIME_MOUNT(mp) (0)
 #endif
 
 #endif /* __XFS_LINUX__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f663022..3c9a989 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1153,6 +1153,14 @@ xfs_fs_statfs(
 	    ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
 			      (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
 		xfs_qm_statvfs(ip, statp);
+
+	if (XFS_IS_REALTIME_MOUNT(mp) &&
+	    (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {
+		statp->f_blocks = sbp->sb_rblocks;
+		statp->f_bavail = statp->f_bfree =
+			sbp->sb_frextents * sbp->sb_rextsize;
+	}
+
 	return 0;
 }
 
-- 
2.9.5



* [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

- The rt_alloc_min sysfs option automatically selects the device (data
  device, or realtime) based on the size of the initial allocation of the
  file.
- This option can be used to route the storage of small files (and the
  inefficient workloads associated with them) to a suitable storage
  device such as an SSD, while larger allocations are sent to a traditional
  HDD.
- Supports writes via O_DIRECT, buffered (i.e. page cache), and
  pre-allocations (i.e. fallocate)
- Available only when kernel is compiled w/ CONFIG_XFS_RT option.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* xfs_inode_select_target renamed to xfs_inode_select_rt_target and returns
  boolean to indicate if realtime device target is desired.
* Introduction of XFS_BMAPI_RTDATA which provides signal to the
  xfs_bmapi_allocate function the realtime flag must be set on the inode & the
  inode logged.
* Manual setting of the realtime flag by ioctl or directory rt inherit flag
  now takes precedence over the policy.
* Documentation

Changes since v4:
* Added xfs_inode_select_target function to hold target selection
  code
* XFS_IS_REALTIME_MOUNT check now moved inside xfs_inode_select_target
  function for better gating
* Improved consistency in the sysfs set behavior
* Style fixes

Changes since v3:
* Now functions via initial allocation regardless of O_DIRECT, buffered or
  pre-allocation code paths.  Provides a consistent user-experience.
* I did experiment with putting this in the xfs_bmapi_write code path;
  however, pre-allocation accounting unfortunately prevents this cleaner
  approach.  As such, this proved to be the cleanest functional approach.
* No longer a mount option, now a sysfs tunable
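
The target selection policy described above can be modelled in userspace C.
This is a minimal sketch that mirrors the checks in
xfs_inode_select_rt_target and the callers' "already realtime" short-circuit;
the types and names (mount_model, inode_model, model_select_rt_target,
model_use_rt) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount state the policy consults. */
struct mount_model {
	bool	has_rtdev;	/* models XFS_IS_REALTIME_MOUNT(mp) */
	int64_t	rt_alloc_min;	/* threshold in bytes, 0 disables */
};

/* Hypothetical stand-in for the inode state the policy consults. */
struct inode_model {
	bool		rt_flag;	/* realtime flag already set */
	uint32_t	nextents;	/* models di_nextents */
	int64_t		size;		/* models di_size */
};

/*
 * Policy decision only: true when the initial allocation of an empty
 * file should be routed to the realtime device.
 */
bool
model_select_rt_target(const struct mount_model *mp,
		       const struct inode_model *ip, int64_t len)
{
	assert(len >= 0);

	/* No realtime device configured: nothing to select. */
	if (!mp->has_rtdev)
		return false;

	/* Target cannot change once blocks exist or data was written. */
	if (ip->nextents || ip->size)
		return false;

	/* Policy disabled. */
	if (!mp->rt_alloc_min)
		return false;

	return len >= mp->rt_alloc_min;
}

/*
 * Caller-side view: a flag set by ioctl or directory inheritance wins,
 * and the policy is consulted only when the flag is not already set.
 */
bool
model_use_rt(const struct mount_model *mp,
	     const struct inode_model *ip, int64_t len)
{
	return ip->rt_flag || model_select_rt_target(mp, ip, len);
}
```

With rt_alloc_min set to 1 MiB in this model, a 4 KiB first write to an
empty file stays on the data device, a 1 MiB first write selects the realtime
device, and any later write to a non-empty file leaves the target unchanged.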

 Documentation/filesystems/xfs.txt | 21 +++++++++++++++-
 fs/xfs/libxfs/xfs_bmap.c          | 35 +++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h          |  3 +++
 fs/xfs/xfs_bmap_util.c            |  3 +++
 fs/xfs/xfs_inode.c                |  6 +++++
 fs/xfs/xfs_iomap.c                | 18 ++++++++++++--
 fs/xfs/xfs_mount.h                |  1 +
 fs/xfs/xfs_rtalloc.c              | 50 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h              |  2 ++
 fs/xfs/xfs_sysfs.c                | 42 ++++++++++++++++++++++++++++++++
 10 files changed, 178 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 3b9b5c1..0763972 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -94,7 +94,7 @@ default behaviour.
 	When inode64 is specified, it indicates that XFS is allowed
 	to create inodes at any location in the filesystem,
 	including those which will result in inode numbers occupying
-	more than 32 bits of significance. 
+	more than 32 bits of significance.
 
 	inode32 is provided for backwards compatibility with older
 	systems and applications, since 64 bits inode numbers might
@@ -467,3 +467,22 @@ the class and error context. For example, the default values for
 "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
 to "fail immediately" behaviour. This is done because ENODEV is a fatal,
 unrecoverable error no matter how many times the metadata IO is retried.
+
+Realtime Device Sysfs Options
+=============================
+
+When using a realtime sub-volume, the following sysfs options are supported:
+
+  /sys/fs/xfs/<dev>/rt_alloc_min
+  (Units: bytes  Min: 0  Default: 0  Max: INT_MAX)
+	When set, the file will be allocated blocks from the realtime device if the
+	initial allocation request size (in bytes) is equal to or above this value.
+	For XFS use-cases where appends are unlikely or not supported, this option
+	can be used to place smaller files on the data device (typically an SSD),
+	while larger files are placed on the realtime device (typically an HDD).
+
+	Any files which have the realtime flag set by an ioctl call or realtime
+	inheritance flag on the directory will not be affected by this option.
+	Buffered, direct IO and pre-allocation are supported.
+
+	Setting the value to "0" disables this behavior.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8926379..dd02a52 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4188,6 +4188,39 @@ xfs_bmapi_reserve_delalloc(
 	return error;
 }
 
+/*
+ * This function will set the XFS_DIFLAG_REALTIME flag on the inode if
+ * the XFS_BMAPI_RTDATA flag is set on the xfs_bmalloca struct.
+ *
+ * This function is only valid for realtime mounts, and only on the initial
+ * allocation for the file.
+ *
+ */
+void
+xfs_bmapi_rt_data_flag(
+	struct xfs_mount	*mp,
+	struct xfs_bmalloca	*bma)
+{
+
+	/* Only valid if this is a realtime mount */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return;
+
+	/* Only valid if file is empty */
+	if (!(bma->datatype & XFS_ALLOC_INITIAL_USER_DATA))
+		return;
+
+	/* Nothing to do, realtime flag already set */
+	if (bma->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+		return;
+
+	/* Set realtime flag and log it if RTDATA flag is set */
+	if (bma->flags & XFS_BMAPI_RTDATA) {
+		bma->ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
+		bma->logflags |= XFS_ILOG_CORE;
+	}
+}
+
 static int
 xfs_bmapi_allocate(
 	struct xfs_bmalloca	*bma)
@@ -4238,6 +4271,8 @@ xfs_bmapi_allocate(
 
 	bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
 
+	xfs_bmapi_rt_data_flag(mp, bma);
+
 	/*
 	 * Only want to do the alignment at the eof if it is userdata and
 	 * allocation length is larger than a stripe unit.
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 502e0d8..6f67588 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -113,6 +113,9 @@ struct xfs_extent_free_item
 /* Only convert delalloc space, don't allocate entirely new extents */
 #define XFS_BMAPI_DELALLOC	0x400
 
+/* Allocate to realtime device */
+#define XFS_BMAPI_RTDATA	0x800
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6503cfa..b04363b 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1053,6 +1053,9 @@ xfs_alloc_file_space(
 		return -EINVAL;
 
 	rt = XFS_IS_REALTIME_INODE(ip);
+	if (!rt && (rt = xfs_inode_select_rt_target(ip, len)))
+		alloc_type |= XFS_BMAPI_RTDATA;
+
 	extsz = xfs_get_extsz_hint(ip);
 
 	count = len;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4ec5b7f..ed29549 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1633,6 +1633,12 @@ xfs_itruncate_extents(
 		xfs_inode_clear_cowblocks_tag(ip);
 	}
 
+	if (ip->i_d.di_nblocks == 0 && XFS_IS_REALTIME_MOUNT(mp) &&
+	    mp->m_rt_alloc_min) {
+		/* Clear realtime flag if m_rt_alloc_min policy is in place */
+		ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
+	}
+
 	/*
 	 * Always re-log the inode so that our permanent transaction can keep
 	 * on rolling it forward in the log.
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f179bdf..518a9bb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -40,6 +40,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_rtalloc.h"
 
 
 #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
@@ -175,7 +176,11 @@ xfs_iomap_write_direct(
 	int		bmapi_flags = XFS_BMAPI_PREALLOC;
 	uint		tflags = 0;
 
+
 	rt = XFS_IS_REALTIME_INODE(ip);
+	if (!rt && (rt = xfs_inode_select_rt_target(ip, count)))
+		bmapi_flags |= XFS_BMAPI_RTDATA;
+
 	extsz = xfs_get_extsz_hint(ip);
 	lockmode = XFS_ILOCK_SHARED;	/* locked by caller */
 
@@ -985,8 +990,17 @@ xfs_file_iomap_begin(
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
-		/* Reserve delalloc blocks for regular writeback. */
-		return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
+		/*
+		 * For non-odirect writes, check if this will be allocated to
+		 * realtime, if so we by-pass xfs_file_iomap_begin_delay as if
+		 * the inode was already marked realtime (see xfs_get_extsz_hint).
+		 * The actual setting of the realtime flag on the inode will be
+		 * done later on.
+		 */
+		if (!xfs_inode_select_rt_target(ip, XFS_FSB_TO_B(mp, length)))
+			/* Reserve delalloc blocks for regular writeback. */
+			return xfs_file_iomap_begin_delay(inode, offset, length,
+					iomap);
 	}
 
 	if (need_excl_ilock(ip, flags)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d0..0db9731 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -197,6 +197,7 @@ typedef struct xfs_mount {
 	uint32_t		m_generation;
 
 	bool			m_fail_unmount;
+	xfs_off_t		m_rt_alloc_min; /* Min RT allocation */
 #ifdef DEBUG
 	/*
 	 * Frequency with which errors are injected.  Replaces xfs_etest; the
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 488719d..145007b 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1284,3 +1284,53 @@ xfs_rtpick_extent(
 	*pick = b;
 	return 0;
 }
+
+/*
+ * If allocation length is less than rt_alloc_min threshold select the
+ * data device.   Otherwise, select the realtime device.
+ */
+bool
+xfs_rt_alloc_min(
+	struct xfs_mount	*mp,
+	xfs_off_t		len)
+{
+	if (!mp->m_rt_alloc_min)
+		return false;
+
+	if (len >= mp->m_rt_alloc_min)
+		return true;
+
+	return false;
+}
+
+/*
+* Select the target device for the inode based on either the size of the
+* initial allocation, or the amount of space available on the data device.
+*
+*/
+bool
+xfs_inode_select_rt_target(
+	struct xfs_inode	*ip,
+	xfs_off_t		len)
+{
+	struct xfs_mount    *mp = ip->i_mount;
+
+	/* If the mount does not have a realtime device configured, there's
+	 * nothing to do here.
+	 */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return false;
+
+	/* You cannot select a new device target once blocks have been allocated
+	 * (e.g. fallocate() beyond EOF), or if data has been written already.
+	 */
+	if (ip->i_d.di_nextents)
+		return false;
+	if (ip->i_d.di_size)
+		return false;
+
+	/* Select realtime device as our target based on the value of
+	 * mp->m_rt_alloc_min.  Target selection code if not valid if not set.
+	 */
+	return xfs_rt_alloc_min(mp, len);
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 79defa7..4f058b5 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -138,6 +138,7 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
 int xfs_rtalloc_query_all(struct xfs_trans *tp,
 			  xfs_rtalloc_query_range_fn fn,
 			  void *priv);
+bool xfs_inode_select_rt_target(struct xfs_inode *ip, xfs_off_t len);
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
 # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
@@ -158,6 +159,7 @@ xfs_rtmount_init(
 }
 # define xfs_rtmount_inodes(m)  (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
 # define xfs_rtunmount_inodes(m)
+# define xfs_inode_select_rt_target(i,l)		(0)
 #endif	/* CONFIG_XFS_RT */
 
 #endif	/* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8b2ccc2..8b425be 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -90,7 +90,49 @@ to_mp(struct kobject *kobject)
 	return container_of(kobj, struct xfs_mount, m_kobj);
 }
 
+#ifdef CONFIG_XFS_RT
+STATIC ssize_t
+rt_alloc_min_store(
+	struct kobject		*kobject,
+	const char		*buf,
+	size_t			count)
+{
+	struct xfs_mount	*mp = to_mp(kobject);
+	int			ret;
+	int			val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	/* Only valid if using a real-time device */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return -EINVAL;
+
+	if (val >= 0)
+		mp->m_rt_alloc_min = val;
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+STATIC ssize_t
+rt_alloc_min_show(
+	struct kobject		*kobject,
+	char			*buf)
+{
+	struct xfs_mount	*mp = to_mp(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
+}
+XFS_SYSFS_ATTR_RW(rt_alloc_min);
+#endif
+
 static struct attribute *xfs_mp_attrs[] = {
+#ifdef CONFIG_XFS_RT
+	ATTR_LIST(rt_alloc_min),
+#endif
 	NULL,
 };
 
-- 
2.9.5



* [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

- For FSes which have a realtime device configured, rt_fallback_pct forces
  allocations to the realtime device after data device usage reaches
  rt_fallback_pct.
- Useful for realtime device users to help prevent ENOSPC errors when
  selectively storing some files (e.g. small files) on data device, while
  others are stored on realtime block device.
- Set via the "rt_fallback_pct" sysfs value which is available if
  the kernel is compiled with CONFIG_XFS_RT.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* Minor change to work with XFS_BMAPI_RTDATA method described
  in rt_alloc_min patch
* Fixed bounds checks on sysfs option
* Documentation

Changes since v4:
* Refactored to align with xfs_inode_select_target change
* Fallback percentage reworked to trigger on % space used on data device.
  I find this a bit more intuitive as it aligns well with "df" output.
* mp->m_rt_min_fdblocks now assigned via function call
* Better consistency on sysfs options

Changes since v3:
* None, new patch to patch set
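
The threshold arithmetic behind rt_fallback_pct can be illustrated with a
small userspace C sketch.  It follows the formula in
xfs_rt_calc_min_free_dblocks, but the function names (model_min_free_dblocks,
model_fallback_to_rt) are hypothetical, and the free block count is assumed
to already have the allocator set-aside subtracted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Minimum number of free data device blocks below which allocations
 * fall back to the realtime device: dblocks * (100 - pct) / 100.
 * As in the patch, no overflow guard is applied to the multiplication.
 */
uint64_t
model_min_free_dblocks(uint64_t dblocks, unsigned int pct)
{
	assert(pct <= 100);
	return dblocks * (100 - pct) / 100;
}

/*
 * True when the data device is fuller than pct percent, i.e. when the
 * free block count has dropped below the precomputed threshold.
 * A pct of 0 disables the policy.
 */
bool
model_fallback_to_rt(uint64_t free_dblocks, uint64_t dblocks,
		     unsigned int pct)
{
	if (!pct)
		return false;

	return free_dblocks < model_min_free_dblocks(dblocks, pct);
}
```

For a data device of 1,000,000 blocks and rt_fallback_pct set to 80, the
threshold in this model is 200,000 free blocks: 150,000 free blocks triggers
the fallback, 250,000 does not.  A value of 100 yields a threshold of 0, so
the fallback never triggers in that case.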

 Documentation/filesystems/xfs.txt |  6 ++++++
 fs/xfs/xfs_fsops.c                |  2 ++
 fs/xfs/xfs_mount.c                | 24 ++++++++++++++++++++++
 fs/xfs/xfs_mount.h                |  7 +++++++
 fs/xfs/xfs_rtalloc.c              | 42 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_sysfs.c                | 38 +++++++++++++++++++++++++++++++++++
 6 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 0763972..ed6f6e2 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -486,3 +486,9 @@ When using a realtime sub-volume, the following sysfs options are supported:
 	Buffered, direct IO and pre-allocation are supported.
 
 	Setting the value to "0" disables this behavior.
+
+  /sys/fs/xfs/<dev>/rt_fallback_pct
+  (Units: percentage  Min: 0  Default: 0  Max: 100)
+	When set, the file will be allocated blocks from the realtime device if the
+	data device space utilization rises above rt_fallback_pct.  Setting the
+	value to "0" disables this behavior.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc5..89713f1 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -610,6 +610,8 @@ xfs_growfs_data_private(
 	xfs_set_low_space_thresholds(mp);
 	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
+	mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+
 	/*
 	 * If we expanded the last AG, free the per-AG reservation
 	 * so we can reinitialize it with the new size.
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e9727d0..3905e57 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1396,3 +1396,27 @@ xfs_dev_is_read_only(
 	}
 	return 0;
 }
+
+/*
+ * precalculate minimum of data blocks required, if we fall
+ * below this value, we will fallback to the real-time device.
+ *
+ * m_rt_fallback_pct can only be non-zero if a real-time device
+ * is configured.
+ */
+uint64_t
+xfs_rt_calc_min_free_dblocks(
+	struct xfs_mount	*mp)
+{
+	xfs_rfsblock_t		min_free_dblocks = 0;
+
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return 0;
+
+	/* Pre-compute minimum data blocks required before
+	 * falling back to RT device for allocations
+	 */
+	min_free_dblocks = mp->m_sb.sb_dblocks * (100 - mp->m_rt_fallback_pct);
+	do_div(min_free_dblocks, 100);
+	return min_free_dblocks;
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0db9731..9dc17b8 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -198,6 +198,12 @@ typedef struct xfs_mount {
 
 	bool			m_fail_unmount;
 	xfs_off_t		m_rt_alloc_min; /* Min RT allocation */
+	/* Fallback to realtime device if data device usage above rt_fallback_pct */
+	uint			m_rt_fallback_pct;
+	/* Use realtime device if free data device blocks falls below this; computed
+	 * from m_rt_fallback_pct.
+	 */
+	xfs_rfsblock_t		m_rt_min_free_dblocks;
 #ifdef DEBUG
 	/*
 	 * Frequency with which errors are injected.  Replaces xfs_etest; the
@@ -447,4 +453,5 @@ int	xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 		int error_class, int error);
 
+uint64_t	xfs_rt_calc_min_free_dblocks(struct xfs_mount *mp);
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 145007b..3abd403 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1304,6 +1304,37 @@ xfs_rt_alloc_min(
 }
 
 /*
+ * m_rt_min_free_dblocks is a pre-computed threshold, which controls target
+ * selection based on how many free blocks are available on the data device.
+ *
+ * If the number of free data device blocks falls below
+ * mp->m_rt_min_free_dblocks, the realtime device is selected as the target
+ * device.  If this value is not set, this target policy is in-active.
+ *
+ */
+bool
+xfs_rt_min_free_dblocks(
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
+	xfs_off_t		len)
+{
+	/* Disabled */
+	if (!mp->m_rt_fallback_pct)
+		return false;
+
+	/* If inode target is already realtime device, nothing to do here */
+	if (!XFS_IS_REALTIME_INODE(ip)) {
+		uint64_t	free_dblocks;
+		free_dblocks = percpu_counter_sum(&mp->m_fdblocks) -
+			mp->m_alloc_set_aside;
+		if (free_dblocks < mp->m_rt_min_free_dblocks) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/*
 * Select the target device for the inode based on either the size of the
 * initial allocation, or the amount of space available on the data device.
 *
@@ -1332,5 +1363,14 @@ xfs_inode_select_rt_target(
 	/* Select realtime device as our target based on the value of
 	 * mp->m_rt_alloc_min.  Target selection code if not valid if not set.
 	 */
-	return xfs_rt_alloc_min(mp, len);
+	if (xfs_rt_alloc_min(mp, len))
+		return true;
+
+	/* Check if data device has enough space, if not fallback to realtime
+	 * device.  Valid only if mp->m_rt_fallback_pct is set.
+	 */
+	if (xfs_rt_min_free_dblocks(mp, ip, len))
+		return true;
+
+	return false;
 }
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8b425be..64f29b6 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -127,11 +127,49 @@ rt_alloc_min_show(
 	return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
 }
 XFS_SYSFS_ATTR_RW(rt_alloc_min);
+
+STATIC ssize_t
+rt_fallback_pct_store(
+	struct kobject		*kobject,
+	const char		*buf,
+	size_t			count)
+{
+	struct xfs_mount	*mp = to_mp(kobject);
+	int			ret;
+	int			val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	/* Only valid if using a real-time device */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return -EINVAL;
+
+	if (val < 0 || val > 100)
+		return -EINVAL;
+
+	mp->m_rt_fallback_pct = val;
+	mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+	return count;
+}
+
+STATIC ssize_t
+rt_fallback_pct_show(
+	struct kobject          *kobject,
+	char                    *buf)
+{
+	struct xfs_mount        *mp = to_mp(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_fallback_pct);
+}
+XFS_SYSFS_ATTR_RW(rt_fallback_pct);
 #endif
 
 static struct attribute *xfs_mp_attrs[] = {
 #ifdef CONFIG_XFS_RT
 	ATTR_LIST(rt_alloc_min),
+	ATTR_LIST(rt_fallback_pct),
 #endif
 	NULL,
 };
-- 
2.9.5



* Re: [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
From: Darrick J. Wong @ 2017-11-28 21:20 UTC
  To: Richard Wareing; +Cc: linux-xfs, david, hch

On Wed, Nov 22, 2017 at 02:40:07PM -0800, Richard Wareing wrote:
> - Reports realtime device free blocks in statfs calls if inheritance
>   bit is set on the inode of directory.  This is a bit more intuitive,
>   especially for use-cases which are using a much larger device for
>   the realtime device.
> - Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
>   realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
>   option.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Richard Wareing <rwareing@fb.com>
> ---
> Changes since v5:
> * None
> 
> Changes since v4:
> * None
> 
> Changes since v3:
> * Fixed accounting bug, we are not required to subtract m_alloc_set_aside
>   as this is a data device only requirement.
> * Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
>   now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
>   for the inode.
> 
> Changes since v2:
> * Style updated per Christoph Hellwig's comment
> * Fixed bug: statp->f_bavail = statp->f_bfree
> 
> 
>  fs/xfs/xfs_linux.h | 2 ++
>  fs/xfs/xfs_super.c | 8 ++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index dcd1292..944b02d 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -278,8 +278,10 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
>  #define XFS_IS_REALTIME_INODE(ip)			\
>  	(((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME) &&	\
>  	 (ip)->i_mount->m_rtdev_targp)
> +#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
>  #else
>  #define XFS_IS_REALTIME_INODE(ip) (0)
> +#define XFS_IS_REALTIME_MOUNT(mp) (0)
>  #endif
>  
>  #endif /* __XFS_LINUX__ */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f663022..3c9a989 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1153,6 +1153,14 @@ xfs_fs_statfs(
>  	    ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
>  			      (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
>  		xfs_qm_statvfs(ip, statp);
> +
> +	if (XFS_IS_REALTIME_MOUNT(mp) &&
> +	    (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {

For everyone else following at home: I asked on IRC, shouldn't we report
rtdev stats for any file that has REALTIME, but not RTINHERIT, set?

--D

> +		statp->f_blocks = sbp->sb_rblocks;
> +		statp->f_bavail = statp->f_bfree =
> +			sbp->sb_frextents * sbp->sb_rextsize;
> +	}
> +
>  	return 0;
>  }
>  
> -- 
> 2.9.5
> 
