All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] XFS realtime device tweaks
@ 2017-09-19  3:52 Richard Wareing
  2017-09-19  3:52 ` [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Richard Wareing @ 2017-09-19  3:52 UTC (permalink / raw)
  To: linux-xfs; +Cc: david, darrick.wong, hch

Re-worked patch-set based on prior feedback & help from Dave Chinner.  See 
change log in each patch for change details.

All code (and setting of sysfs options) in these patches should be safely 
gated behind XFS_IS_REALTIME_* macros. (Please point out any such cases which 
are not!)

Richard Wareing (3):
  xfs: Show realtime device stats on statfs calls if inherit flag set
  xfs: Set realtime flag based on initial allocation size
  xfs: Add realtime fallback if data device full

 fs/xfs/xfs_bmap_util.c |  5 ++++
 fs/xfs/xfs_fsops.c     |  4 +++
 fs/xfs/xfs_inode.c     | 18 ++++++++----
 fs/xfs/xfs_iomap.c     | 12 ++++++++
 fs/xfs/xfs_linux.h     |  2 ++
 fs/xfs/xfs_mount.c     | 27 ++++++++++++++++-
 fs/xfs/xfs_mount.h     |  6 ++++
 fs/xfs/xfs_rtalloc.c   | 40 +++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h   |  5 ++++
 fs/xfs/xfs_super.c     |  8 +++++
 fs/xfs/xfs_sysfs.c     | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 11 files changed, 200 insertions(+), 7 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
  2017-09-19  3:52 [PATCH v4 0/3] XFS realtime device tweaks Richard Wareing
@ 2017-09-19  3:52 ` Richard Wareing
  2017-09-22  5:35   ` Dave Chinner
  2017-09-19  3:52 ` [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
  2017-09-19  3:52 ` [PATCH v4 3/3] xfs: Add realtime fallback if data device full Richard Wareing
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Wareing @ 2017-09-19  3:52 UTC (permalink / raw)
  To: linux-xfs; +Cc: david, darrick.wong, hch

- Reports realtime device free blocks in statfs calls if inheritance
  bit is set on the inode of directory.  This is a bit more intuitive,
  especially for use-cases which are using a much larger device for
  the realtime device.
- Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
  realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
  option.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v3:
* Fixed accounting bug, we are not required to substract m_alloc_set_aside
  as this is a data device only requirement.
* Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
  now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
  for the inode.

Changes since v2:
* Style updated per Christoph Hellwig's comment
* Fixed bug: statp->f_bavail = statp->f_bfree

 fs/xfs/xfs_linux.h | 2 ++
 fs/xfs/xfs_super.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 044fb0e..fe46e71 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -280,8 +280,10 @@ static inline __uint64_t howmany_64(__uint64_t x, __uint32_t y)
 
 #ifdef CONFIG_XFS_RT
 #define XFS_IS_REALTIME_INODE(ip) ((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME)
+#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
 #else
 #define XFS_IS_REALTIME_INODE(ip) (0)
+#define XFS_IS_REALTIME_MOUNT(mp) (0)
 #endif
 
 #endif /* __XFS_LINUX__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 455a575..6d33a5e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1136,6 +1136,14 @@ xfs_fs_statfs(
 	    ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
 			      (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
 		xfs_qm_statvfs(ip, statp);
+
+	if (XFS_IS_REALTIME_MOUNT(mp) &&
+	    (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {
+		statp->f_blocks = sbp->sb_rblocks;
+		statp->f_bavail = statp->f_bfree =
+			sbp->sb_frextents * sbp->sb_rextsize;
+	}
+
 	return 0;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size
  2017-09-19  3:52 [PATCH v4 0/3] XFS realtime device tweaks Richard Wareing
  2017-09-19  3:52 ` [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
@ 2017-09-19  3:52 ` Richard Wareing
  2017-09-22  5:54   ` Dave Chinner
  2017-09-19  3:52 ` [PATCH v4 3/3] xfs: Add realtime fallback if data device full Richard Wareing
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Wareing @ 2017-09-19  3:52 UTC (permalink / raw)
  To: linux-xfs; +Cc: david, darrick.wong, hch

- The rt_alloc_min sysfs option automatically selects the device (data
  device, or realtime) based on the size of the initial allocation of the
  file.
- This option can be used to route the storage of small files (and the
  inefficient workloads associated with them) to a suitable storage
  device such a SSD, while larger allocations are sent to a traditional
  HDD.
- Supports writes via O_DIRECT, buffered (i.e. page cache), and
  pre-allocations (i.e. fallocate)
- Available only when kernel is compiled w/ CONFIG_XFS_RT option.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v3:
* Now functions via initial allocation regardless of O_DIRECT, buffered or
  pre-allocation code paths.  Provides a consistent user-experience.
* I Did do some experiments putting this in the xfs_bmapi_write code path
  however pre-allocation accounting unfortunately prevents this cleaner
  approach.  As such, this proved to be the cleanest and functional approach.
* No longer a mount option, now a sysfs tunable

Changes since v2:
* None

 fs/xfs/xfs_bmap_util.c |  3 +++
 fs/xfs/xfs_inode.c     | 18 ++++++++++++------
 fs/xfs/xfs_iomap.c     |  8 ++++++++
 fs/xfs/xfs_mount.h     |  1 +
 fs/xfs/xfs_rtalloc.c   | 26 ++++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h   |  4 ++++
 fs/xfs/xfs_sysfs.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 7 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 9e3cc21..2d253fb 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1026,6 +1026,9 @@ xfs_alloc_file_space(
 	if (len <= 0)
 		return -EINVAL;
 
+	if (XFS_IS_REALTIME_MOUNT(mp))
+		xfs_rt_alloc_min(ip, len);
+
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ec9826c..f9e2deb 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1620,12 +1620,18 @@ xfs_itruncate_extents(
 	if (error)
 		goto out;
 
-	/*
-	 * Clear the reflink flag if we truncated everything.
-	 */
-	if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
-		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
-		xfs_inode_clear_cowblocks_tag(ip);
+	if (ip->i_d.di_nblocks == 0) {
+		/*
+		 * Clear the reflink flag if we truncated everything.
+		 */
+		if (xfs_is_reflink_inode(ip)) {
+			ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+			xfs_inode_clear_cowblocks_tag(ip);
+		}
+		/* Clear realtime flag if m_rt_alloc_min policy is in place */
+		if (XFS_IS_REALTIME_MOUNT(mp) && mp->m_rt_alloc_min) {
+			ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
+		}
 	}
 
 	/*
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 94e5bdf..11f1c95 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -40,6 +40,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_rtalloc.h"
 
 
 #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
@@ -174,6 +175,10 @@ xfs_iomap_write_direct(
 	int		bmapi_flags = XFS_BMAPI_PREALLOC;
 	uint		tflags = 0;
 
+
+	if (XFS_IS_REALTIME_MOUNT(mp))
+		xfs_rt_alloc_min(ip, count);
+
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
 	lockmode = XFS_ILOCK_SHARED;	/* locked by caller */
@@ -981,6 +986,9 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	if (XFS_IS_REALTIME_MOUNT(mp))
+		xfs_rt_alloc_min(ip, length);
+
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 9fa312a..067be3b 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -197,6 +197,7 @@ typedef struct xfs_mount {
 	__uint32_t		m_generation;
 
 	bool			m_fail_unmount;
+        uint                    m_rt_alloc_min; /* Min RT allocation */
 #ifdef DEBUG
 	/*
 	 * DEBUG mode instrumentation to test and/or trigger delayed allocation
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index c57aa7f..e51cb25 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1284,3 +1284,29 @@ xfs_rtpick_extent(
 	*pick = b;
 	return 0;
 }
+
+/*
+ * Automatically set real-time flag if initial write to inode is > m_rt_alloc_min
+ *
+ * Also valid on truncations.
+ *
+ */
+void xfs_rt_alloc_min(
+	struct xfs_inode	*ip,
+	xfs_off_t       	len)
+{
+	struct xfs_mount    *mp = ip->i_mount;
+
+	if (!mp->m_rt_alloc_min || ip->i_d.di_size)
+		return;
+
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		if (len < mp->m_rt_alloc_min) {
+			ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
+		}
+	} else {
+		if (len >= mp->m_rt_alloc_min) {
+			ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
+		}
+	}
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index f13133e..12939d9 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -136,6 +136,9 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
 int xfs_rtalloc_query_all(struct xfs_trans *tp,
 			  xfs_rtalloc_query_range_fn fn,
 			  void *priv);
+void xfs_rt_alloc_min(struct xfs_inode *ip, xfs_off_t len);
+
+
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
 # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
@@ -155,6 +158,7 @@ xfs_rtmount_init(
 }
 # define xfs_rtmount_inodes(m)  (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
 # define xfs_rtunmount_inodes(m)
+# define xfs_rt_alloc_min(i,l)                          (ENOSYS)
 #endif	/* CONFIG_XFS_RT */
 
 #endif	/* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 80ac15f..3c8dedb 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -129,10 +129,51 @@ XFS_SYSFS_ATTR_RW(drop_writes);
 
 #endif /* DEBUG */
 
+#ifdef CONFIG_XFS_RT
+STATIC ssize_t
+rt_alloc_min_store(
+	struct kobject			*kobject,
+	const char				*buf,
+	size_t					count)
+{
+	struct xfs_mount		*mp = to_mp(kobject);
+	int						ret;
+	int						val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	/* Only valid if using a real-time device */
+	if (XFS_IS_REALTIME_MOUNT(mp) && val > 0)
+		mp->m_rt_alloc_min = val;
+	else if (val <= 0)
+		mp->m_rt_alloc_min = 0;
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+STATIC ssize_t
+rt_alloc_min_show(
+	struct kobject          *kobject,
+	char                    *buf)
+{
+	struct xfs_mount        *mp = to_mp(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_alloc_min);
+}
+XFS_SYSFS_ATTR_RW(rt_alloc_min);
+#endif /* CONFIG_XFS_RT */
+
 static struct attribute *xfs_mp_attrs[] = {
 #ifdef DEBUG
 	ATTR_LIST(drop_writes),
 #endif
+#ifdef CONFIG_XFS_RT
+	ATTR_LIST(rt_alloc_min),
+#endif
 	NULL,
 };
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/3] xfs: Add realtime fallback if data device full
  2017-09-19  3:52 [PATCH v4 0/3] XFS realtime device tweaks Richard Wareing
  2017-09-19  3:52 ` [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
  2017-09-19  3:52 ` [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
@ 2017-09-19  3:52 ` Richard Wareing
  2017-09-22  7:04   ` Dave Chinner
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Wareing @ 2017-09-19  3:52 UTC (permalink / raw)
  To: linux-xfs; +Cc: david, darrick.wong, hch

- Adds tunable option to fallback to realtime device if configured
  when data device is full.
- Useful for realtime device users to help prevent ENOSPC errors when
  selectively storing some files (e.g. small files) on data device, while
  others are stored on realtime block device.
- Set via the "rt_fallback_pct" sysfs value which is available if
  the kernel is compiled with CONFIG_XFS_RT.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v3:
* None, new patch to patch set

 fs/xfs/xfs_bmap_util.c |  4 +++-
 fs/xfs/xfs_fsops.c     |  4 ++++
 fs/xfs/xfs_iomap.c     |  8 ++++++--
 fs/xfs/xfs_mount.c     | 27 ++++++++++++++++++++++++++-
 fs/xfs/xfs_mount.h     |  7 ++++++-
 fs/xfs/xfs_rtalloc.c   | 14 ++++++++++++++
 fs/xfs/xfs_rtalloc.h   |  3 ++-
 fs/xfs/xfs_sysfs.c     | 39 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2d253fb..9797c69 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1026,8 +1026,10 @@ xfs_alloc_file_space(
 	if (len <= 0)
 		return -EINVAL;
 
-	if (XFS_IS_REALTIME_MOUNT(mp))
+	if (XFS_IS_REALTIME_MOUNT(mp)) {
 		xfs_rt_alloc_min(ip, len);
+		xfs_rt_fallback(ip, mp);
+	}
 
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 6ccaae9..c15e906 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -610,6 +610,10 @@ xfs_growfs_data_private(
 	xfs_set_low_space_thresholds(mp);
 	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
+	if (XFS_IS_REALTIME_MOUNT(mp)) {
+		xfs_set_rt_min_fdblocks(mp);
+	}
+
 	/*
 	 * If we expanded the last AG, free the per-AG reservation
 	 * so we can reinitialize it with the new size.
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 11f1c95..707ba97 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -176,8 +176,10 @@ xfs_iomap_write_direct(
 	uint		tflags = 0;
 
 
-	if (XFS_IS_REALTIME_MOUNT(mp))
+	if (XFS_IS_REALTIME_MOUNT(mp)) {
 		xfs_rt_alloc_min(ip, count);
+		xfs_rt_fallback(ip, mp);
+	}
 
 	rt = XFS_IS_REALTIME_INODE(ip);
 	extsz = xfs_get_extsz_hint(ip);
@@ -986,8 +988,10 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (XFS_IS_REALTIME_MOUNT(mp))
+	if (XFS_IS_REALTIME_MOUNT(mp)) {
 		xfs_rt_alloc_min(ip, length);
+		xfs_rt_fallback(ip, mp);
+	}
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 2eaf818..543e80d 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -509,7 +509,6 @@ xfs_set_low_space_thresholds(
 	}
 }
 
-
 /*
  * Set whether we're using inode alignment.
  */
@@ -1396,3 +1395,29 @@ xfs_dev_is_read_only(
 	}
 	return 0;
 }
+
+/*
+ * precalculate minimum of data blocks required, if we fall
+ * below this value, we will fallback to the real-time device.
+ *
+ * m_rt_fallback_pct can only be non-zero if a real-time device
+ * is configured.
+ */
+void
+xfs_set_rt_min_fdblocks(
+	struct xfs_mount	*mp)
+{
+	if (mp->m_rt_fallback_pct) {
+		xfs_sb_t		*sbp = &mp->m_sb;
+		xfs_extlen_t 	lsize;
+		__uint64_t 		min_blocks;
+
+		lsize = sbp->sb_logstart ? sbp->sb_logblocks : 0;
+		min_blocks = (mp->m_sb.sb_dblocks - lsize) * mp->m_rt_fallback_pct;
+		do_div(min_blocks, 100);
+		/* Pre-compute minimum data blocks required before
+		 * falling back to RT device for allocations
+		 */
+		mp->m_rt_min_fdblocks = min_blocks;
+	}
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 067be3b..36676c4 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -197,7 +197,11 @@ typedef struct xfs_mount {
 	__uint32_t		m_generation;
 
 	bool			m_fail_unmount;
-        uint                    m_rt_alloc_min; /* Min RT allocation */
+	uint			m_rt_alloc_min; /* Min RT allocation */
+	__uint8_t		m_rt_fallback_pct; /* Fall back to realtime device if
+										* data dev above rt_fallback_pct
+										*/
+	__uint64_t		m_rt_min_fdblocks; /* Realtime min fdblock threshold */
 #ifdef DEBUG
 	/*
 	 * DEBUG mode instrumentation to test and/or trigger delayed allocation
@@ -463,4 +467,5 @@ int	xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 		int error_class, int error);
 
+void            xfs_set_rt_min_fdblocks(struct xfs_mount *mp);
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index e51cb25..f0d25a0 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1310,3 +1310,17 @@ void xfs_rt_alloc_min(
 		}
 	}
 }
+
+void xfs_rt_fallback(
+    struct xfs_inode    *ip,
+    struct xfs_mount    *mp)
+{
+    if (!XFS_IS_REALTIME_INODE(ip)) {
+        __uint64_t      free;
+        free = percpu_counter_sum(&mp->m_fdblocks) -
+            mp->m_alloc_set_aside;
+        if (free < mp->m_rt_min_fdblocks) {
+            ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
+        }
+    }
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 12939d9..28f3e42 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -137,7 +137,7 @@ int xfs_rtalloc_query_all(struct xfs_trans *tp,
 			  xfs_rtalloc_query_range_fn fn,
 			  void *priv);
 void xfs_rt_alloc_min(struct xfs_inode *ip, xfs_off_t len);
-
+void xfs_rt_fallback(struct xfs_inode *ip, struct xfs_mount *mp);
 
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
@@ -159,6 +159,7 @@ xfs_rtmount_init(
 # define xfs_rtmount_inodes(m)  (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
 # define xfs_rtunmount_inodes(m)
 # define xfs_rt_alloc_min(i,l)                          (ENOSYS)
+# define xfs_rt_fallback(i,m)                           (ENOSYS)
 #endif	/* CONFIG_XFS_RT */
 
 #endif	/* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 3c8dedb..c22da05 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -165,6 +165,44 @@ rt_alloc_min_show(
 	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_alloc_min);
 }
 XFS_SYSFS_ATTR_RW(rt_alloc_min);
+
+STATIC ssize_t
+rt_fallback_pct_store(
+	struct kobject			*kobject,
+	const char				*buf,
+	size_t					count)
+{
+	struct xfs_mount		*mp = to_mp(kobject);
+	int						ret;
+	int						val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	/* Only valid if using a real-time device */
+	if (XFS_IS_REALTIME_MOUNT(mp) && ((val > 0) && (val <=100))) {
+		mp->m_rt_fallback_pct = val;
+		xfs_set_rt_min_fdblocks(mp);
+	} else if (val <= 0) {
+		mp->m_rt_fallback_pct = 0;
+		mp->m_rt_min_fdblocks = 0;
+	} else
+		return -EINVAL;
+
+	return count;
+}
+
+STATIC ssize_t
+rt_fallback_pct_show(
+	struct kobject          *kobject,
+	char                    *buf)
+{
+	struct xfs_mount        *mp = to_mp(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_fallback_pct);
+}
+XFS_SYSFS_ATTR_RW(rt_fallback_pct);
 #endif /* CONFIG_XFS_RT */
 
 static struct attribute *xfs_mp_attrs[] = {
@@ -173,6 +211,7 @@ static struct attribute *xfs_mp_attrs[] = {
 #endif
 #ifdef CONFIG_XFS_RT
 	ATTR_LIST(rt_alloc_min),
+	ATTR_LIST(rt_fallback_pct),
 #endif
 	NULL,
 };
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
  2017-09-19  3:52 ` [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
@ 2017-09-22  5:35   ` Dave Chinner
  2017-09-22 18:26     ` Richard Wareing
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2017-09-22  5:35 UTC (permalink / raw)
  To: Richard Wareing; +Cc: linux-xfs, darrick.wong, hch

On Mon, Sep 18, 2017 at 08:52:36PM -0700, Richard Wareing wrote:
> - Reports realtime device free blocks in statfs calls if inheritance
>   bit is set on the inode of directory.  This is a bit more intuitive,
>   especially for use-cases which are using a much larger device for
>   the realtime device.
> - Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
>   realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
>   option.
> 
> Signed-off-by: Richard Wareing <rwareing@fb.com>

Didn't Christoph give a Reviewed-by for this one? Normally when we
repost a patch that already has a revewied-by we include it in the
commit message. That tells other reviewers and the maintainer that
it has already been reviewed :P

> Changes since v3:
> * Fixed accounting bug, we are not required to substract m_alloc_set_aside
>   as this is a data device only requirement.
> * Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
>   now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
>   for the inode.
> 
> Changes since v2:
> * Style updated per Christoph Hellwig's comment
> * Fixed bug: statp->f_bavail = statp->f_bfree
> 
>  fs/xfs/xfs_linux.h | 2 ++
>  fs/xfs/xfs_super.c | 8 ++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 044fb0e..fe46e71 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -280,8 +280,10 @@ static inline __uint64_t howmany_64(__uint64_t x, __uint32_t y)
>  
>  #ifdef CONFIG_XFS_RT
>  #define XFS_IS_REALTIME_INODE(ip) ((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME)
> +#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
>  #else
>  #define XFS_IS_REALTIME_INODE(ip) (0)
> +#define XFS_IS_REALTIME_MOUNT(mp) (0)
>  #endif

Good idea. I think we should probably turn these into inline
functions. That can be done in a new patch, though.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size
  2017-09-19  3:52 ` [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
@ 2017-09-22  5:54   ` Dave Chinner
  2017-09-22 19:06     ` Richard Wareing
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2017-09-22  5:54 UTC (permalink / raw)
  To: Richard Wareing; +Cc: linux-xfs, darrick.wong, hch

On Mon, Sep 18, 2017 at 08:52:37PM -0700, Richard Wareing wrote:
> - The rt_alloc_min sysfs option automatically selects the device (data
>   device, or realtime) based on the size of the initial allocation of the
>   file.
> - This option can be used to route the storage of small files (and the
>   inefficient workloads associated with them) to a suitable storage
>   device such a SSD, while larger allocations are sent to a traditional
>   HDD.
> - Supports writes via O_DIRECT, buffered (i.e. page cache), and
>   pre-allocations (i.e. fallocate)
> - Available only when kernel is compiled w/ CONFIG_XFS_RT option.
> 
> Signed-off-by: Richard Wareing <rwareing@fb.com>
> ---
> Changes since v3:
> * Now functions via initial allocation regardless of O_DIRECT, buffered or
>   pre-allocation code paths.  Provides a consistent user-experience.
> * I Did do some experiments putting this in the xfs_bmapi_write code path
>   however pre-allocation accounting unfortunately prevents this cleaner
>   approach.  As such, this proved to be the cleanest and functional approach.
> * No longer a mount option, now a sysfs tunable
> 
> Changes since v2:
> * None
> 
>  fs/xfs/xfs_bmap_util.c |  3 +++
>  fs/xfs/xfs_inode.c     | 18 ++++++++++++------
>  fs/xfs/xfs_iomap.c     |  8 ++++++++
>  fs/xfs/xfs_mount.h     |  1 +
>  fs/xfs/xfs_rtalloc.c   | 26 ++++++++++++++++++++++++++
>  fs/xfs/xfs_rtalloc.h   |  4 ++++
>  fs/xfs/xfs_sysfs.c     | 41 +++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 95 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 9e3cc21..2d253fb 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1026,6 +1026,9 @@ xfs_alloc_file_space(
>  	if (len <= 0)
>  		return -EINVAL;
>  
> +	if (XFS_IS_REALTIME_MOUNT(mp))
> +		xfs_rt_alloc_min(ip, len);

I'd put the XFS_IS_REALTIME_MOUNT() check inside xfs_rt_alloc_min().
That way we can compile the code out completely when
CONFIG_XFS_RT=n.

>  	rt = XFS_IS_REALTIME_INODE(ip);
>  	extsz = xfs_get_extsz_hint(ip);
>  
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index ec9826c..f9e2deb 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1620,12 +1620,18 @@ xfs_itruncate_extents(
>  	if (error)
>  		goto out;
>  
> -	/*
> -	 * Clear the reflink flag if we truncated everything.
> -	 */
> -	if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
> -		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
> -		xfs_inode_clear_cowblocks_tag(ip);
> +	if (ip->i_d.di_nblocks == 0) {
> +		/*
> +		 * Clear the reflink flag if we truncated everything.
> +		 */
> +		if (xfs_is_reflink_inode(ip)) {
> +			ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
> +			xfs_inode_clear_cowblocks_tag(ip);
> +		}
> +		/* Clear realtime flag if m_rt_alloc_min policy is in place */
> +		if (XFS_IS_REALTIME_MOUNT(mp) && mp->m_rt_alloc_min) {

Won't m_rt_alloc_min always be zero on non-rt filesystems?

> +			ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
> +		}
>  	}
>  
>  	/*
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 94e5bdf..11f1c95 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -40,6 +40,7 @@
>  #include "xfs_dquot_item.h"
>  #include "xfs_dquot.h"
>  #include "xfs_reflink.h"
> +#include "xfs_rtalloc.h"
>  
>  
>  #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
> @@ -174,6 +175,10 @@ xfs_iomap_write_direct(
>  	int		bmapi_flags = XFS_BMAPI_PREALLOC;
>  	uint		tflags = 0;
>  
> +
> +	if (XFS_IS_REALTIME_MOUNT(mp))
> +		xfs_rt_alloc_min(ip, count);

Reading this makes me wonder what we are allocating here :/
A better name might be in order - something that indicates we're
selecting the target device rather allocating something. e.g.
xfs_inode_select_target()?

> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 9fa312a..067be3b 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -197,6 +197,7 @@ typedef struct xfs_mount {
>  	__uint32_t		m_generation;
>  
>  	bool			m_fail_unmount;
> +        uint                    m_rt_alloc_min; /* Min RT allocation */

WHitespace problem - looks like spaces instead of tabs....

>  #ifdef DEBUG
>  	/*
>  	 * DEBUG mode instrumentation to test and/or trigger delayed allocation
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index c57aa7f..e51cb25 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -1284,3 +1284,29 @@ xfs_rtpick_extent(
>  	*pick = b;
>  	return 0;
>  }
> +
> +/*
> + * Automatically set real-time flag if initial write to inode is > m_rt_alloc_min
> + *
> + * Also valid on truncations.
> + *
> + */
> +void xfs_rt_alloc_min(
> +	struct xfs_inode	*ip,
> +	xfs_off_t       	len)
> +{
> +	struct xfs_mount    *mp = ip->i_mount;
> +
> +	if (!mp->m_rt_alloc_min || ip->i_d.di_size)
> +		return;

I kinda prefer stacking single checks like

	/*
	 * m_rt_alloc_min controls the target selection. It is
	 * inactive if it is not set.
	 */
	if (!mp->m_rt_alloc_min)
		return;

	/*
	 * Can't select a different target if we've already
	 * allocated blocks (e.g. fallocate() beyond EOF) or has
	 * data in it already.
	 */
	if (!ip->i_nextents)
		return;
	if (!ip->i_d.di_size)
		return;

> +	if (XFS_IS_REALTIME_INODE(ip)) {
> +		if (len < mp->m_rt_alloc_min) {
> +			ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
> +		}
> +	} else {
> +		if (len >= mp->m_rt_alloc_min) {
> +			ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
> +		}
> +	}

Checking for XFS_IS_REALTIME_INODE() is redundant here. This does
the same thing:

	/*
	 * if the allocation length is less than the threshold,
	 * always select the data device. Otherwise we should
	 * select the realtime device.
	 */
	if (len < mp->m_rt_alloc_min)
		ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
	else
		ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;

> +}
> diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
> index f13133e..12939d9 100644
> --- a/fs/xfs/xfs_rtalloc.h
> +++ b/fs/xfs/xfs_rtalloc.h
> @@ -136,6 +136,9 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
>  int xfs_rtalloc_query_all(struct xfs_trans *tp,
>  			  xfs_rtalloc_query_range_fn fn,
>  			  void *priv);
> +void xfs_rt_alloc_min(struct xfs_inode *ip, xfs_off_t len);
> +
> +
>  #else
>  # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
>  # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
> @@ -155,6 +158,7 @@ xfs_rtmount_init(
>  }
>  # define xfs_rtmount_inodes(m)  (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
>  # define xfs_rtunmount_inodes(m)
> +# define xfs_rt_alloc_min(i,l)                          (ENOSYS)
>  #endif	/* CONFIG_XFS_RT */
>  
>  #endif	/* __XFS_RTALLOC_H__ */
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index 80ac15f..3c8dedb 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -129,10 +129,51 @@ XFS_SYSFS_ATTR_RW(drop_writes);
>  
>  #endif /* DEBUG */
>  
> +#ifdef CONFIG_XFS_RT
> +STATIC ssize_t
> +rt_alloc_min_store(
> +	struct kobject			*kobject,
> +	const char				*buf,
> +	size_t					count)
> +{
> +	struct xfs_mount		*mp = to_mp(kobject);
> +	int						ret;
> +	int						val;
> +
> +	ret = kstrtoint(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	/* Only valid if using a real-time device */
> +	if (XFS_IS_REALTIME_MOUNT(mp) && val > 0)
> +		mp->m_rt_alloc_min = val;
> +	else if (val <= 0)
> +		mp->m_rt_alloc_min = 0;
> +	else
> +		return -EINVAL;

Seems inconsistent. This will allow a value <= 0 for a non-realtime
mount, but will return EINVAL for values > 0. Perhaps it would be
more consistent to return EINVAL for a non-realtime mount or
any value < 0?

> +
> +	return count;
> +}
> +
> +STATIC ssize_t
> +rt_alloc_min_show(
> +	struct kobject          *kobject,
> +	char                    *buf)
> +{
> +	struct xfs_mount        *mp = to_mp(kobject);
> +
> +	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_alloc_min);
> +}
> +XFS_SYSFS_ATTR_RW(rt_alloc_min);
> +#endif /* CONFIG_XFS_RT */
> +
>  static struct attribute *xfs_mp_attrs[] = {
>  #ifdef DEBUG
>  	ATTR_LIST(drop_writes),
>  #endif
> +#ifdef CONFIG_XFS_RT
> +	ATTR_LIST(rt_alloc_min),
> +#endif

This is userspace visible - shouldn't we always compile this in,
even if all it does it return EINVAL to attempts to change it when
CONFIG_XFS_RT=n?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] xfs: Add realtime fallback if data device full
  2017-09-19  3:52 ` [PATCH v4 3/3] xfs: Add realtime fallback if data device full Richard Wareing
@ 2017-09-22  7:04   ` Dave Chinner
  2017-09-25 18:37     ` Richard Wareing
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2017-09-22  7:04 UTC (permalink / raw)
  To: Richard Wareing; +Cc: linux-xfs, darrick.wong, hch

On Mon, Sep 18, 2017 at 08:52:38PM -0700, Richard Wareing wrote:
> - Adds tunable option to fallback to realtime device if configured
>   when data device is full.
> - Useful for realtime device users to help prevent ENOSPC errors when
>   selectively storing some files (e.g. small files) on data device, while
>   others are stored on realtime block device.
> - Set via the "rt_fallback_pct" sysfs value which is available if
>   the kernel is compiled with CONFIG_XFS_RT.
> 
> Signed-off-by: Richard Wareing <rwareing@fb.com>
> ---
> Changes since v3:
> * None, new patch to patch set
> 
>  fs/xfs/xfs_bmap_util.c |  4 +++-
>  fs/xfs/xfs_fsops.c     |  4 ++++
>  fs/xfs/xfs_iomap.c     |  8 ++++++--
>  fs/xfs/xfs_mount.c     | 27 ++++++++++++++++++++++++++-
>  fs/xfs/xfs_mount.h     |  7 ++++++-
>  fs/xfs/xfs_rtalloc.c   | 14 ++++++++++++++
>  fs/xfs/xfs_rtalloc.h   |  3 ++-
>  fs/xfs/xfs_sysfs.c     | 39 +++++++++++++++++++++++++++++++++++++++
>  8 files changed, 100 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2d253fb..9797c69 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1026,8 +1026,10 @@ xfs_alloc_file_space(
>  	if (len <= 0)
>  		return -EINVAL;
>  
> -	if (XFS_IS_REALTIME_MOUNT(mp))
> +	if (XFS_IS_REALTIME_MOUNT(mp)) {
>  		xfs_rt_alloc_min(ip, len);
> +		xfs_rt_fallback(ip, mp);
> +	}

This should really go inside xfs_inode_select_target() as space
availability is just another selection criteria....

>  	rt = XFS_IS_REALTIME_INODE(ip);
>  	extsz = xfs_get_extsz_hint(ip);
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 6ccaae9..c15e906 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -610,6 +610,10 @@ xfs_growfs_data_private(
>  	xfs_set_low_space_thresholds(mp);
>  	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
>  
> +	if (XFS_IS_REALTIME_MOUNT(mp)) {
> +		xfs_set_rt_min_fdblocks(mp);
> +	}

Normal way to do this is something like:

	mp->m_rt_min_fdblocks = xfs_calc_min_free_rtblocks(mp);

and check XFS_IS_REALTIME_MOUNT() inside that function.

And now, reading on, I find I've completely misunderstood what
those variable and function names mean, so they need renaming
to be clearer.

> +
> +/*
> + * precalculate minimum of data blocks required, if we fall
> + * below this value, we will fallback to the real-time device.
> + *
> + * m_rt_fallback_pct can only be non-zero if a real-time device
> + * is configured.
> + */
> +void
> +xfs_set_rt_min_fdblocks(
> +	struct xfs_mount	*mp)
> +{
> +	if (mp->m_rt_fallback_pct) {
> +		xfs_sb_t		*sbp = &mp->m_sb;
> +		xfs_extlen_t 	lsize;
> +		__uint64_t 		min_blocks;

Stray whitespace. If you use vim, add this macro to your
.vimrc so that it highlights stray whitespace for you:

" highlight whitespace damage
highlight RedundantSpaces ctermbg=red guibg=red
match RedundantSpaces /\s\+$\| \+\ze\t/

> +
> +		lsize = sbp->sb_logstart ? sbp->sb_logblocks : 0;
> +		min_blocks = (mp->m_sb.sb_dblocks - lsize) * mp->m_rt_fallback_pct;
> +		do_div(min_blocks, 100);

Why bother with the log size?

> +		/* Pre-compute minimum data blocks required before
> +		 * falling back to RT device for allocations
> +		 */

Comment format.

> +		mp->m_rt_min_fdblocks = min_blocks;

Hmmm - I wonder if it would be better to tie this into the existing
data device low space threshold code?

> +	}

Also, indenting.

	if (!XFS_IS_REALTIME_MOUNT(mp))
		return 0;
	if (!mp->m_rt_fallback_pct)
		return 0;
	....



> +}
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 067be3b..36676c4 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -197,7 +197,11 @@ typedef struct xfs_mount {
>  	__uint32_t		m_generation;
>  
>  	bool			m_fail_unmount;
> -        uint                    m_rt_alloc_min; /* Min RT allocation */
> +	uint			m_rt_alloc_min; /* Min RT allocation */
> +	__uint8_t		m_rt_fallback_pct; /* Fall back to realtime device if

uint32_t is fine. We're moving away from the __[u]int types, so
we shouldn't really add any new ones.

> +void xfs_rt_fallback(
> +    struct xfs_inode    *ip,
> +    struct xfs_mount    *mp)

Mount first, then inode.

> +{
> +    if (!XFS_IS_REALTIME_INODE(ip)) {
> +        __uint64_t      free;
> +        free = percpu_counter_sum(&mp->m_fdblocks) -
> +            mp->m_alloc_set_aside;
> +        if (free < mp->m_rt_min_fdblocks) {
> +            ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
> +        }
> +    }
> +}

Indenting, but should really move to the target selection function.

> +STATIC ssize_t
> +rt_fallback_pct_store(
> +	struct kobject			*kobject,
> +	const char				*buf,
> +	size_t					count)
> +{
> +	struct xfs_mount		*mp = to_mp(kobject);
> +	int						ret;
> +	int						val;
> +
> +	ret = kstrtoint(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	/* Only valid if using a real-time device */
> +	if (XFS_IS_REALTIME_MOUNT(mp) && ((val > 0) && (val <=100))) {
> +		mp->m_rt_fallback_pct = val;
> +		xfs_set_rt_min_fdblocks(mp);
> +	} else if (val <= 0) {
> +		mp->m_rt_fallback_pct = 0;
> +		mp->m_rt_min_fdblocks = 0;
> +	} else
> +		return -EINVAL;

Same issue as last patch.

Also, just set then threshold percentage, and if there is no error,
then call xfs_set_rt_min_fdblocks() rather than zeroing it directly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
  2017-09-22  5:35   ` Dave Chinner
@ 2017-09-22 18:26     ` Richard Wareing
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Wareing @ 2017-09-22 18:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, darrick.wong, hch

Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Sep 18, 2017 at 08:52:36PM -0700, Richard Wareing wrote:
>> - Reports realtime device free blocks in statfs calls if inheritance
>>   bit is set on the inode of directory.  This is a bit more intuitive,
>>   especially for use-cases which are using a much larger device for
>>   the realtime device.
>> - Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
>>   realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
>>   option.
>>
>> Signed-off-by: Richard Wareing <rwareing@fb.com>
>
> Didn't Christoph give a Reviewed-by for this one? Normally when we
> repost a patch that already has a revewied-by we include it in the
> commit message. That tells other reviewers and the maintainer that
> it has already been reviewed :P
>

Oh, I completely missed it (back on Sept. 3rd).  I'd rather go with this
version with the "XFS_IS_REALTIME_MOUNT" addition given the learnings from
the CVE.

I'll be sure to mention it in my commit message in the future.

Thanks,

Richard


>> Changes since v3:
>> * Fixed accounting bug, we are not required to substract m_alloc_set_aside
>>   as this is a data device only requirement.
>> * Added XFS_IS_REALTIME_MOUNT macro based on learnings from  
>> CVE-2017-14340,
>>   now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
>>   for the inode.
>>
>> Changes since v2:
>> * Style updated per Christoph Hellwig's comment
>> * Fixed bug: statp->f_bavail = statp->f_bfree
>>
>>  fs/xfs/xfs_linux.h | 2 ++
>>  fs/xfs/xfs_super.c | 8 ++++++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
>> index 044fb0e..fe46e71 100644
>> --- a/fs/xfs/xfs_linux.h
>> +++ b/fs/xfs/xfs_linux.h
>> @@ -280,8 +280,10 @@ static inline __uint64_t howmany_64(__uint64_t x,  
>> __uint32_t y)
>>
>>  #ifdef CONFIG_XFS_RT
>>  #define XFS_IS_REALTIME_INODE(ip) ((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME)
>> +#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
>>  #else
>>  #define XFS_IS_REALTIME_INODE(ip) (0)
>> +#define XFS_IS_REALTIME_MOUNT(mp) (0)
>>  #endif
>
> Good idea. I think we should probably turn these into inline
> functions. That can be done in a new patch, though.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size
  2017-09-22  5:54   ` Dave Chinner
@ 2017-09-22 19:06     ` Richard Wareing
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Wareing @ 2017-09-22 19:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, darrick.wong, hch

Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Sep 18, 2017 at 08:52:37PM -0700, Richard Wareing wrote:
>> - The rt_alloc_min sysfs option automatically selects the device (data
>>   device, or realtime) based on the size of the initial allocation of the
>>   file.
>> - This option can be used to route the storage of small files (and the
>>   inefficient workloads associated with them) to a suitable storage
>>   device such a SSD, while larger allocations are sent to a traditional
>>   HDD.
>> - Supports writes via O_DIRECT, buffered (i.e. page cache), and
>>   pre-allocations (i.e. fallocate)
>> - Available only when kernel is compiled w/ CONFIG_XFS_RT option.
>>
>> Signed-off-by: Richard Wareing <rwareing@fb.com>
>> ---
>> Changes since v3:
>> * Now functions via initial allocation regardless of O_DIRECT, buffered or
>>   pre-allocation code paths.  Provides a consistent user-experience.
>> * I Did do some experiments putting this in the xfs_bmapi_write code path
>>   however pre-allocation accounting unfortunately prevents this cleaner
>>   approach.  As such, this proved to be the cleanest and functional approach.
>> * No longer a mount option, now a sysfs tunable
>>
>> Changes since v2:
>> * None
>>
>>  fs/xfs/xfs_bmap_util.c |  3 +++
>>  fs/xfs/xfs_inode.c     | 18 ++++++++++++------
>>  fs/xfs/xfs_iomap.c     |  8 ++++++++
>>  fs/xfs/xfs_mount.h     |  1 +
>>  fs/xfs/xfs_rtalloc.c   | 26 ++++++++++++++++++++++++++
>>  fs/xfs/xfs_rtalloc.h   |  4 ++++
>>  fs/xfs/xfs_sysfs.c     | 41 +++++++++++++++++++++++++++++++++++++++++
>>  7 files changed, 95 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index 9e3cc21..2d253fb 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1026,6 +1026,9 @@ xfs_alloc_file_space(
>>  	if (len <= 0)
>>  		return -EINVAL;
>>
>> +	if (XFS_IS_REALTIME_MOUNT(mp))
>> +		xfs_rt_alloc_min(ip, len);
>
> I'd put the XFS_IS_REALTIME_MOUNT() check inside xfs_rt_alloc_min().
> That way we can compile the code out completely when
> CONFIG_XFS_RT=n.
>

I was wondering about this, but elected to model the
XFS_IS_REALTIME_INODE code structure.  I'm partial to putting
this in the function as well, as it reads cleaner too.

>> rt = XFS_IS_REALTIME_INODE(ip);
>>  	extsz = xfs_get_extsz_hint(ip);
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index ec9826c..f9e2deb 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -1620,12 +1620,18 @@ xfs_itruncate_extents(
>>  	if (error)
>>  		goto out;
>>
>> -	/*
>> -	 * Clear the reflink flag if we truncated everything.
>> -	 */
>> -	if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
>> -		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
>> -		xfs_inode_clear_cowblocks_tag(ip);
>> +	if (ip->i_d.di_nblocks == 0) {
>> +		/*
>> +		 * Clear the reflink flag if we truncated everything.
>> +		 */
>> +		if (xfs_is_reflink_inode(ip)) {
>> +			ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
>> +			xfs_inode_clear_cowblocks_tag(ip);
>> +		}
>> +		/* Clear realtime flag if m_rt_alloc_min policy is in place */
>> +		if (XFS_IS_REALTIME_MOUNT(mp) && mp->m_rt_alloc_min) {
>
> Won't m_rt_alloc_min always be zero on non-rt filesystems?
>

Yes, but was trying to be defensive, in case some future me or other
developer screwed something up in the future :).

>> +			ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
>> +		}
>>  	}
>>
>>  	/*
>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>> index 94e5bdf..11f1c95 100644
>> --- a/fs/xfs/xfs_iomap.c
>> +++ b/fs/xfs/xfs_iomap.c
>> @@ -40,6 +40,7 @@
>>  #include "xfs_dquot_item.h"
>>  #include "xfs_dquot.h"
>>  #include "xfs_reflink.h"
>> +#include "xfs_rtalloc.h"
>>
>>
>>  #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
>> @@ -174,6 +175,10 @@ xfs_iomap_write_direct(
>>  	int		bmapi_flags = XFS_BMAPI_PREALLOC;
>>  	uint		tflags = 0;
>>
>> +
>> +	if (XFS_IS_REALTIME_MOUNT(mp))
>> +		xfs_rt_alloc_min(ip, count);
>
> Reading this makes me wonder what we are allocating here :/
> A better name might be in order - something that indicates we're
> selecting the target device rather allocating something. e.g.
> xfs_inode_select_target()?

I like your function name better than mine, changing this.

>
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index 9fa312a..067be3b 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -197,6 +197,7 @@ typedef struct xfs_mount {
>>  	__uint32_t		m_generation;
>>
>>  	bool			m_fail_unmount;
>> +        uint                    m_rt_alloc_min; /* Min RT allocation */
>
> WHitespace problem - looks like spaces instead of tabs....
>

Fixing.

>> #ifdef DEBUG
>>  	/*
>>  	 * DEBUG mode instrumentation to test and/or trigger delayed allocation
>> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
>> index c57aa7f..e51cb25 100644
>> --- a/fs/xfs/xfs_rtalloc.c
>> +++ b/fs/xfs/xfs_rtalloc.c
>> @@ -1284,3 +1284,29 @@ xfs_rtpick_extent(
>>  	*pick = b;
>>  	return 0;
>>  }
>> +
>> +/*
>> + * Automatically set real-time flag if initial write to inode is >  
>> m_rt_alloc_min
>> + *
>> + * Also valid on truncations.
>> + *
>> + */
>> +void xfs_rt_alloc_min(
>> +	struct xfs_inode	*ip,
>> +	xfs_off_t       	len)
>> +{
>> +	struct xfs_mount    *mp = ip->i_mount;
>> +
>> +	if (!mp->m_rt_alloc_min || ip->i_d.di_size)
>> +		return;
>
> I kinda prefer stacking single checks like
>
> 	/*
> 	 * m_rt_alloc_min controls the target selection. It is
> 	 * inactive if it is not set.
> 	 */
> 	if (!mp->m_rt_alloc_min)
> 		return;
>
> 	/*
> 	 * Can't select a different target if we've already
> 	 * allocated blocks (e.g. fallocate() beyond EOF) or has
> 	 * data in it already.
> 	 */
> 	if (!ip->i_nextents)
> 		return;
> 	if (!ip->i_d.di_size)
> 		return;
>

Fixing.

>> +	if (XFS_IS_REALTIME_INODE(ip)) {
>> +		if (len < mp->m_rt_alloc_min) {
>> +			ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
>> +		}
>> +	} else {
>> +		if (len >= mp->m_rt_alloc_min) {
>> +			ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
>> +		}
>> +	}
>
> Checking for XFS_IS_REALTIME_INODE() is redundant here. This does
> the same thing:
>
> 	/*
> 	 * if the allocation length is less than the threshold,
> 	 * always select the data device. Otherwise we should
> 	 * select the realtime device.
> 	 */
> 	if (len < mp->m_rt_alloc_min)
> 		ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
> 	else
> 		ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
>

Fixing.


>> +}
>> diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
>> index f13133e..12939d9 100644
>> --- a/fs/xfs/xfs_rtalloc.h
>> +++ b/fs/xfs/xfs_rtalloc.h
>> @@ -136,6 +136,9 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
>>  int xfs_rtalloc_query_all(struct xfs_trans *tp,
>>  			  xfs_rtalloc_query_range_fn fn,
>>  			  void *priv);
>> +void xfs_rt_alloc_min(struct xfs_inode *ip, xfs_off_t len);
>> +
>> +
>>  #else
>>  # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
>>  # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
>> @@ -155,6 +158,7 @@ xfs_rtmount_init(
>>  }
>>  # define xfs_rtmount_inodes(m)  (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
>>  # define xfs_rtunmount_inodes(m)
>> +# define xfs_rt_alloc_min(i,l)                          (ENOSYS)
>>  #endif	/* CONFIG_XFS_RT */
>>
>>  #endif	/* __XFS_RTALLOC_H__ */
>> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
>> index 80ac15f..3c8dedb 100644
>> --- a/fs/xfs/xfs_sysfs.c
>> +++ b/fs/xfs/xfs_sysfs.c
>> @@ -129,10 +129,51 @@ XFS_SYSFS_ATTR_RW(drop_writes);
>>
>>  #endif /* DEBUG */
>>
>> +#ifdef CONFIG_XFS_RT
>> +STATIC ssize_t
>> +rt_alloc_min_store(
>> +	struct kobject			*kobject,
>> +	const char				*buf,
>> +	size_t					count)
>> +{
>> +	struct xfs_mount		*mp = to_mp(kobject);
>> +	int						ret;
>> +	int						val;
>> +
>> +	ret = kstrtoint(buf, 0, &val);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Only valid if using a real-time device */
>> +	if (XFS_IS_REALTIME_MOUNT(mp) && val > 0)
>> +		mp->m_rt_alloc_min = val;
>> +	else if (val <= 0)
>> +		mp->m_rt_alloc_min = 0;
>> +	else
>> +		return -EINVAL;
>
> Seems inconsistent. This will allow a value <= 0 for a non-realtime
> mount, but will return EINVAL for values > 0. Perhaps it would be
> more consistent to return EINVAL for a non-realtime mount or
> any value < 0?

Fair observation, fixing.

>
>> +
>> +	return count;
>> +}
>> +
>> +STATIC ssize_t
>> +rt_alloc_min_show(
>> +	struct kobject          *kobject,
>> +	char                    *buf)
>> +{
>> +	struct xfs_mount        *mp = to_mp(kobject);
>> +
>> +	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_alloc_min);
>> +}
>> +XFS_SYSFS_ATTR_RW(rt_alloc_min);
>> +#endif /* CONFIG_XFS_RT */
>> +
>>  static struct attribute *xfs_mp_attrs[] = {
>>  #ifdef DEBUG
>>  	ATTR_LIST(drop_writes),
>>  #endif
>> +#ifdef CONFIG_XFS_RT
>> +	ATTR_LIST(rt_alloc_min),
>> +#endif
>
> This is userspace visible - shouldn't we always compile this in,
> even if all it does it return EINVAL to attempts to change it when
> CONFIG_XFS_RT=n?
>

I'm fine with changing this, my thinking here was to simply not
show options which aren't tunable without a realtime device.  But
I'm fine with throwing EINVAL too.


Richard




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] xfs: Add realtime fallback if data device full
  2017-09-22  7:04   ` Dave Chinner
@ 2017-09-25 18:37     ` Richard Wareing
  2017-09-25 23:16       ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Wareing @ 2017-09-25 18:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, darrick.wong, hch

Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Sep 18, 2017 at 08:52:38PM -0700, Richard Wareing wrote:
>> - Adds tunable option to fallback to realtime device if configured
>>   when data device is full.
>> - Useful for realtime device users to help prevent ENOSPC errors when
>>   selectively storing some files (e.g. small files) on data device, while
>>   others are stored on realtime block device.
>> - Set via the "rt_fallback_pct" sysfs value which is available if
>>   the kernel is compiled with CONFIG_XFS_RT.
>>
>> Signed-off-by: Richard Wareing <rwareing@fb.com>
>> ---
>> Changes since v3:
>> * None, new patch to patch set
>>
>>  fs/xfs/xfs_bmap_util.c |  4 +++-
>>  fs/xfs/xfs_fsops.c     |  4 ++++
>>  fs/xfs/xfs_iomap.c     |  8 ++++++--
>>  fs/xfs/xfs_mount.c     | 27 ++++++++++++++++++++++++++-
>>  fs/xfs/xfs_mount.h     |  7 ++++++-
>>  fs/xfs/xfs_rtalloc.c   | 14 ++++++++++++++
>>  fs/xfs/xfs_rtalloc.h   |  3 ++-
>>  fs/xfs/xfs_sysfs.c     | 39 +++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 100 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index 2d253fb..9797c69 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1026,8 +1026,10 @@ xfs_alloc_file_space(
>>  	if (len <= 0)
>>  		return -EINVAL;
>>
>> -	if (XFS_IS_REALTIME_MOUNT(mp))
>> +	if (XFS_IS_REALTIME_MOUNT(mp)) {
>>  		xfs_rt_alloc_min(ip, len);
>> +		xfs_rt_fallback(ip, mp);
>> +	}
>
> This should really go inside xfs_inode_select_target() as space
> availability is just another selection criteria....
>

Re-worked this.  I like this idea better as well.

>> rt = XFS_IS_REALTIME_INODE(ip);
>>  	extsz = xfs_get_extsz_hint(ip);
>> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
>> index 6ccaae9..c15e906 100644
>> --- a/fs/xfs/xfs_fsops.c
>> +++ b/fs/xfs/xfs_fsops.c
>> @@ -610,6 +610,10 @@ xfs_growfs_data_private(
>>  	xfs_set_low_space_thresholds(mp);
>>  	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
>>
>> +	if (XFS_IS_REALTIME_MOUNT(mp)) {
>> +		xfs_set_rt_min_fdblocks(mp);
>> +	}
>
> Normal way to do this is something like:
>
> 	mp->m_rt_min_fdblocks = xfs_calc_min_free_rtblocks(mp);
>
> and check XFS_IS_REALTIME_MOUNT() inside that function.
>
> And now, reading on, I find I've completely misunderstood what
> those variable and function names mean, so they need renaming
> to be clearer.

Fixed in new version, in my defense I was modeling this after
xfs_set_low_space_thresholds, but looks like xfs_alloc_set_aside
as you point out is a better pattern to follow (since mine is a
trivial assignment).

>
>> +
>> +/*
>> + * precalculate minimum of data blocks required, if we fall
>> + * below this value, we will fallback to the real-time device.
>> + *
>> + * m_rt_fallback_pct can only be non-zero if a real-time device
>> + * is configured.
>> + */
>> +void
>> +xfs_set_rt_min_fdblocks(
>> +	struct xfs_mount	*mp)
>> +{
>> +	if (mp->m_rt_fallback_pct) {
>> +		xfs_sb_t		*sbp = &mp->m_sb;
>> +		xfs_extlen_t 	lsize;
>> +		__uint64_t 		min_blocks;
>
> Stray whitespace. If you use vim, add this macro to your
> .vimrc so that it highlights stray whitespace for you:
>
> " highlight whitespace damage
> highlight RedundantSpaces ctermbg=red guibg=red
> match RedundantSpaces /\s\+$\| \+\ze\t/
>


Thanks for this, added to my vimrc!

>> +
>> +		lsize = sbp->sb_logstart ? sbp->sb_logblocks : 0;
>> +		min_blocks = (mp->m_sb.sb_dblocks - lsize) * mp->m_rt_fallback_pct;
>> +		do_div(min_blocks, 100);
>
> Why bother with the log size?
>

My thinking was to subtract out blocks used by the log, but upon further
reflection this isn't correct since the log blocks are already decremented
out of the free block count.  I've fixed this up.


>> +		/* Pre-compute minimum data blocks required before
>> +		 * falling back to RT device for allocations
>> +		 */
>
> Comment format.
>
>> +		mp->m_rt_min_fdblocks = min_blocks;
>
> Hmmm - I wonder if it would be better to tie this into the existing
> data device low space threshold code?
>

Not sure what you mean here?


>> +	}
>
> Also, indenting.
>
> 	if (!XFS_IS_REALTIME_MOUNT(mp))
> 		return 0;
> 	if (!mp->m_rt_fallback_pct)
> 		return 0;
> 	....
>
>
>
>> +}
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index 067be3b..36676c4 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -197,7 +197,11 @@ typedef struct xfs_mount {
>>  	__uint32_t		m_generation;
>>
>>  	bool			m_fail_unmount;
>> -        uint                    m_rt_alloc_min; /* Min RT allocation */
>> +	uint			m_rt_alloc_min; /* Min RT allocation */
>> +	__uint8_t		m_rt_fallback_pct; /* Fall back to realtime device if
>
> uint32_t is fine. We're moving away from the __[u]int types, so
> we shouldn't really add any new ones.
>

Fixed.

>> +void xfs_rt_fallback(
>> +    struct xfs_inode    *ip,
>> +    struct xfs_mount    *mp)
>
> Mount first, then inode.
>

Just for my own knowledge, is this a convention?

>> +{
>> +    if (!XFS_IS_REALTIME_INODE(ip)) {
>> +        __uint64_t      free;
>> +        free = percpu_counter_sum(&mp->m_fdblocks) -
>> +            mp->m_alloc_set_aside;
>> +        if (free < mp->m_rt_min_fdblocks) {
>> +            ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
>> +        }
>> +    }
>> +}
>
> Indenting, but should really move to the target selection function.
>

Done.  I added a target function and two functions to contain the policy
code specific to each threshold.

>> +STATIC ssize_t
>> +rt_fallback_pct_store(
>> +	struct kobject			*kobject,
>> +	const char				*buf,
>> +	size_t					count)
>> +{
>> +	struct xfs_mount		*mp = to_mp(kobject);
>> +	int						ret;
>> +	int						val;
>> +
>> +	ret = kstrtoint(buf, 0, &val);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Only valid if using a real-time device */
>> +	if (XFS_IS_REALTIME_MOUNT(mp) && ((val > 0) && (val <=100))) {
>> +		mp->m_rt_fallback_pct = val;
>> +		xfs_set_rt_min_fdblocks(mp);
>> +	} else if (val <= 0) {
>> +		mp->m_rt_fallback_pct = 0;
>> +		mp->m_rt_min_fdblocks = 0;
>> +	} else
>> +		return -EINVAL;
>
> Same issue as last patch.
>
> Also, just set then threshold percentage, and if there is no error,
> then call xfs_set_rt_min_fdblocks() rather than zeroing it directly.
>

Fixed.

> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] xfs: Add realtime fallback if data device full
  2017-09-25 18:37     ` Richard Wareing
@ 2017-09-25 23:16       ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2017-09-25 23:16 UTC (permalink / raw)
  To: Richard Wareing; +Cc: linux-xfs, darrick.wong, hch

On Mon, Sep 25, 2017 at 11:37:02AM -0700, Richard Wareing wrote:
> Dave Chinner <david@fromorbit.com> wrote:
> >On Mon, Sep 18, 2017 at 08:52:38PM -0700, Richard Wareing wrote:
> >>+		/* Pre-compute minimum data blocks required before
> >>+		 * falling back to RT device for allocations
> >>+		 */
> >
> >Comment format.
> >
> >>+		mp->m_rt_min_fdblocks = min_blocks;
> >
> >Hmmm - I wonder if it would be better to tie this into the existing
> >data device low space threshold code?
> >
> 
> Not sure what you mean here?

The lowspace threshold code is just an array of thresholds at which
we do trigger different behaviour. e.g. we trim maximum speculative
delalloc space based one where free space falls in that table.  If
we add another entry in the table for the "switch to RT device at
this low space" threshold, we don't need a specific variable in the
struct xfs_mount....

> 
> >>+void xfs_rt_fallback(
> >>+    struct xfs_inode    *ip,
> >>+    struct xfs_mount    *mp)
> >
> >Mount first, then inode.
> >
> 
> Just for my own knowledge, is this a convention?

Convention. I also missed that the return type goes on a separate
line. i.e.

void
xfs_rt_fallback(
	struct xfs_mount	*mp,
	struct xfs_inode	*ip)

It's different to the standard linux convention for a couple of
reasons. Firstly, it's the historic format inherited from Irix but
we kept it because it makes using grep to find a function
declaration really easy (i.e. "^xfs_rt_fallback" finds the function
declaration instead of all the callers) and there are functions with
lots of parameters and the "run them all together in as few lines as
possible" is hard on the eyes and requires reformatting of the
entire function definition when one changes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-09-25 23:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-19  3:52 [PATCH v4 0/3] XFS realtime device tweaks Richard Wareing
2017-09-19  3:52 ` [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
2017-09-22  5:35   ` Dave Chinner
2017-09-22 18:26     ` Richard Wareing
2017-09-19  3:52 ` [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
2017-09-22  5:54   ` Dave Chinner
2017-09-22 19:06     ` Richard Wareing
2017-09-19  3:52 ` [PATCH v4 3/3] xfs: Add realtime fallback if data device full Richard Wareing
2017-09-22  7:04   ` Dave Chinner
2017-09-25 18:37     ` Richard Wareing
2017-09-25 23:16       ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.