* [PATCH RESEND v6 0/3] XFS realtime device tweaks
@ 2017-11-22 22:40 Richard Wareing
2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
Re-sending this series, rebased onto 4.14-rc8 and re-tested. Patch 1 in this
series has been reviewed and is ready to be merged independently of the others.
====
1. Inode flag now correctly set when locks are held via XFS_BMAPI_RTDATA
flag.
2. Realtime flag is honored when set by user via ioctl or inherit flag on
directory.
3. Misc changes around formatting & bounds checks on sysfs options.
See individual patches for more details.
Please pay close attention to the change in xfs_file_iomap_begin (patch 2).
The new version of the patch bypasses the xfs_file_iomap_begin_delay function
in the "realtime" case, since the realtime code there is unreachable (see the
assert in that function). Instead, we fall through to xfs_iomap_write_direct,
which passes XFS_BMAPI_RTDATA on to xfs_bmapi_write, where the realtime flag
is set on the inode.
I'm curious whether there is a better approach, and would welcome verification
that this is sane/safe.
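For readers following along, the decision order described above can be modelled in userspace roughly as follows. This is a simplified sketch and not the kernel code: the enum and the pick_write_path() helper are hypothetical names, the DAX check is left out, and the model assumes a new allocation is actually needed. The select_rt argument stands in for the result of xfs_inode_select_rt_target().

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes discussed in the cover letter. */
enum write_path {
	PATH_DELALLOC,      /* xfs_file_iomap_begin_delay (delayed allocation) */
	PATH_DIRECT,        /* xfs_iomap_write_direct, no new realtime selection */
	PATH_DIRECT_RTDATA, /* xfs_iomap_write_direct with XFS_BMAPI_RTDATA */
};

/*
 * Model of the write-path choice in patch 2.
 *
 * buffered       - write is not O_DIRECT
 * has_extsz_hint - xfs_get_extsz_hint() would return non-zero
 * already_rt     - inode already carries the realtime flag
 * select_rt      - policy wants this initial allocation on the rt device
 */
enum write_path
pick_write_path(bool buffered, bool has_extsz_hint, bool already_rt,
		bool select_rt)
{
	/* Delalloc is only used when the policy does not pick realtime. */
	if (buffered && !has_extsz_hint && !select_rt)
		return PATH_DELALLOC;

	/* An explicitly set realtime flag takes precedence over the policy. */
	if (!already_rt && select_rt)
		return PATH_DIRECT_RTDATA;

	return PATH_DIRECT;
}
```

In this model a buffered write that the policy routes to the realtime device never reaches the delalloc path, which is the behaviour the cover letter asks reviewers to check.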
Patch set based off Linux 4.14-rc8 (commit
39dae59d66acd86d1de24294bd2f343fd5e7a625) located @
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git .
Richard Wareing (3):
xfs: Show realtime device stats on statfs calls if inherit flag set
xfs: Set realtime flag based on initial allocation size
xfs: Add realtime fallback if data device full
Documentation/filesystems/xfs.txt | 27 +++++++++++-
fs/xfs/libxfs/xfs_bmap.c | 35 +++++++++++++++
fs/xfs/libxfs/xfs_bmap.h | 3 ++
fs/xfs/xfs_bmap_util.c | 3 ++
fs/xfs/xfs_fsops.c | 2 +
fs/xfs/xfs_inode.c | 6 +++
fs/xfs/xfs_iomap.c | 18 +++++++-
fs/xfs/xfs_linux.h | 2 +
fs/xfs/xfs_mount.c | 24 +++++++++++
fs/xfs/xfs_mount.h | 8 ++++
fs/xfs/xfs_rtalloc.c | 90 +++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_rtalloc.h | 2 +
fs/xfs/xfs_super.c | 8 ++++
fs/xfs/xfs_sysfs.c | 80 ++++++++++++++++++++++++++++++++++
14 files changed, 305 insertions(+), 3 deletions(-)
--
2.9.5
* [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
2017-11-22 22:40 [PATCH RESEND v6 0/3] XFS realtime device tweaks Richard Wareing
@ 2017-11-22 22:40 ` Richard Wareing
2017-11-28 21:20 ` Darrick J. Wong
2017-11-22 22:40 ` [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
2017-11-22 22:40 ` [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full Richard Wareing
2 siblings, 1 reply; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
- Reports realtime device free blocks in statfs calls if the realtime
inheritance bit is set on the directory's inode. This is a bit more
intuitive, especially for use cases where the realtime device is much
larger than the data device.
- Adds the XFS_IS_REALTIME_MOUNT macro to gate on the existence of a
realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
macro.
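One way to observe the effect from userspace is the standard statvfs() call. The helper below is a minimal sketch; query_space() is a hypothetical name, and the expectation that a directory with the realtime inheritance bit set reports realtime device sizes is taken from the description above and has not been verified here.

```c
#include <sys/statvfs.h>

/*
 * Fetch total and available space, in bytes, for the filesystem object at
 * `path`. Returns 0 on success, -1 on failure (errno is set by statvfs()).
 */
int
query_space(const char *path, unsigned long long *total,
	    unsigned long long *avail)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return -1;

	/* POSIX counts f_blocks and f_bavail in units of f_frsize. */
	*total = (unsigned long long)st.f_blocks * st.f_frsize;
	*avail = (unsigned long long)st.f_bavail * st.f_frsize;
	return 0;
}
```

Calling this on a directory with the inheritance bit set and on a plain directory of the same mount should, with this patch applied, show the difference between the realtime and data device figures.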
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* None
Changes since v4:
* None
Changes since v3:
* Fixed accounting bug, we are not required to subtract m_alloc_set_aside
as this is a data device only requirement.
* Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
for the inode.
Changes since v2:
* Style updated per Christoph Hellwig's comment
* Fixed bug: statp->f_bavail = statp->f_bfree
fs/xfs/xfs_linux.h | 2 ++
fs/xfs/xfs_super.c | 8 ++++++++
2 files changed, 10 insertions(+)
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index dcd1292..944b02d 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -278,8 +278,10 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
#define XFS_IS_REALTIME_INODE(ip) \
(((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME) && \
(ip)->i_mount->m_rtdev_targp)
+#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
#else
#define XFS_IS_REALTIME_INODE(ip) (0)
+#define XFS_IS_REALTIME_MOUNT(mp) (0)
#endif
#endif /* __XFS_LINUX__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f663022..3c9a989 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1153,6 +1153,14 @@ xfs_fs_statfs(
((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
(XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
xfs_qm_statvfs(ip, statp);
+
+ if (XFS_IS_REALTIME_MOUNT(mp) &&
+ (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {
+ statp->f_blocks = sbp->sb_rblocks;
+ statp->f_bavail = statp->f_bfree =
+ sbp->sb_frextents * sbp->sb_rextsize;
+ }
+
return 0;
}
--
2.9.5
* [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size
2017-11-22 22:40 [PATCH RESEND v6 0/3] XFS realtime device tweaks Richard Wareing
2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
@ 2017-11-22 22:40 ` Richard Wareing
2017-11-22 22:40 ` [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full Richard Wareing
2 siblings, 0 replies; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
- The rt_alloc_min sysfs option automatically selects the device (data
device, or realtime) based on the size of the initial allocation of the
file.
- This option can be used to route the storage of small files (and the
inefficient workloads associated with them) to a suitable storage
device such as an SSD, while larger allocations are sent to a traditional
HDD.
- Supports writes via O_DIRECT, buffered (i.e. page cache), and
pre-allocations (i.e. fallocate)
- Available only when kernel is compiled w/ CONFIG_XFS_RT option.
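The selection rule described above can be sketched in userspace as follows. This is a model under stated assumptions rather than the kernel implementation: struct model_file and model_select_rt() are hypothetical names, and the precedence of a manually set realtime flag (handled by the callers in the patch) is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode and mount state the policy reads. */
struct model_file {
	bool     rt_mount; /* mount has a realtime device configured */
	uint64_t nextents; /* extents already allocated to the file */
	int64_t  size;     /* current file size in bytes */
};

/*
 * Return true if the initial allocation of `len` bytes should go to the
 * realtime device. A threshold of 0 disables the policy.
 */
bool
model_select_rt(const struct model_file *f, int64_t rt_alloc_min, int64_t len)
{
	if (!f->rt_mount)
		return false;

	/* The target cannot change once blocks or data exist. */
	if (f->nextents || f->size)
		return false;

	if (rt_alloc_min == 0)
		return false;

	return len >= rt_alloc_min;
}
```

With a threshold of 1 MiB, for example, a 4 KiB first write stays on the data device while a 1 MiB fallocate of an empty file selects the realtime device.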
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* xfs_inode_select_target renamed to xfs_inode_select_rt_target and returns
boolean to indicate if realtime device target is desired.
* Introduction of XFS_BMAPI_RTDATA, which signals to the xfs_bmapi_allocate
function that the realtime flag must be set on the inode and the inode
logged.
* Manual setting of the realtime flag by ioctl or directory rt inherit flag
now takes precedence over the policy.
* Documentation
Changes since v4:
* Added xfs_inode_select_target function to hold target selection
code
* XFS_IS_REALTIME_MOUNT check now moved inside xfs_inode_select_target
function for better gating
* Improved consistency in the sysfs set behavior
* Style fixes
Changes since v3:
* Now functions via initial allocation regardless of O_DIRECT, buffered or
pre-allocation code paths. Provides a consistent user experience.
* I did experiment with putting this in the xfs_bmapi_write code path;
however, pre-allocation accounting unfortunately prevents this cleaner
approach. As such, this proved to be the cleanest functional approach.
* No longer a mount option, now a sysfs tunable
Documentation/filesystems/xfs.txt | 21 +++++++++++++++-
fs/xfs/libxfs/xfs_bmap.c | 35 +++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_bmap.h | 3 +++
fs/xfs/xfs_bmap_util.c | 3 +++
fs/xfs/xfs_inode.c | 6 +++++
fs/xfs/xfs_iomap.c | 18 ++++++++++++--
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_rtalloc.c | 50 +++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_rtalloc.h | 2 ++
fs/xfs/xfs_sysfs.c | 42 ++++++++++++++++++++++++++++++++
10 files changed, 178 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 3b9b5c1..0763972 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -94,7 +94,7 @@ default behaviour.
When inode64 is specified, it indicates that XFS is allowed
to create inodes at any location in the filesystem,
including those which will result in inode numbers occupying
- more than 32 bits of significance.
+ more than 32 bits of significance.
inode32 is provided for backwards compatibility with older
systems and applications, since 64 bits inode numbers might
@@ -467,3 +467,22 @@ the class and error context. For example, the default values for
"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
to "fail immediately" behaviour. This is done because ENODEV is a fatal,
unrecoverable error no matter how many times the metadata IO is retried.
+
+Realtime Device Sysfs Options
+=============================
+
+When using a realtime sub-volume, the following sysfs options are supported:
+
+ /sys/fs/xfs/<dev>/rt_alloc_min
+ (Units: bytes Min: 0 Default: 0 Max: INT_MAX)
+ When set, the file will be allocated blocks from the realtime device if the
+ initial allocation request size (in bytes) is equal to or above this value.
+ For XFS use-cases where appends are unlikely or not supported, this option
+ can be used to place smaller files on the data device (typically an SSD),
+ while larger files are placed on the realtime device (typically an HDD).
+
+ Any files which have the realtime flag set by an ioctl call or realtime
+ inheritance flag on the directory will not be affected by this option.
+ Buffered, direct IO and pre-allocation are supported.
+
+ Setting the value to "0" disables this behavior.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8926379..dd02a52 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4188,6 +4188,39 @@ xfs_bmapi_reserve_delalloc(
return error;
}
+/*
+ * This function will set the XFS_DIFLAG_REALTIME flag on the inode if
+ * the XFS_BMAPI_RTDATA flag is set on the xfs_bmalloca struct.
+ *
+ * This function is only valid for realtime mounts, and only on the initial
+ * allocation for the file.
+ *
+ */
+void
+xfs_bmapi_rt_data_flag(
+ struct xfs_mount *mp,
+ struct xfs_bmalloca *bma)
+{
+
+ /* Only valid if this is a realtime mount */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return;
+
+ /* Only valid if file is empty */
+ if (!(bma->datatype & XFS_ALLOC_INITIAL_USER_DATA))
+ return;
+
+ /* Nothing to do, realtime flag already set */
+ if (bma->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+ return;
+
+ /* Set realtime flag and log it if RTDATA flag is set */
+ if (bma->flags & XFS_BMAPI_RTDATA) {
+ bma->ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
+ bma->logflags |= XFS_ILOG_CORE;
+ }
+}
+
static int
xfs_bmapi_allocate(
struct xfs_bmalloca *bma)
@@ -4238,6 +4271,8 @@ xfs_bmapi_allocate(
bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
+ xfs_bmapi_rt_data_flag(mp, bma);
+
/*
* Only want to do the alignment at the eof if it is userdata and
* allocation length is larger than a stripe unit.
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 502e0d8..6f67588 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -113,6 +113,9 @@ struct xfs_extent_free_item
/* Only convert delalloc space, don't allocate entirely new extents */
#define XFS_BMAPI_DELALLOC 0x400
+/* Allocate to realtime device */
+#define XFS_BMAPI_RTDATA 0x800
+
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
{ XFS_BMAPI_METADATA, "METADATA" }, \
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6503cfa..b04363b 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1053,6 +1053,9 @@ xfs_alloc_file_space(
return -EINVAL;
rt = XFS_IS_REALTIME_INODE(ip);
+ if (!rt && (rt = xfs_inode_select_rt_target(ip, len)))
+ alloc_type |= XFS_BMAPI_RTDATA;
+
extsz = xfs_get_extsz_hint(ip);
count = len;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4ec5b7f..ed29549 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1633,6 +1633,12 @@ xfs_itruncate_extents(
xfs_inode_clear_cowblocks_tag(ip);
}
+ if (ip->i_d.di_nblocks == 0 && XFS_IS_REALTIME_MOUNT(mp) &&
+ mp->m_rt_alloc_min) {
+ /* Clear realtime flag if m_rt_alloc_min policy is in place */
+ ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
+ }
+
/*
* Always re-log the inode so that our permanent transaction can keep
* on rolling it forward in the log.
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f179bdf..518a9bb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -40,6 +40,7 @@
#include "xfs_dquot_item.h"
#include "xfs_dquot.h"
#include "xfs_reflink.h"
+#include "xfs_rtalloc.h"
#define XFS_WRITEIO_ALIGN(mp,off) (((off) >> mp->m_writeio_log) \
@@ -175,7 +176,11 @@ xfs_iomap_write_direct(
int bmapi_flags = XFS_BMAPI_PREALLOC;
uint tflags = 0;
+
rt = XFS_IS_REALTIME_INODE(ip);
+ if (!rt && (rt = xfs_inode_select_rt_target(ip, count)))
+ bmapi_flags |= XFS_BMAPI_RTDATA;
+
extsz = xfs_get_extsz_hint(ip);
lockmode = XFS_ILOCK_SHARED; /* locked by caller */
@@ -985,8 +990,17 @@ xfs_file_iomap_begin(
if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
- /* Reserve delalloc blocks for regular writeback. */
- return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
+ /*
+ * For non-odirect writes, check if this will be allocated to
+ * realtime, if so we by-pass xfs_file_iomap_begin_delay as if
+ * the inode was already marked realtime (see xfs_get_extsz_hint).
+ * The actual setting of the realtime flag on the inode will be
+ * done later on.
+ */
+ if (!xfs_inode_select_rt_target(ip, XFS_FSB_TO_B(mp, length)))
+ /* Reserve delalloc blocks for regular writeback. */
+ return xfs_file_iomap_begin_delay(inode, offset, length,
+ iomap);
}
if (need_excl_ilock(ip, flags)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d0..0db9731 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -197,6 +197,7 @@ typedef struct xfs_mount {
uint32_t m_generation;
bool m_fail_unmount;
+ xfs_off_t m_rt_alloc_min; /* Min RT allocation */
#ifdef DEBUG
/*
* Frequency with which errors are injected. Replaces xfs_etest; the
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 488719d..145007b 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1284,3 +1284,53 @@ xfs_rtpick_extent(
*pick = b;
return 0;
}
+
+/*
+ * If allocation length is less than rt_alloc_min threshold select the
+ * data device. Otherwise, select the realtime device.
+ */
+bool
+xfs_rt_alloc_min(
+ struct xfs_mount *mp,
+ xfs_off_t len)
+{
+ if (!mp->m_rt_alloc_min)
+ return false;
+
+ if (len >= mp->m_rt_alloc_min)
+ return true;
+
+ return false;
+}
+
+/*
+* Select the target device for the inode based on either the size of the
+* initial allocation, or the amount of space available on the data device.
+*
+*/
+bool
+xfs_inode_select_rt_target(
+ struct xfs_inode *ip,
+ xfs_off_t len)
+{
+ struct xfs_mount *mp = ip->i_mount;
+
+ /* If the mount does not have a realtime device configured, there's
+ * nothing to do here.
+ */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return false;
+
+ /* You cannot select a new device target once blocks have been allocated
+ * (e.g. fallocate() beyond EOF), or if data has been written already.
+ */
+ if (ip->i_d.di_nextents)
+ return false;
+ if (ip->i_d.di_size)
+ return false;
+
+ /* Select realtime device as our target based on the value of
+ * mp->m_rt_alloc_min. Target selection code is not valid if not set.
+ */
+ return xfs_rt_alloc_min(mp, len);
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 79defa7..4f058b5 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -138,6 +138,7 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
int xfs_rtalloc_query_all(struct xfs_trans *tp,
xfs_rtalloc_query_range_fn fn,
void *priv);
+bool xfs_inode_select_rt_target(struct xfs_inode *ip, xfs_off_t len);
#else
# define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (ENOSYS)
# define xfs_rtfree_extent(t,b,l) (ENOSYS)
@@ -158,6 +159,7 @@ xfs_rtmount_init(
}
# define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
# define xfs_rtunmount_inodes(m)
+# define xfs_inode_select_rt_target(i,l) (0)
#endif /* CONFIG_XFS_RT */
#endif /* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8b2ccc2..8b425be 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -90,7 +90,49 @@ to_mp(struct kobject *kobject)
return container_of(kobj, struct xfs_mount, m_kobj);
}
+#ifdef CONFIG_XFS_RT
+STATIC ssize_t
+rt_alloc_min_store(
+ struct kobject *kobject,
+ const char *buf,
+ size_t count)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+ int ret;
+ int val;
+
+ ret = kstrtoint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ /* Only valid if using a real-time device */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return -EINVAL;
+
+ if (val >= 0)
+ mp->m_rt_alloc_min = val;
+ else
+ return -EINVAL;
+
+ return count;
+}
+
+STATIC ssize_t
+rt_alloc_min_show(
+ struct kobject *kobject,
+ char *buf)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+
+ return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
+}
+XFS_SYSFS_ATTR_RW(rt_alloc_min);
+#endif
+
static struct attribute *xfs_mp_attrs[] = {
+#ifdef CONFIG_XFS_RT
+ ATTR_LIST(rt_alloc_min),
+#endif
NULL,
};
--
2.9.5
* [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full
2017-11-22 22:40 [PATCH RESEND v6 0/3] XFS realtime device tweaks Richard Wareing
2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
2017-11-22 22:40 ` [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
@ 2017-11-22 22:40 ` Richard Wareing
2 siblings, 0 replies; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
- For FSes which have a realtime device configured, rt_fallback_pct forces
allocations to the realtime device after data device usage reaches
rt_fallback_pct.
- Useful for realtime device users to help prevent ENOSPC errors when
selectively storing some files (e.g. small files) on data device, while
others are stored on realtime block device.
- Set via the "rt_fallback_pct" sysfs value which is available if
the kernel is compiled with CONFIG_XFS_RT.
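The threshold arithmetic behind rt_fallback_pct can be illustrated with a small userspace model. The function names below are hypothetical, the percentage is assumed to be in the range 0 to 100 (as the sysfs store enforces), and free_dblocks is assumed to already exclude the allocator's set-aside blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Minimum number of free data device blocks before falling back:
 * dblocks * (100 - pct) / 100. Assumes pct <= 100 and that
 * dblocks * 100 does not overflow 64 bits.
 */
uint64_t
model_min_free_dblocks(uint64_t dblocks, unsigned int fallback_pct)
{
	return dblocks * (100 - fallback_pct) / 100;
}

/* True if a new file should be directed to the realtime device. */
bool
model_should_fallback(uint64_t dblocks, uint64_t free_dblocks,
		      unsigned int fallback_pct)
{
	/* A percentage of 0 disables the fallback policy. */
	if (fallback_pct == 0)
		return false;

	return free_dblocks < model_min_free_dblocks(dblocks, fallback_pct);
}
```

For a data device of 1,000,000 blocks and rt_fallback_pct set to 80, the threshold is 200,000 free blocks: with 150,000 blocks free the model falls back to the realtime device, with 250,000 free it does not.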
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* Minor change to work with XFS_BMAPI_RTDATA method described
in rt_alloc_min patch
* Fixed bounds checks on sysfs option
* Documentation
Changes since v4:
* Refactored to align with xfs_inode_select_target change
* Fallback percentage reworked to trigger on % space used on data device.
I find this a bit more intuitive as it aligns well with "df" output.
* mp->m_rt_min_fdblocks now assigned via function call
* Better consistency on sysfs options
Changes since v3:
* None, new patch to patch set
Documentation/filesystems/xfs.txt | 6 ++++++
fs/xfs/xfs_fsops.c | 2 ++
fs/xfs/xfs_mount.c | 24 ++++++++++++++++++++++
fs/xfs/xfs_mount.h | 7 +++++++
fs/xfs/xfs_rtalloc.c | 42 ++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_sysfs.c | 38 +++++++++++++++++++++++++++++++++++
6 files changed, 118 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 0763972..ed6f6e2 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -486,3 +486,9 @@ When using a realtime sub-volume, the following sysfs options are supported:
Buffered, direct IO and pre-allocation are supported.
Setting the value to "0" disables this behavior.
+
+ /sys/fs/xfs/<dev>/rt_fallback_pct
+ (Units: percentage Min: 0 Default: 0, Max: 100)
+ When set, the file will be allocated blocks from the realtime device if the
+ data device space utilization rises above rt_fallback_pct. Setting the
+ value to "0" disables this behavior.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc5..89713f1 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -610,6 +610,8 @@ xfs_growfs_data_private(
xfs_set_low_space_thresholds(mp);
mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+ mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+
/*
* If we expanded the last AG, free the per-AG reservation
* so we can reinitialize it with the new size.
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e9727d0..3905e57 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1396,3 +1396,27 @@ xfs_dev_is_read_only(
}
return 0;
}
+
+/*
+ * precalculate minimum of data blocks required, if we fall
+ * below this value, we will fallback to the real-time device.
+ *
+ * m_rt_fallback_pct can only be non-zero if a real-time device
+ * is configured.
+ */
+uint64_t
+xfs_rt_calc_min_free_dblocks(
+ struct xfs_mount *mp)
+{
+ xfs_rfsblock_t min_free_dblocks = 0;
+
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return 0;
+
+ /* Pre-compute minimum data blocks required before
+ * falling back to RT device for allocations
+ */
+ min_free_dblocks = mp->m_sb.sb_dblocks * (100 - mp->m_rt_fallback_pct);
+ do_div(min_free_dblocks, 100);
+ return min_free_dblocks;
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0db9731..9dc17b8 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -198,6 +198,12 @@ typedef struct xfs_mount {
bool m_fail_unmount;
xfs_off_t m_rt_alloc_min; /* Min RT allocation */
+ /* Fallback to realtime device if data device usage above rt_fallback_pct */
+ uint m_rt_fallback_pct;
+ /* Use realtime device if free data device blocks falls below this; computed
+ * from m_rt_fallback_pct.
+ */
+ xfs_rfsblock_t m_rt_min_free_dblocks;
#ifdef DEBUG
/*
* Frequency with which errors are injected. Replaces xfs_etest; the
@@ -447,4 +453,5 @@ int xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
int error_class, int error);
+uint64_t xfs_rt_calc_min_free_dblocks(struct xfs_mount *mp);
#endif /* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 145007b..3abd403 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1304,6 +1304,37 @@ xfs_rt_alloc_min(
}
/*
+ * m_rt_min_free_dblocks is a pre-computed threshold, which controls target
+ * selection based on how many free blocks are available on the data device.
+ *
+ * If the number of free data device blocks falls below
+ * mp->m_rt_min_free_dblocks, the realtime device is selected as the target
+ * device. If this value is not set, this target policy is inactive.
+ *
+ */
+bool
+xfs_rt_min_free_dblocks(
+ struct xfs_mount *mp,
+ struct xfs_inode *ip,
+ xfs_off_t len)
+{
+ /* Disabled */
+ if (!mp->m_rt_fallback_pct)
+ return false;
+
+ /* If inode target is already realtime device, nothing to do here */
+ if (!XFS_IS_REALTIME_INODE(ip)) {
+ uint64_t free_dblocks;
+ free_dblocks = percpu_counter_sum(&mp->m_fdblocks) -
+ mp->m_alloc_set_aside;
+ if (free_dblocks < mp->m_rt_min_free_dblocks) {
+ return true;
+ }
+ }
+ return false;
+}
+
+/*
* Select the target device for the inode based on either the size of the
* initial allocation, or the amount of space available on the data device.
*
@@ -1332,5 +1363,14 @@ xfs_inode_select_rt_target(
/* Select realtime device as our target based on the value of
* mp->m_rt_alloc_min. Target selection code is not valid if not set.
*/
- return xfs_rt_alloc_min(mp, len);
+ if (xfs_rt_alloc_min(mp, len))
+ return true;
+
+ /* Check if data device has enough space, if not fallback to realtime
+ * device. Valid only if mp->m_rt_fallback_pct is set.
+ */
+ if (xfs_rt_min_free_dblocks(mp, ip, len))
+ return true;
+
+ return false;
}
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8b425be..64f29b6 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -127,11 +127,49 @@ rt_alloc_min_show(
return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
}
XFS_SYSFS_ATTR_RW(rt_alloc_min);
+
+STATIC ssize_t
+rt_fallback_pct_store(
+ struct kobject *kobject,
+ const char *buf,
+ size_t count)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+ int ret;
+ int val;
+
+ ret = kstrtoint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return -EINVAL;
+
+ if (val < 0 || val > 100)
+ return -EINVAL;
+
+ /* Only valid if using a real-time device */
+ mp->m_rt_fallback_pct = val;
+ mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+ return count;
+}
+
+STATIC ssize_t
+rt_fallback_pct_show(
+ struct kobject *kobject,
+ char *buf)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_fallback_pct);
+}
+XFS_SYSFS_ATTR_RW(rt_fallback_pct);
#endif
static struct attribute *xfs_mp_attrs[] = {
#ifdef CONFIG_XFS_RT
ATTR_LIST(rt_alloc_min),
+ ATTR_LIST(rt_fallback_pct),
#endif
NULL,
};
--
2.9.5
* Re: [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
@ 2017-11-28 21:20 ` Darrick J. Wong
0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2017-11-28 21:20 UTC
To: Richard Wareing; +Cc: linux-xfs, david, hch
On Wed, Nov 22, 2017 at 02:40:07PM -0800, Richard Wareing wrote:
> - Reports realtime device free blocks in statfs calls if inheritance
> bit is set on the inode of directory. This is a bit more intuitive,
> especially for use-cases which are using a much larger device for
> the realtime device.
> - Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a
> realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
> option.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Richard Wareing <rwareing@fb.com>
> ---
> Changes since v5:
> * None
>
> Changes since v4:
> * None
>
> Changes since v3:
> * Fixed accounting bug, we are not required to subtract m_alloc_set_aside
> as this is a data device only requirement.
> * Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
> now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
> for the inode.
>
> Changes since v2:
> * Style updated per Christoph Hellwig's comment
> * Fixed bug: statp->f_bavail = statp->f_bfree
>
>
> fs/xfs/xfs_linux.h | 2 ++
> fs/xfs/xfs_super.c | 8 ++++++++
> 2 files changed, 10 insertions(+)
>
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index dcd1292..944b02d 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -278,8 +278,10 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
> #define XFS_IS_REALTIME_INODE(ip) \
> (((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME) && \
> (ip)->i_mount->m_rtdev_targp)
> +#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
> #else
> #define XFS_IS_REALTIME_INODE(ip) (0)
> +#define XFS_IS_REALTIME_MOUNT(mp) (0)
> #endif
>
> #endif /* __XFS_LINUX__ */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f663022..3c9a989 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1153,6 +1153,14 @@ xfs_fs_statfs(
> ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
> (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
> xfs_qm_statvfs(ip, statp);
> +
> + if (XFS_IS_REALTIME_MOUNT(mp) &&
> + (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {
For everyone else following at home: I asked on IRC, shouldn't we report
rtdev stats for any file that has REALTIME, but not RTINHERIT, set?
--D
> + statp->f_blocks = sbp->sb_rblocks;
> + statp->f_bavail = statp->f_bfree =
> + sbp->sb_frextents * sbp->sb_rextsize;
> + }
> +
> return 0;
> }
>
> --
> 2.9.5
>
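One possible reading of the question above is that the statfs gate would accept either flag. The sketch below contrasts that reading with the v6 condition; it is an illustration only, not something proposed in the thread. The function names are hypothetical and the flag constants are local to this model rather than taken from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; names echo the kernel's, values are local here. */
#define MODEL_DIFLAG_REALTIME  (1u << 0)
#define MODEL_DIFLAG_RTINHERIT (1u << 8)

/* Condition as posted in v6: only directories with the inherit flag. */
bool
report_rt_stats_v6(bool rt_mount, uint32_t di_flags)
{
	return rt_mount && (di_flags & MODEL_DIFLAG_RTINHERIT) != 0;
}

/* Assumed alternative: realtime files report realtime stats as well. */
bool
report_rt_stats_suggested(bool rt_mount, uint32_t di_flags)
{
	const uint32_t mask = MODEL_DIFLAG_REALTIME | MODEL_DIFLAG_RTINHERIT;

	return rt_mount && (di_flags & mask) != 0;
}
```

Under the v6 condition a regular file with only the realtime flag set would still report data device figures, which is the case the question appears to target.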